Test Your Backups


Some years ago, I was onsite at a large council-run recreation centre performing a scheduled upgrade of their membership management system. It had been a while since I'd last visited them in person, so I'd arranged to spend some time with the club manager after the upgrade to go over any other technology questions or issues he had.

As per my standard operating procedure, before doing anything else I performed an ad hoc backup of their member data, which I dumped into a temporary folder on their Windows desktop. Then I confidently kicked off the upgrade. About halfway through, the screen flashed and up came the infamous Windows Blue Screen of Death. My pulse went up a bit: the BSOD had struck at a point in the upgrade where the database was being restructured, so I was worried about the integrity of the data.

Once the computer had rebooted (which took a while, because something funky was going on with the hard drive), I went straight to the database to check on its health and, sure enough, it was hosed.

My pulse went up a little bit more.

Not to worry, I thought. I made that ad hoc backup, so I'll just copy it back over and start again. Except the ad hoc backup was also hosed! The hard drive was failing, and I'd made the backup onto that same hard drive.

Now my pulse was running like I was teaching a class, except I felt cold chills inside and could feel the adrenalin start to flow.

OK, all is not lost, I told myself. This is a large, well-funded, government-sponsored recreation centre. They'll have processes in place for regular backups, so I'll just restore last night's backup, then go and grovel for forgiveness from the manager, since he has now lost half a day's worth of data. I mean, how much can that be? A few member sign-ins? Maybe some drink sales? A membership sale?

I walked into the manager's office and asked for their most recent backups. He obliged with an impressive array of CDs, each meticulously labelled with the date and time of the backup. There were roughly three CDs for each week, going back around three months. I was pleasantly surprised at how diligent they were, and I remember wishing more clubs were like that.

Until I tried restoring the previous night's backup.

The backup on that CD was incomplete, so I couldn't restore the data. Damn. OK, I'll use the backup before that. Same thing: the CD wasn't readable. Getting quite anxious, I tried more and more CDs, each time going further back in time. As it turned out, I had to go back two whole months before I found a usable backup! That's two months of membership sales, renewals, POS transactions, sign-ins and enquiry records, all lost. Gone!

I won't go into how I reacted to this (or how the manager himself reacted), but let's just say no-one came away a winner that day.

The moral of this story?

The point of my anecdote is this: the club had been doing what they thought was the right thing. They were religiously backing up their data, documenting everything they did, storing the backups in a safe, dry place, keeping historical copies, the works. They were doing everything by the book, exactly as a dream customer would... EXCEPT checking the integrity of their backups.

Their one failing in an otherwise admirable policy was that they weren't periodically pulling out their backups and checking that they actually worked. So when the fateful day came and a backup was truly needed, they (and I) were let down: the backups were flawed, and therefore as useless as a desk fan in a Spinning room.

So my very strong advice is simple. Make it your policy to periodically pull out two or three of your most recent backups and check them. Make sure the data on them is readable. Make sure the backup dataset can be properly restored. Make sure the physical media the backup is stored on is in good condition and working.

I recommend you do this at least once a month. It is a relatively quick and simple process. If your backups are performed automatically by backup software, such software almost always has built-in functions to test backup integrity. If you back up manually by copying data onto a USB key using Windows Explorer (which is perfectly acceptable; I don't care how you do it, as long as you're backing up!), then try copying the backup data back onto your computer to make sure it's readable.
And if possible (and if you know what you're looking for), have a quick poke around the backup dataset to make sure that all of your data is actually being backed up and you're not missing anything that you thought was being included but isn't.
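For the manual, copy-it-to-a-USB-key approach, the "copy it back and make sure it's readable" check can be scripted. What follows is a minimal sketch in Python, assuming the backup is a plain file-for-file copy of a data folder (not a compressed or proprietary backup format); the function names and folder paths are hypothetical. It reads every byte of every backed-up file, which is exactly what fails on a dying CD or drive, and compares a SHA-256 hash against the original, so it also flags files that were never copied at all.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file by reading all of it in 1 MB chunks.

    Reading every byte is the point: a bad sector or unreadable disc
    raises OSError here instead of going unnoticed.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: str, backup_dir: str) -> dict:
    """Compare every file under source_dir with its copy under backup_dir.

    Hypothetical helper. Assumes the backup mirrors the source folder
    layout. Returns relative paths grouped as ok / missing / unreadable /
    different. A source file that cannot be read raises OSError.
    """
    source = Path(source_dir)
    backup = Path(backup_dir)
    report = {"ok": [], "missing": [], "unreadable": [], "different": []}
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        bak_file = backup / rel
        if not bak_file.is_file():
            # Something you thought was being backed up isn't there.
            report["missing"].append(str(rel))
            continue
        try:
            bak_hash = sha256_of(bak_file)
        except OSError:
            # The copy exists but can't be read back: the failure in the story.
            report["unreadable"].append(str(rel))
            continue
        if bak_hash == sha256_of(src_file):
            report["ok"].append(str(rel))
        else:
            report["different"].append(str(rel))
    return report
```

For example, `verify_backup(r"C:\ClubData", r"E:\Backup\ClubData")` (hypothetical paths) returns the lists of missing, unreadable and mismatched files. Files that have legitimately changed since the backup was taken will also land in "different", so either run it straight after the backup or treat that list as a prompt to look rather than a failure. It is no substitute for an actual test restore into the application.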


Bonus lesson: one of the mistakes I made that day (apart from not calling in sick) was doing my ad hoc backup onto the same hard drive as the data. Although this is quick and simple, it has a significant flaw: if the hard drive fails, you've lost both your data and your backup. So never back up onto the same drive as your data.
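If your backups are scripted, that same-drive mistake can be caught before it happens. Here is a minimal Python sketch (the function names are hypothetical) that compares `os.stat().st_dev`, the identifier of the device or volume a path lives on, for the data folder and the proposed backup folder. Note its limit: it only detects the same volume, so a backup onto a second partition (say D:) of the same physical disk as C: would pass the check, even though both would die together.

```python
import os


def on_same_volume(data_path: str, backup_path: str) -> bool:
    """Return True if both existing paths live on the same volume.

    Compares st_dev from os.stat(). Both paths must already exist,
    otherwise os.stat raises FileNotFoundError. Separate partitions of
    one physical disk report different st_dev values, so this does not
    catch that case.
    """
    return os.stat(data_path).st_dev == os.stat(backup_path).st_dev


def check_backup_target(data_path: str, backup_path: str) -> None:
    """Raise RuntimeError if the backup would land on the data's own volume."""
    if on_same_volume(data_path, backup_path):
        raise RuntimeError(
            f"Backup target {backup_path!r} is on the same volume as "
            f"{data_path!r}; choose a different drive."
        )
```

Calling `check_backup_target` at the start of a backup script makes it refuse to run rather than quietly write a backup that shares the data's fate.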

If you make backup integrity checks part of your standard operating procedures, you'll go a long way towards preventing some poor IT geek like me from having a myocardial infarction when he inadvertently trashes your data and then tries to undo his mistake by restoring your backup.

Have fun!
Mike.Ryan



One Comment

  1. Di says:

    I saw a similar thing happen in a large corporation once. When a data recovery was attempted after a server failure, it was discovered that the backup system hadn’t written anything to the tapes for more than a month. The IT department copped a big shake-up over that!
    So another moral is: it's easy to overlook the obvious.

