March 30, 2008

Backups

It doesn’t matter what I say here, because nobody starts making regular backups until after they lose something important.

It’s like the dirty little secret of computer professionals: everyone agrees that backups are a good idea, but nobody actually does them until it’s too late. The few people who make regular backups rarely, if ever, test them, which means they may be in for a rude awakening when a drive does die. And drives do die. I’ve seen my own go bad, and once you get into a larger installation, you see them die every day. They say that a hard drive only exists in two states, failed and about to fail, and this is probably the best way to think about them.

I take my backups seriously, mostly because I take my data seriously. Note that by ‘data’ here I don’t mean every file on my computer. Applications, anything downloaded from the ‘net, etc. don’t count because they’re all replaceable. Things like written documents, code, and photographs are not so replaceable and should be backed up.

For me, a good backup system needs to be the following things:

  • Automatic and hands-off: if you have to do it by hand, you (generally) won’t. It should be as automated as possible, down to a single script at most, or fully automatic if you can manage it.
  • Not locked in to a particular vendor: you should be able to recover your files anywhere, on any computer.
  • Reliable: if it’s automatic, you shouldn’t have to check on it to make sure it’s still working. The media that the backup is stored on should also be something you can rely on (i.e., not a CD/DVD that will be unreadable in a few years, or a single hard drive that will fail on you).
  • Easy to recover from: it doesn’t matter if it’s a single file or all your files, it should be really easy to go back and get those files.

So, how do I do my backups? Well, every one of my photos is on (at least) two hard drives and on two DVDs (one at home, the other at work… in case of fire, theft, etc.). Every photo is copied to my laptop from the camera and then is copied to my server onto a set of mirrored drives. Once I get to around 4GB or so (or enough time has passed that I’m uncomfortable without the off-site backup) I burn it to two DVDs, put one in the case here and take the other to work. Then eventually the copy on the laptop will get rotated off to make room for more. So at the weakest point the files are on three hard drives (one in the laptop and the mirrored set).
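The "is it time to burn a disc yet?" decision is easy to hand to a script. Here is a minimal sketch in Python, not something taken from my actual setup: the names `pending_stats` and `should_burn` are made up for illustration, and the 30-day age limit is purely an assumption, since the only number given above is "around 4GB".

```python
import os
import time

BATCH_BYTES = 4 * 1024**3   # "around 4GB", the batch size mentioned above
MAX_AGE_DAYS = 30           # assumption: no specific age limit is given


def pending_stats(folder):
    """Return (total size in bytes, mtime of the oldest file) for a folder."""
    total = 0
    oldest = None
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            info = os.stat(os.path.join(dirpath, name))
            total += info.st_size
            if oldest is None or info.st_mtime < oldest:
                oldest = info.st_mtime
    return total, oldest


def should_burn(folder, now=None, batch_bytes=BATCH_BYTES,
                max_age_days=MAX_AGE_DAYS):
    """True once the unburned photos fill a batch or have waited too long."""
    total, oldest = pending_stats(folder)
    if oldest is None:          # nothing waiting, nothing to burn
        return False
    now = time.time() if now is None else now
    too_old = (now - oldest) > max_age_days * 86400
    return total >= batch_bytes or too_old
```

Run nightly against the folder of photos that have not yet been burned, something like this could at least nag you when the off-site copy is overdue.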

My other files have a slightly different backup scheme. The drives in my desktop and server are all rsync’d to a different pair of mirrored drives attached to a Linksys NSLU2 running Linux. This is completely automated and happens every night while I sleep; there’s no human intervention required at all. I’ve had to fall back on this one once, not because of a drive failure but because of human error. The benefit of the rsync backup scheme is that it’s done incrementally, so I have snapshots of the drives as they existed over the last week. Because of the way the backup is done, this only takes up the space of the most recent backup plus the differences from the previous ones, which is far more economical than having 7x your main storage as backup space.
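To show why a week of snapshots costs so little space, here is a small pure-Python sketch of the hard-link trick. It is only an illustration of the idea, not the rsync setup described above (the exact rsync options aren't listed here); rsync's `--link-dest` option works along the same lines. The function name `snapshot` and the directory layout are assumptions for the example.

```python
import filecmp
import os
import shutil


def snapshot(source, snapshot_root, name, previous=None):
    """Create snapshot_root/name as a full-looking copy of `source`.

    Files identical to the ones in the `previous` snapshot are hard-linked
    rather than copied, so they take up no additional disk space.
    """
    dest = os.path.join(snapshot_root, name)
    prev = os.path.join(snapshot_root, previous) if previous else None
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dest_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for filename in filenames:
            src_file = os.path.join(dirpath, filename)
            dest_file = os.path.join(dest_dir, filename)
            prev_file = (os.path.normpath(os.path.join(prev, rel, filename))
                         if prev else None)
            if (prev_file and os.path.isfile(prev_file)
                    and filecmp.cmp(src_file, prev_file, shallow=False)):
                os.link(prev_file, dest_file)      # unchanged: share the data
            else:
                shutil.copy2(src_file, dest_file)  # new or changed: real copy
    return dest
```

Each snapshot directory looks like a complete copy, so recovering a file is just a matter of copying it back out, and deleting the oldest snapshot only frees the data that no newer snapshot still links to. This assumes the snapshots live on a filesystem that supports hard links and that each snapshot name is new.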

As I said when this started, none of this really matters because everyone thinks they’re invulnerable to data loss until it happens to them–but perhaps this will inspire somebody to start anyways. And if you don’t start making good backups, I’ll be here to say ‘told you so’ when your drives die on you.
