I like to listen to music when I’m here alone. Right now I’m listening to a song that goes “Dies illa, dies irae, calamitatis et miseriae, dies magna et amara valde.” Roughly translated it means, “That day, when the server died, and fsck couldn’t recover the data, but unlinked all my files, that day all my data died, and I didn’t know where my backups were.”
If you have a dedicated server, it’s almost a certainty that you’ve never seen your server, or even been in the same city as it. Thus it is almost a certainty that you can’t show up in person to change tapes and take them offsite. Yet it’s also a pretty good bet that your data is so critical to you or your business that it would not be much of an exaggeration to say losing it would be a calamity.
How can you ensure your data is safe?
Well first, like security, your backup strategy needs its own plan and its own budget. You can’t toss firewalls at your server to make it secure; neither can you copy files off into the void. There are a number of issues that need to be addressed: what to back up, where to store the data, how to back it up, and when to run backups.
Some of this might apply if you’ve got a colocation account somewhere, but a lot of it is completely different. Physical access to a machine can change circumstances, and so can the services an ISP provides. (At ServePath we are far more hands-on with our unmanaged dedicated customers than we are with our colocation customers.)
What to back up?
If you’re paranoid, you might say “everything”, but this really doesn’t make any kind of sense if you’ve got a dedicated server. Since you can’t get to the machine, you’re never going to interact with it unless it has a complete install of some operating system on it. This means that file system-level utilities (such as my all-time favorite, dump) are pretty useless unless you architect your server’s layout around them.
When deciding what to back up, you have to keep in mind how you’re going to restore. If you simply tar up /, then when you restore you’re going to scribble old versions all over your system files (and tar has issues with certain kinds of files as well). If you were backing up Debian but restoring to FC4, you could easily render your system unusable by restoring to it!
What you want is to grab all your data files and leave the rest to hang. Data files actually split into two groups: configuration files and actual data files. Config files live in places like /etc and /var/named. If you can, try to avoid putting your custom ones there; use /root or something under /usr/local, such as /usr/local/etc or /usr/local/var. This way you don’t find out, when restoring from a crash, that your 2,500-line custom Sendmail script was under /etc, which you didn’t bother to back up. Data files can live in some funny places, but if there’s a service (such as httpd) that owns them, it’s best to give that service a user (such as http or www) and put all the relevant files under this user’s home directory, which doesn’t have to be /usr/home/www. This helps you keep track of where everything is.
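To make the "grab only the data" idea concrete, here is a minimal sketch in Python. The list of paths is hypothetical (I made it up to match the layout suggested above), and the archive name is my own convention; substitute wherever your own config and data actually live.

```python
import tarfile
import time
from pathlib import Path

# Hypothetical list: custom config under /usr/local/etc and /root,
# plus the home directory of an assumed "www" service user.
BACKUP_PATHS = ["/usr/local/etc", "/root", "/home/www"]

def archive_paths(paths, dest_dir, stamp=None):
    """Write one gzip-compressed tar archive containing only the given paths."""
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_dir) / f"data-{stamp}.tar.gz"
    with tarfile.open(dest, "w:gz") as tar:
        for p in paths:
            p = Path(p)
            if p.exists():
                # Drop the leading "/" so the archive can be unpacked into a
                # staging directory instead of straight over a live system.
                tar.add(p, arcname=str(p).lstrip("/"))
    return dest
```

Calling archive_paths(BACKUP_PATHS, "/var/backups") would leave a single dated archive in /var/backups (also an assumed location). Because the member names are relative, you can restore into a scratch directory and copy back only what you need, rather than scribbling over system files.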
If money (and therefore space) is an issue, you can decide what can be risked. I don’t back up any of my media files, since the chance of losing them is very low, the cost of losing them is low, and the cost of backing them up is high. If you have a bunch of open source repositories on your site because you like to help out SourceForge or Fresh Meat, then don’t bother to back them up, because you can just download them again. If you have a bunch of open source repositories on your site because your business involves them, as one of our customers does, then you should be backing them up because the time it takes to download them is more expensive than the cost of backing them up.
Where to store the data?
Dedicated customers are in kind of a tricky position here. The answer to this question is an almost unequivocal “offsite”, but if you have enough data, even a 100 Mbps pipe can’t finish the first backup before the second begins.
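For a rough feel of when that happens, here is a back-of-the-envelope calculation in Python. The 1 TB data size and the 80% link efficiency are assumptions I picked for illustration, not measurements.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Hours needed to move data_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the line rate you actually get.
    """
    bits = data_gb * 8 * 1000**3                      # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1_000_000 * efficiency)
    return seconds / 3600

# 1 TB (1000 GB) over a 100 Mbps pipe at 80% efficiency
print(f"{transfer_hours(1000, 100):.1f} hours")       # prints "27.8 hours"
```

Under those assumptions a full copy of 1 TB takes longer than a day, so a nightly full offsite backup would never catch up.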
Fortunately, you don’t need to pull your backups offsite every day. You can automatically perform remote backups weekly or even monthly, depending on your data, its temporal value, and the rate at which it updates or changes. For example, a group of servers that powers a small online store can back up the transaction database nightly, but the contents of the store itself probably don’t update nearly as often, and the actual scripts that run the store change even less often. All need to be backed up, since without any one of them the site is offline, but they have different priorities.
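One way to picture those different priorities is a small table of intervals. This is only a sketch: the dataset names and the daily, weekly, and monthly intervals are my own guesses for the online store example, not a recommendation.

```python
from datetime import datetime, timedelta

# Hypothetical intervals for the online-store example.
SCHEDULE = {
    "transaction_db": timedelta(days=1),
    "store_contents": timedelta(days=7),
    "scripts": timedelta(days=30),
}

def due_backups(last_run, now):
    """Return the datasets whose last backup is at least one interval old.

    last_run maps dataset name to the datetime of its last backup;
    a dataset with no entry has never been backed up and is always due.
    """
    due = []
    for name, interval in SCHEDULE.items():
        last = last_run.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return due
```

A nightly cron job could call due_backups with the recorded times and back up only what it returns, so the big, slow-changing pieces don’t get copied every night.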
The whole reason for pulling backups offsite is so that if there is a major disaster, recovery is still possible. ServePath itself is fairly secure (impressively so, I’ll blog about it sometime), so a disaster big enough to destroy your onsite backups and your server would probably level this building and most of the rest of San Francisco. Even so, if it happens, you want your business to be able to pick itself up. So take the backups offsite at least once in a while.
Where to keep the data while it’s here? I want to take this time to clear up a misconception about RAID. RAID is not backups. They do not protect against the same kinds of errors. Backups provide crude protection against mistakenly deleted files, hackers, corrupt file systems, and hardware failures. RAID provides graceful protection against hardware failure, sometimes. Sometimes RAID simply improves performance. You can’t assume that RAID 1 (mirrored drives) will save your data, because if there is some other hardware failure, like the power supply, and your file systems are corrupted, then both drives are goners. What RAID does let you do is notice that a drive has failed, and then schedule downtime to replace that drive. It turns uncontrolled downtime into controlled downtime. Usually.
ServePath really offers three choices for onsite backups: a second drive in the same server, a second server which mirrors the first, or space on a NAS drive. I don’t know exactly what other dedicated server hosts offer, but I expect it’s much the same.
A second drive is a pretty good idea if you’re most often recovering from deleted files. Transfers are quick, both for backup and restore. However, if the server is offline and you need access to the data, you’re out of luck. And since really anything that affects the first drive can affect the second drive, too, it is not entirely safe. Very recently I had to help a customer restore his OS after a script ran “rm -r /usr”. Fortunately, he had other backups.
Something like a NAS prevents most of the troubles that a second drive can run into. Since data can be copied via rsync or ftp, it allows archives to be made without exposing them to the file system for errant scripts to destroy. Since it is on a LAN link with the server, backup and restore are relatively quick. This can be thought of as “semi offsite,” since even if your server explodes, your data will probably still be okay (depending of course on the energy in the explosion and where your computer is in the data center).
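As a sketch of the rsync route, something like the following could push a directory of finished archives to the NAS. The host name and both paths are hypothetical, and I’m assuming the NAS accepts rsync over a remote shell; yours may be set up differently.

```python
import subprocess

def rsync_command(src_dir, nas_host, nas_dir, dry_run=False):
    """Build an rsync command that copies the contents of src_dir to the NAS."""
    cmd = ["rsync", "-a"]            # -a: archive mode (recursive, keeps times and permissions)
    if dry_run:
        cmd.append("--dry-run")      # report what would be copied without copying
    # The trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    cmd += [src_dir.rstrip("/") + "/", f"{nas_host}:{nas_dir}"]
    return cmd

def push_to_nas(src_dir, nas_host, nas_dir):
    """Run the copy; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(rsync_command(src_dir, nas_host, nas_dir), check=True)
```

For example, push_to_nas("/var/backups", "nas.example.internal", "/backups/web1") would copy the archives across. I’ve left out --delete on purpose: mirroring deletions to the NAS would defeat the point of keeping copies that an errant script can’t touch.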
A second server is the most expensive but also the most reliable solution. With proper load balancing configured, any single server could die and your customers wouldn’t even notice, because the other servers would pick up the load. As long as all servers in the pool have all the data, no single failure should affect the rest.
Okay this is getting long. I’ll return to it next week.