Data Security and the Importance of Backups


“It is a truth universally acknowledged…”

Once, I licked the bottom of my foot.

I won’t claim that my youth wasn’t full of foibles, but I like to think I have avoided many silly mistakes and that, on the whole, I have fairly good judgment. I bet you like to think that, too, since I fix your servers when they’re down.

Despite whatever more rational faculties I may have, there are times when judgment is overcome by other things, and it’s during those times that I can be found buying soda cans from vending machines to drop them down stairwells. (I hasten to assure you that this never happens with servers or networks, unless you count the time in college when I decided to see if I could get better network performance by setting my NIC to 100 Mbps full duplex instead of 10 Mbps. It crashed the hub, and my dorm was without net access for a day. Who knew?)

Anyway, the point is that once I licked the bottom of my foot. But I can understand if you read it wrong. If I were reading some random blog and came across that sentence, I’d think it said “Once, I LICKED the bottom of my foot.” In fact, what it really says is “ONCE, I licked the bottom of my foot.” Once was enough. For a lot of the silly things I’ve ever done, once was enough.

Why am I writing about this on a blog about a data center? Well, once, I lost all my data. I was repartitioning a drive, or resizing a partition, or doing something in PartitionMagic (a useful program that does a lot of low-ish level data reorganization on your hard drives), when the power went out. My disks were left in an inconsistent state and I lost everything. This was years and years ago now, but it is also the last time I have ever lost data. I’ve deleted files, had hard drives crash, and installed OSs over old ones, but I’ve never lost another bit of data. Once was enough.

In a future post I’ll get some backup basics together; it’s actually more complicated than you might think. Like good security, it requires a number of trade-offs. But for now, I just want to request that the reader think very carefully about whether she has any data to lose and, if so, what kind of precautions should be taken to prevent such a loss.

“Sing, O Muse, and tell of the man skilled in all ways of contending…”


Before I worked here I wasn’t exactly sure what this kind of job consisted of. Of course this made for a very awkward résumé, since I had to get across that I knew a lot about computers without implying I was best suited for a completely different computer job, such as database management. But even for a while after I joined, there were moments when I was surprised, because what I had been expecting wasn’t anything like what actually happens. So I imagine a lot of you may be curious about what kinds of issues come up in a data center.

All the issues that I have to deal with can be split into three categories: hardware, software, and networking. Since we sell only unmanaged servers and colocation, ideally I’d only be responsible for hardware and networking (a fourth category, environment, is important but not in any way under my purview). However, operating systems being what they are, they break all the time, and of course I have to fix them.

Hardware

The hardware issues we see most often are bad RAM and bad hard drives. Very rarely we have bad RAID cards or NICs, and once I had to replace a CPU. These are always fairly easy fixes, once the problems have been identified. The only real issue is when a customer loses data due to a failed hard drive. RAID can sometimes (but not always) prevent that, which is one of a million reasons why backups are so necessary.
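To make “sometimes (but not always)” a bit more concrete, here is a toy sketch of the XOR parity idea behind RAID 5, which is only one of several RAID levels; the post doesn’t say which level any given server uses. A stripe’s parity block is the XOR of its data blocks, so any single missing block can be rebuilt from the others. Lose a second disk before the rebuild finishes, or delete a file (the deletion is faithfully reflected in parity too), and RAID can’t help you. The block contents are made up, and real RAID 5 rotates parity across disks, which this sketch ignores.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of every data block."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover ONE lost data block from the surviving blocks plus parity.

    With two blocks missing, a single XOR equation no longer has a
    unique answer, so this trick stops working.
    """
    return xor_blocks(surviving_blocks + [parity])


# A stripe of three data "disks" plus one parity block (made-up contents).
stripe = [b"ABCD", b"EFGH", b"IJKL"]
parity = make_parity(stripe)

# Simulate losing the second disk, then rebuild it from the rest.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
print(recovered)  # b'EFGH'
```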

Under hardware I’m also going to throw all the scheduled upgrades we do. Fooling with hardware is the easiest part of this job, except for inventory management, which gets très annoying, but even that isn’t so bad. This is the same monkey stuff you did for your family when you were 14.

Network

I don’t think I’ve seen the network actually break, but customers fall off it all the time. 95% of the time, this is because of Red Hat Linux. Oh man, do I hate Red Hat. Don’t take this personally if you like, use, work for, or are Red Hat (well, take it personally if you are Red Hat), but the network configuration in this OS is such a mess. So if you use Red Hat, and you reboot, and suddenly you can’t get on the network, it’s because the network scripts, which used to work just fine, thank you, decided they didn’t like where the default gateway was defined and now expect it in another of the 735 different network configuration files, which lives in a different directory from the one used before. Haha!

This is, of course, only my opinion.
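If you suspect this has happened to you, a quick first check is to list every file that sets GATEWAY. The sketch below assumes the traditional Red Hat layout (/etc/sysconfig/network plus per-interface ifcfg-* files under /etc/sysconfig/network-scripts) and simple KEY=value syntax. It only reports what it finds; it makes no claim about which definition a particular release actually honors.

```python
import shlex
from pathlib import Path

# Assumed Red Hat-style locations; adjust for your release.
CONFIG_FILES = ["/etc/sysconfig/network"]
SCRIPT_DIR = "/etc/sysconfig/network-scripts"


def parse_sysconfig(text: str) -> dict[str, str]:
    """Parse shell-style KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        # shlex removes the quotes these files often wrap values in.
        parts = shlex.split(raw, comments=True)
        values[key.strip()] = parts[0] if parts else ""
    return values


def find_gateways(files: dict[str, str]) -> dict[str, str]:
    """Map each file name to the GATEWAY it defines, if it defines one."""
    found = {}
    for name, text in files.items():
        gateway = parse_sysconfig(text).get("GATEWAY")
        if gateway:
            found[name] = gateway
    return found


def read_local_config() -> dict[str, str]:
    """Read whichever of the expected config files exist on this machine."""
    paths = [Path(p) for p in CONFIG_FILES]
    paths += sorted(Path(SCRIPT_DIR).glob("ifcfg-*"))
    return {str(p): p.read_text() for p in paths if p.is_file()}


if __name__ == "__main__":
    gateways = find_gateways(read_local_config())
    if len(set(gateways.values())) > 1:
        print("Conflicting GATEWAY definitions:")
    for name, gateway in gateways.items():
        print(f"{name}: GATEWAY={gateway}")
```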

Usually network upkeep involves protecting our network from customers. If customers get cracked, they tend to become members of zombie networks, and the UDP floods they send out can slow things down for other customers. We tend to head those off by limiting the compromised customer’s connection.
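The post doesn’t say how a compromised customer’s connection gets limited; in a data center that happens on the network gear, not in a Python script. As a rough illustration of the idea only, here is a token-bucket limiter, one common rate-limiting technique: each packet spends a token, tokens refill at a fixed rate up to a burst cap, and anything over budget is dropped. The rate and burst numbers are arbitrary placeholders.

```python
class TokenBucket:
    """Pass `rate` packets per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so ordinary traffic isn't penalized
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` (seconds) may pass."""
        elapsed = max(0.0, now - self.last)
        # Refill for the time that has passed, but never above the cap.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Placeholder limit: 100 packets/s sustained, bursts of up to 200.
bucket = TokenBucket(rate=100, capacity=200)

# A flood of 1,000 packets arriving 1 ms apart (one second in total).
passed = sum(bucket.allow(now=i / 1000) for i in range(1000))
print(passed)  # roughly the 200-packet burst plus ~100 refilled tokens
```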

Less often, but not rarely, customers become victims of DoS or DDoS attacks. In fact, there’s one going on right now. If you happen to know 208.185.250.11, tell him I said to knock it off. These are nearly always handled automagically by our network infrastructure, but it’s good to keep an eye on it.
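One simple way to keep an eye on it is to tally packets per address over a time window and flag the outliers. This sketch assumes you already have (timestamp, source, destination) records from somewhere, such as a flow collector; the record type, threshold, and addresses (taken from the RFC 5737 documentation ranges) are all made up. It also shows why a distributed attack is easier to spot from the victim’s side: each attacking source looks modest on its own.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    ts: float  # seconds
    src: str
    dst: str


def heavy_hitters(packets: Iterable[Packet], start: float, end: float,
                  threshold: int, field: str = "dst") -> dict[str, int]:
    """Count packets per `field` ("src" or "dst") with start <= ts < end
    and return only the addresses whose count exceeds `threshold`."""
    counts = Counter(getattr(p, field) for p in packets if start <= p.ts < end)
    return {addr: n for addr, n in counts.most_common() if n > threshold}


# Made-up sample: 50 sources each send 10 packets to one victim in 5 s,
# plus a little unrelated traffic.
sample = [Packet(i / 100, f"203.0.113.{i % 50}", "198.51.100.7") for i in range(500)]
sample += [Packet(1.5, "192.0.2.10", "198.51.100.20"),
           Packet(3.2, "192.0.2.11", "198.51.100.21")]

print(heavy_hitters(sample, 0, 5, threshold=100))               # {'198.51.100.7': 500}
print(heavy_hitters(sample, 0, 5, threshold=100, field="src"))  # {}
```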

Software

Oh boy. Broken software. Where to start?

Well, there are the usual suspects. By default, Windows will only allow two active Terminal Services sessions at a time (Windows Server 2003 also lets you connect to the console remotely, which can count as a third session). If you run out of these, and Windows doesn’t reset them for some reason, we have to visit the box to reset them manually.
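The usual way to run out is that a session you merely disconnect from (closing the window instead of logging off) keeps holding its slot. The toy model below illustrates that counting behavior only; it is plain Python, not an interface to Windows, and the class and method names are hypothetical.

```python
from typing import Optional


class SessionSlots:
    """Toy model of a server that allows `limit` remote sessions.

    Disconnected sessions keep their slot until something resets them,
    so the server can refuse new logins while nobody is actually using it.
    """

    def __init__(self, limit: int = 2):
        self.limit = limit
        self.sessions: dict[int, str] = {}  # session id -> "active" or "disconnected"
        self._next_id = 1

    def connect(self) -> Optional[int]:
        """Open a session and return its id, or None if every slot is taken."""
        if len(self.sessions) >= self.limit:
            return None
        sid = self._next_id
        self._next_id += 1
        self.sessions[sid] = "active"
        return sid

    def disconnect(self, sid: int) -> None:
        """Drop the client connection; the session itself lingers."""
        self.sessions[sid] = "disconnected"

    def reset(self, sid: int) -> None:
        """End the session for good and free its slot."""
        self.sessions.pop(sid, None)


server = SessionSlots(limit=2)
a, b = server.connect(), server.connect()
server.disconnect(a)  # both admins close the window instead of logging off
server.disconnect(b)
print(server.connect())  # None: no free slot, though nobody is logged in
server.reset(a)
print(server.connect())  # 3
```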

Control panels have been known to become unstable. This seems to happen when a user is comfortable enough with the command line to use it but also has a control panel installed. The CP can become confused if a file it manages is edited by hand. This is why Ensim (for example) changes the motd to warn users that, if they edit files, they have voided their warranty.
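One way a control panel could notice that a file was edited behind its back is to remember a checksum of every file it writes and compare before touching the file again. This is a hypothetical sketch of that idea only; the post doesn’t describe how Ensim or any other panel actually tracks its files, and the class and file names are made up.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ManagedFiles:
    """Hypothetical control-panel bookkeeping for the config files it owns."""

    def __init__(self):
        self.known: dict[str, str] = {}  # path -> checksum of what the panel last wrote

    def record_write(self, path: str, contents: str) -> None:
        """Call right after the panel writes a file."""
        self.known[path] = digest(contents)

    def edited_by_hand(self, path: str, current_contents: str) -> bool:
        """True if a tracked file no longer matches what the panel last wrote."""
        expected = self.known.get(path)
        return expected is not None and expected != digest(current_contents)


panel = ManagedFiles()
panel.record_write("httpd.conf", "Listen 80\n")
print(panel.edited_by_hand("httpd.conf", "Listen 80\n"))    # False
print(panel.edited_by_hand("httpd.conf", "Listen 8080\n"))  # True
```

A panel without a check like this has no way to know its own picture of the file is stale, so it may simply overwrite the hand edit the next time it regenerates the file.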

Remote OS upgrades, kernel upgrades in particular, are also tricky: if the new kernel doesn’t boot, nobody can log in remotely to fix it.

Then there are the day-to-day surprises, like that time up2date got confused and uninstalled OpenSSH.

So a myriad of different software issues crop up, but the best way to classify them is in two categories: those that break the OS and access to it, and those that break the services the server provides. We probably have a 90/10 split between them. Very rarely will we get involved in customer setups; our customers generally prefer to have their own IT staff take care of those.

In a way it’s almost disappointing that we don’t get to do the real sysadmin work (that is, configuring client servers with actual solutions to actual problems, instead of just making sure they’re online). But that would be impractical for the number of clients we have, and they’d basically be paying for our on-the-job training as we learned about their (unique, sometimes bizarre) setups. So it’s probably just as well that we don’t.
