Man tuning(7)


So I’ve been thinking lately about the title “System Administrator.” This is our official job title (it says Systems Administrator on our business cards–I guess the extra s is a nod to the fact that we have some 2500 systems in the data center). This is a slightly misleading title, however. I’m not really a system administrator as much as I am a system medic. I only see servers when they’re sick, I do whatever it takes to fix them as fast as possible, and I (hopefully) never see them again.

From what I’ve seen (working for ServePath, but actually far more often on IRC), people tend to think this is what a system administrator does.

It isn’t.

Just because a server is online doesn’t mean it is properly administered. This is akin to saying that if you’re alive, you must be healthy.

There are two very broad areas a server needs to be tuned for after its services have been set up: security and performance.

If you’re a sysadmin for, say, a FreeBSD server, some questions I might have regarding security are:

  • Do you know what version of SSL/SSH you have installed? Do you know whether you need to upgrade? Do you know how to upgrade these without breaking anything?
  • Do you know what ipfw is, and how to use it?
  • Do you know what pf is, and how to use it?
  • Do you know what termlog is, and do you use it? Why?
  • What logs do you keep, and where do you keep them?
  • Do you know what a jail is, and should you be using them?

For performance:

  • Which processes take up most of your resources, and which resources (disk I/O, network, CPU, etc.)?
  • At what point is a process taking too many resources?
  • Do you know what inodes are? Do you have enough? How would you get more? (I had a client run out of inodes on two different file systems.)
  • Do you know why /usr, /, /tmp, and /var are all on separate slices by default? When might you want to change this?
  • What would you do if directories are taking a long time to list their contents?
  • What network services do you run, and what kind of network performance do you get? How could you adjust your network buffers to get better performance? What about your firewall rules?
  • Do you know what RFC1323 is, and when you’d need what it specifies?
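Two of the questions above can be made concrete with a few lines of code. The following is a minimal sketch, not a tuning tool: `inode_usage` reads the inode counters the operating system reports for a mounted filesystem (the same numbers `df -i` shows), and `window_shift_needed` estimates whether a link's bandwidth-delay product is bigger than the 65,535-byte window that TCP can advertise without the window scale option from RFC 1323. The function names and the example link speed and round-trip time are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class InodeUsage:
    total: int                      # inodes the filesystem reports in total
    free: int                       # inodes still unallocated
    percent_used: Optional[float]   # None if the filesystem reports no inode count


def inode_usage(path: str) -> InodeUsage:
    """Return inode counters for the filesystem that contains `path` (Unix only)."""
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    # Some filesystems allocate inodes dynamically and report a total of 0.
    percent = None if total == 0 else 100.0 * (total - free) / total
    return InodeUsage(total, free, percent)


MAX_UNSCALED_WINDOW = 65535   # largest value of the 16-bit TCP window field
MAX_WINDOW_SHIFT = 14         # largest shift count RFC 1323 allows


def window_shift_needed(bandwidth_bps: float, rtt_seconds: float) -> int:
    """Smallest window scale shift that covers the bandwidth-delay product.

    A result of 0 means the plain 16-bit window is already large enough.
    """
    bdp_bytes = bandwidth_bps / 8 * rtt_seconds
    shift = 0
    while shift < MAX_WINDOW_SHIFT and (MAX_UNSCALED_WINDOW << shift) < bdp_bytes:
        shift += 1
    return shift


if __name__ == "__main__":
    print(inode_usage("/"))
    # Hypothetical link: 100 Mbit/s with an 80 ms round trip.
    print(window_shift_needed(100e6, 0.080))
```

On the hypothetical 100 Mbit/s, 80 ms link the bandwidth-delay product is about 1 MB, far more than 65,535 bytes, which is exactly the situation window scaling was specified for.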

Ultimately a server needs rather a lot of attention to perform well and stay secure. If you just turn a server on and plop it online, you’re probably not getting out of it all that you could.

And you’re also probably hosting movies for kids on IRC, even if you don’t know it.

ServePath Withstands Massive DoS Attack


This morning at approximately 10:00 AM EDT Hosting Matters was hit by a massive DoS attack apparently directed at several conservative blogs hosted with them. Hosting Matters announced in their forum that a specific web site was the target of the attack but did not identify the target by name. The web site in question was isolated from the rest of the Hosting Matters network but the attack still managed to affect several conservative blogs such as Instapundit and LittleGreenFootballs. According to blogger Michelle Malkin, who has been the target of previous DoS attacks but was not affected by this one, the targeted blog was Aaron’s CC. Most of the blogs affected seem to be back up and running now, unlike the original target.

According to reports, the attack originated in Saudi Arabia, although that doesn’t necessarily mean that the perpetrators were Saudi, just that the computers they hacked were situated in Saudi Arabia.

At ServePath we were able to deflect a similar attack thanks to our Riverhead Networks equipment, which did an excellent job protecting our network and our customers. At about 8:30 AM PDT we started getting alerts which indicated an inbound flood of over 500,000 packets per second. Fortunately, the Riverhead system took over the incoming flow to the attacked IP and started mitigation pretty much immediately. The attack is still going on as I type this, but seems to finally be tapering off.
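To put 500,000 packets per second in perspective, here is a rough back-of-the-envelope sketch that converts a packet rate into bandwidth on an Ethernet link. The packet sizes are assumptions, since the alerts described above give only the packet rate; the two sizes used are the smallest and largest standard Ethernet frames, so the real figure would fall somewhere in between.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 8 bytes of preamble/start delimiter plus a 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def wire_rate_bps(packets_per_second: int, frame_bytes: int) -> int:
    """Bits per second consumed on the wire at a given packet rate and frame size."""
    return packets_per_second * (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8


if __name__ == "__main__":
    # 64 and 1518 bytes are the minimum and maximum standard Ethernet frame sizes.
    for frame_bytes in (64, 1518):
        mbps = wire_rate_bps(500_000, frame_bytes) / 1e6
        print(f"{frame_bytes:>5}-byte frames: {mbps:,.0f} Mbit/s")
```

Even with the smallest possible frames, that packet rate works out to roughly a third of a Gigabit link; with full-size frames it would be several Gigabits per second.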

“Sing, O Muse, and tell of the man skilled in all ways of contending…”


Before I worked here I wasn’t exactly sure what this kind of job consisted of. Of course this made for a very awkward résumé, since I had to get across that I knew a lot about computers without implying I was best suited for a completely different computer job, such as database management. But even for a while after I joined, there were moments when I was surprised, because what I had been expecting wasn’t anything like what actually happens. So I imagine a lot of you may be curious about what kinds of issues come up in a data center.

All the issues that I have to deal with can be split into three categories: hardware, software, and networking. Since we sell only unmanaged servers and colocation, ideally I’d only be responsible for hardware and networking (a fourth category, environment, is important but not in any way under my purview). However, operating systems being what they are, those things break all the time and of course I have to fix them.

Hardware

The hardware issues we see most often are bad RAM and bad hard drives. Very rarely we have bad RAID cards or NICs, and once I had to replace a CPU. These are always fairly easy fixes, once the problems have been identified. The only real issue is when a customer loses data due to a failed hard drive. RAID can sometimes (but not always) prevent that, which is one of a million reasons why backups are so necessary.

Under hardware I’m also going to throw all the scheduled upgrades we do. Fooling with hardware is the easiest part of this job, except for inventory management, which gets tres annoying, but even that isn’t so bad. This is the same monkey stuff you did for your family when you were 14.

Network

I don’t think I’ve seen the network actually break, but customers fall off it all the time. 95% of the time, this is because of Red Hat Linux. Oh man, do I hate Red Hat. Don’t take this personal, if you like, use, work for, or are Red Hat (well, take it personal if you are Red Hat), but the network configuration in this OS is such a mess. So if you use Red Hat, and you reboot, and suddenly you can’t get on the network, it’s because the network scripts, which used to work just fine, thank you, decided they didn’t like where the default gateway was defined, and now expect it to be defined in another of the 735 different network configuration files, which lives in another directory from the file previously used. Haha!

This is, of course, only my opinion.
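When this happens, the first job is finding out which file the default gateway is actually defined in. The sketch below is one way to look, under some assumptions: the post above doesn't name the files involved, so the two locations checked here (`network` and `network-scripts/ifcfg-*` under `/etc/sysconfig`) are simply the usual places a `GATEWAY=` line lives on Red Hat systems, and `find_gateway_definitions` is a made-up helper name.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches shell-style assignments such as GATEWAY=10.0.0.1 (but not GATEWAYDEV=eth0).
GATEWAY_LINE = re.compile(r"^\s*GATEWAY=(\S+)", re.MULTILINE)


def find_gateway_definitions(sysconfig_dir: str = "/etc/sysconfig") -> List[Tuple[str, str]]:
    """Return (file, gateway) pairs for every GATEWAY= line found.

    Assumed locations: <sysconfig_dir>/network and
    <sysconfig_dir>/network-scripts/ifcfg-*.
    """
    root = Path(sysconfig_dir)
    candidates = [root / "network"]
    scripts = root / "network-scripts"
    if scripts.is_dir():
        candidates.extend(sorted(scripts.glob("ifcfg-*")))

    found = []
    for path in candidates:
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for match in GATEWAY_LINE.finditer(text):
            # Values may be quoted in these files; strip the quotes.
            found.append((str(path), match.group(1).strip("\"'")))
    return found


if __name__ == "__main__":
    for filename, gateway in find_gateway_definitions():
        print(f"{filename}: GATEWAY={gateway}")
```

If it prints nothing, or prints two different answers, that is probably why the box fell off the network after the reboot.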

Usually network upkeep involves protecting our network from customers. If customers get cracked, they tend to become members of zombie networks, and the UDP floods they send out can slow things down for other customers. We tend to head those off by limiting the compromised customer’s connection.
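For anyone curious what "limiting a connection" can look like in principle, here is a minimal token-bucket sketch. It is an illustration of one common rate-limiting technique, not a description of how our equipment does it; the class name, the rate, and the burst size are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Allow traffic at a sustained rate, with a bounded burst."""
    rate_bytes_per_s: float              # sustained rate the customer is limited to
    burst_bytes: float                   # most bytes that can pass back-to-back
    tokens: float = field(init=False)    # bytes currently available to spend
    last_time: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full bucket so an idle connection can send one burst.
        self.tokens = self.burst_bytes

    def allow(self, packet_bytes: int, now: float) -> bool:
        """Return True if a packet of this size may pass at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last_time)
        # Refill at the sustained rate, never beyond the burst size.
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # over the limit: drop (or queue) the packet


if __name__ == "__main__":
    bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
    for t in (0.0, 0.5, 1.5):
        print(t, bucket.allow(1500, now=t))
```

Passing the timestamp in explicitly, instead of reading the clock inside `allow`, keeps the behavior easy to reason about and to test.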

Less often, but not rarely, customers become victims of DoS or DDoS attacks. In fact, there’s one going on right now. If you happen to know 208.185.250.11, tell him I said to knock it off. These are nearly always handled automagically by our network infrastructure, but it’s good to keep an eye on them.

Software

Oh boy. Broken software. Where to start?

Well, there are the usual suspects. By default, Windows will only allow two active Terminal Services sessions at a time (Windows 2003 allows you to connect to the console remotely, which can count as a third session). If you run out of these, and Windows doesn’t reset them for some reason, we have to visit the box to reset them manually.

Control panels have been known to become unstable. This seems to happen when a user tends to be familiar enough with the command line to use that, but also has a control panel installed. The CP can become confused if a file is edited manually. This is why Ensim (for example) changes the motd to inform users that, if they edit files, they have voided their warranty.

Upgrading an OS remotely is also tricky, kernel upgrades in particular.

Then there are the day-to-day surprises, like that time up2date got confused and uninstalled OpenSSH.

So there are a myriad of different software issues that actually crop up, but the best way to classify them would be in two categories: those that break the OS and access to it, and those that break the services the server provides. We probably have a 90/10 split between them. Very rarely will we get involved in customer setups; our customers generally prefer to have their own IT staff take care of it.

In a way it’s almost disappointing that we don’t get to do the real Sysadmin work (that is, configure client servers with actual solutions to actual problems, instead of just making sure they’re online). But that would be impractical for the number of clients we have, and they’d basically be paying for our on-the-job training as we learned about their (unique, sometimes bizarre) setups. So probably it’s just as good we don’t.

New Network Operations Center (NOC)


If you have been in our San Francisco data center recently, you may have seen 21 large screens being installed on the wall of our support area. Our new Network Operations Center is now functional, and new monitoring systems will come online over the coming weeks to fill up the screens.

The new ColoServe NOC builds on our existing internal and external monitoring by providing a state-of-the-art way to display real-time information for all ColoServe systems. Monitoring and alerting for all our network connections, power circuits, UPS systems, air conditioning units, and more will be displayed on the wall, and new screens have been added for special customer server and application monitoring. The ColoServe NOC is staffed 24/7 by a minimum of two systems administrators to provide our colocation customers with the highest level of service and technical expertise possible.

Network Upgrade Completed


Last night ServePath completed the Phase II Upgrade of our edge/border routers from Cisco 12012 to Cisco 6509 routers. Phase I was completed on July 24th, 2005. ServePath’s Network team is constantly working to improve our infrastructure, and this latest upgrade is a major milestone.

The new routers include Cisco Supervisor 720 modules that provide enhanced hardware-based IPv4 and IPv6 support along with higher performance and scalability. The SUP-720 BXL is a third generation supervisor engine that addresses our growth and growing data-plane requirements. With its integrated switch fabric, the SUP-720 allows for more line cards, and its 40 Gbps per slot supports high-density Gigabit Ethernet and 10 Gigabit Ethernet ports. ServePath will now be able to take 10 Gigabit handoffs, satisfying future bandwidth upgrades from our upstream providers as our data center expands and our dedicated server customers demand increasing levels of bandwidth scalability.

ServePath Adds Another Fiber Provider: IP Networks


ServePath has upgraded our private and public peering connections with the installation of new connections and equipment from IP Networks. This is the sixth fiber provider to install equipment into our San Francisco data center.

IP Networks operates a next generation optical access network that provides high bandwidth Ethernet services to hundreds of the most important buildings in the San Francisco Bay Area. Their architecture provides superior transport layer redundancy using MPLS and eliminates the bandwidth bottleneck in the last mile of many traditional telco circuits.

The new connection gives ServePath yet another Gigabit transport connection, as well as greater peering capacity so that routes to the most popular sites and ISPs are dramatically shorter.


ServePath Announces New Network Peering at PAIX


ServePath is now peering with more than 37 other networks at Switch and Data’s PAIX (Peering And Internet eXchange) in Palo Alto. PAIX is the largest commercial, neutral Internet Exchange point in the world, where major Telecom carriers, Internet providers, and content networks exchange Internet traffic. ServePath is currently peering with networks such as Yahoo, Microsoft, Cogent/PSInet, Akamai, UltraDNS, CENIC/CalREN2 (University of California system and Stanford), and Japan Telecom.
