Don’t Trust Your Computer... Or Anyone Else’s

      by Wyatt Walter

Last week was a bad week for tech people all over the world. Outages were reported at several major datacenter providers, including Rackspace, Equinix, and Google. As any good system administrator will tell you, what goes up will come down. Having a contingency plan is critical to any business that relies on its web services (and who doesn’t anymore?). Anyone who spends time working on computer systems knows that you can’t trust them any farther than you can throw them. You have to monitor them closely, and somehow they still end up surprising you.

Fortunately for me, I wasn’t directly affected by any of these outages. Small as it is, my blog stayed online because it isn’t hosted in any of those datacenters. Unfortunately, I did have a bit of an issue last week, and it went on for some time without being caught. I’m not sure that my issue was related to any of these major outages, but one of the plugins I’m using makes a call to an external system that had some problems. I’m not going to name the system, since this could have happened with any external service, but the lesson that I learned is worth mentioning.

What Happened
I use a plugin that pulls data from another system. That system was down, which caused page load times to skyrocket. During the day I checked my AdSense account and noticed that pageviews were quite low, but I didn’t really have time to look into it. Later that night pageviews were still really low, so I decided to investigate. The WordPress backend of my site was pretty snappy, so I went to view the frontend. The page took forever to load. My site is on a shared web server, but I just so happen to be the system administrator at that hosting company, so I quickly logged into the shell of the web server. Load was normal, with no I/O wait, low CPU usage, and normal memory usage.

After checking the load on my own systems – something that’s already monitored, but I checked anyway – I popped open Firebug and reloaded the page. Bingo. I found the culprit. It was the external service that the plugin calls, and it was slow to respond. I disabled the plugin and the site worked again immediately.
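The same kind of check can be roughly approximated without a browser by timing each external URL a page depends on and sorting the results. What follows is only a sketch of that idea, not what was actually run here: the function names `time_request` and `slowest_first` are made up for illustration, and the URLs in the usage note are placeholders.

```python
import time
import urllib.request


def time_request(url, timeout=10.0, opener=urllib.request.urlopen,
                 clock=time.monotonic):
    """Fetch url once and return (elapsed_seconds, error_or_None).

    opener and clock are parameters only so the function can be
    exercised without touching the network.
    """
    start = clock()
    error = None
    try:
        with opener(url, timeout=timeout) as response:
            response.read()
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        error = exc
    return clock() - start, error


def slowest_first(urls, **kwargs):
    """Time every URL and return (url, seconds, error) tuples, slowest first."""
    timings = [(url, *time_request(url, **kwargs)) for url in urls]
    return sorted(timings, key=lambda item: item[1], reverse=True)
```

Calling `slowest_first(["https://example.com/widget.js", "https://example.org/feed"])` would put the slow or failing dependency at the top of the list, which is essentially what the Firebug network view made obvious.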

What I Learned
While my site is extremely small and I wouldn’t invest much time in this at this point, the lesson holds: as you add external services to a site, it’s important to monitor those services as well as your own. Checking that services are alive and responsive on your own web server isn’t enough. The services on others’ web servers are just as important, because they can have a detrimental effect on yours. Of course, there’s nothing you can do to control those services outside of SLAs, but in my case, if I had known about the outage I could have simply disabled the plugin that depends on it to minimize the effect on my site. Okay, I know I wasn’t losing millions of dollars or pageviews. I barely lost tens of pageviews over the whole thing, but as software and web services start relying upon ‘cloud’ services, keeping a close eye on all of those services is key.
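As a minimal sketch of what that monitoring could look like, the snippet below checks each external dependency and reports which plugins would be candidates for disabling. Everything named here is an assumption for illustration: `is_healthy`, `plugins_to_disable`, the two-second threshold, and the idea of keeping a plugin-to-URL mapping are not taken from any particular monitoring tool or from WordPress itself.

```python
import time
import urllib.request


def is_healthy(url, max_seconds=2.0, opener=urllib.request.urlopen,
               clock=time.monotonic):
    """Return True if url responds within max_seconds, False otherwise.

    The urlopen timeout applies per socket operation rather than to the
    whole request, so the total elapsed time is checked as well.
    """
    start = clock()
    try:
        with opener(url, timeout=max_seconds) as response:
            response.read(1)  # one byte is enough to prove it answers
    except OSError:
        return False
    return clock() - start <= max_seconds


def plugins_to_disable(dependencies, check=is_healthy):
    """dependencies maps a plugin name to the external URL it calls.

    Returns the names of plugins whose external service failed the check.
    """
    return sorted(name for name, url in dependencies.items()
                  if not check(url))
```

Run from cron every few minutes, something like `plugins_to_disable({"stats-widget": "https://example.com/api"})` could feed an email alert, so a slow third-party service gets noticed in minutes instead of at the end of the day.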

Filed under Tech Trends : Comments (0) : Jul 7th, 2009