Dell Enterprise Support - Getting Better?

I used to buy Dell machines with the expectation that if there was a problem with them I was essentially on my own. For the price, that was OK, and I only bought them where that expectation was acceptable.

Very early Tuesday morning, one of my inbound mail servers (a Dell PowerEdge 840) stopped responding completely. Since its failure wasn’t a critical problem (there are backups for everything that machine runs), I didn’t get to it until about 6pm Tuesday. It turns out the power supply had failed (the PS fan wouldn’t turn on and the PS put out no power at all). This particular server had been on constantly since new, for just shy of 2 years. On Saturday it was off for the first time, for 12 hours, due to an extended power outage. I guess it got used to being on and failed in protest of being turned off.

Not in the mood for phone support hell, I tried Dell’s on-line support chat around 9pm. In 9 minutes I had a dispatch number, a case number and a power supply on the way. That’s a tiny bit slower than the same conversation would have taken with HP, but it was the simple “my power supply is dead, I’ve already tested it, it’s shot, send me a replacement”… “OK, it’s on the way” exchange I’m accustomed to with HP and not so much with Dell. So I’m happy; maybe I’ll start buying more Dell servers for semi-critical infrastructure.

Wednesday at 9:40pm I get an email from the support rep I had chatted with the previous day saying “the power supply is back-ordered, you should get it *next* Wednesday”. I reply that “a week for a power supply is a joke, HP would have one to me next day or the day after at the latest”. He responds, “Understood, I’ve noted your comments in my notes”. At this point I’m never again buying a Dell machine for any server system that isn’t highly redundant. The next day at work I spend $10k on some HP equipment that I had been considering buying from Dell.

Shortly after I buy the HP equipment, Purolator shows up with my replacement power supply from Dell. It’s a refurb (as an HP replacement would be after 30 or 90 days), but it works and the server seems to be OK.

So, now I’m not sure whether I’ll be buying Dell for non-redundant systems. Their support seems to be getting up to par with HP’s, but they have no idea when you’ll actually get replacement parts (unless they do, and complaining made it show up faster). I’ll likely keep buying Dell for grid computing clusters simply on a price basis; there, the failure of a single node matters far less than price and operating cost. Anything else, though… I have no idea about Dell… I know HP is safe.

I spy a brownout

112 volts AC, down from a normal 119 volts, around 4:50pm to 5:30pm on August 5, 2008. First brownout I’ve noticed in a good while.

August 5, 2008 brownout

New Projection Lenses at Midland Drive-In

As of Saturday night I’ve installed new lenses at the Midland Drive-In. The new ISCO lenses replace lenses that were probably 40 or more years old, including Ultra Panastar adjustable anamorphic lenses. Bottom line… there have apparently been some improvements in projection lens technology in the last few decades (go figure!). The new lenses give a brighter, sharper, more colourful picture. More importantly, I can now focus the anamorphic (CinemaScope, scope) picture left to right all the way across the screen (the old right projector’s scope lens couldn’t achieve anywhere near decent side-to-side focus).

Combined with the new digital FM stereo transmitter I installed before we opened this year and the DTS Digital Sound I installed in late August last year, this probably gives us one of the best combined picture and sound presentations of any drive-in in the country. We’re definitely not your father’s crappy-sound, lousy-picture drive-in. The drive-in is one of my hobbies, and my hobbies aren’t crap!

Scotiabank ScotiaOnline Pending Transactions Database Inconsistencies

Scotiabank’s ScotiaOnline online banking application has a major flaw. If you schedule a bill to be paid in the future (it’ll be listed under “Pending Transactions”) and then, after 6pm on the day that transaction is to occur, delete that pending transaction, it’ll still be processed as if you hadn’t deleted it at all. This makes it clear that the database you see isn’t the one they use to actually process transactions, and that the two can fall out of sync (many hours out of sync, in fact).

I found this out on Friday when I deleted a pending transaction: a Visa bill payment for a few thousand dollars that was to come from my line of credit. Since I had the cash, I deleted the pending transaction and paid the bill from my chequing account instead. The next day I found that the deleted pending transaction went through anyway… so Visa (well, TD Bank) got another few thousand dollars from me (I paid off my entire balance twice). Thanks, Scotiabank… I didn’t have a use for a few grand anyway. Yet another reason why I probably shouldn’t bank at Scotiabank.
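To make the out-of-sync theory concrete, here’s a toy model in Python. It’s a sketch of what I suspect is happening, not how ScotiaOnline actually works: I’m assuming a hypothetical evening cutoff batch that copies pending payments into a separate processing store, after which a delete only touches the copy you see. The class, its method names and the $3,000 amount are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBank:
    """Hypothetical model: the web UI and the payment processor keep separate stores."""
    ui_pending: dict = field(default_factory=dict)        # what the online banking page shows
    processing_queue: dict = field(default_factory=dict)  # what actually gets paid

    def schedule(self, txn_id: str, amount: int) -> None:
        self.ui_pending[txn_id] = amount

    def cutoff_snapshot(self) -> None:
        # Assumed evening batch: copy today's pending payments to the processor.
        self.processing_queue = dict(self.ui_pending)

    def delete(self, txn_id: str) -> None:
        # The suspected flaw: only the UI store is updated, the snapshot is not.
        self.ui_pending.pop(txn_id, None)

    def run_batch(self) -> list:
        # Pay everything in the processing store, regardless of what the UI shows.
        paid = sorted(self.processing_queue.items())
        self.processing_queue.clear()
        return paid


bank = ToyBank()
bank.schedule("visa", 3000)
bank.cutoff_snapshot()   # hypothetical 6pm cutoff
bank.delete("visa")      # deleted after the cutoff
print(bank.ui_pending)   # {} : looks cancelled
print(bank.run_batch())  # [('visa', 3000)] : paid anyway
```

A design that couldn’t double-pay would have `delete()` also remove the entry from `processing_queue`, or refuse deletes once the cutoff has passed.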

$3 fan = 7° Celsius

Back when I was in university, I had a customer who one day went from “no rush, whenever” to “can this application be live in three hours?” The application required a decent server and a couple of megabits of bandwidth. I didn’t have any machines in a data center at the time with both enough power and bandwidth to support it, so I decided to host it from my apartment for a few days (which turned into a couple of months at the customer’s request). Luckily, my Cogeco Cable Internet connection in Hamilton was actually of a decent speed for a decent price (unlike Rogers’ current, quite expensive offerings; I don’t know if Cogeco has kept their more reasonable pricing).

Anyway… scrounging together some parts found throughout my apartment, I built a machine with a couple of gigabytes of memory and a 2 GHz AMD processor, along with an IBM Deathstar, a Maxtor DiamondMax and a Western Digital Caviar (which I figured would guarantee I’d see at least one drive failure) and a few of my favourite cheap ($7) low-speed PCI network cards, the D-Link DFE-538TX (one of D-Link’s few good products). The only thing I didn’t have lying around was a rear case fan. Oh well, I’ll add one later. Fast forward over half a decade.

To this day I’m still using the machine to host customers’ network applications during development and initial use, and it serves as my primary personal mail hub. I picked up a case fan for it long ago and finally installed it tonight, along with a cheap gigabit Ethernet adapter (an Intel PRO/1000 GT).

Anyway, to the point: I like to graph hard disk temperatures in all of my machines. I’m paranoid. Here’s the before and after (the $3 80mm case fan dropped the temperature of all three drives by 7 degrees Celsius, well worth the price to reduce the chance of drive failure):


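For anyone wanting to graph drive temperatures too: this isn’t necessarily how my graphs are collected, just one possible approach, assuming smartmontools’ `smartctl` is installed and the drives report SMART attribute 194 (Temperature_Celsius) or, failing that, 190 (Airflow_Temperature_Cel). The Python below pulls the raw temperature out of `smartctl -A` output; the function names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_temperature(smartctl_output: str) -> Optional[int]:
    """Return the drive temperature in Celsius from `smartctl -A` text, or None.

    Assumes the usual attribute table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    temps = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # The raw value is the 10th column; it may be followed by "(Min/Max ...)".
        if len(fields) >= 10 and fields[0] in ("194", "190") and fields[9].isdigit():
            temps[fields[0]] = int(fields[9])
    # Prefer 194 Temperature_Celsius, fall back to 190 Airflow_Temperature_Cel.
    return temps.get("194", temps.get("190"))


def read_drive_temperature(device: str) -> Optional[int]:
    """Run smartctl (usually needs root) against a device such as /dev/sda and parse it."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_temperature(result.stdout)
```

Calling something like `read_drive_temperature("/dev/sda")` for each drive from a cron job every few minutes, and handing the numbers to whatever graphing tool you like, gets you before-and-after comparisons like the one above.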

Microsoft Antigen: Brain Dead Content Filter

Microsoft Antigen for SMTP found a message matching a filter. The message is currently Purged.
Message: “Re_ down_”
Filter name: “KEYWORD= profanity: piss”
Sent from: “Daryl C. W. O_Shea”
Folder: “SMTP Messages\Inbound”
Location: “psp/TRACYSV05”

Piss! Oh noes! The utter profanity of replying to someone who said “Am I blocked? Did I piss someone off?” is simply unacceptable.

I’m simply amazed at the number of rejections I get from users of Microsoft Antigen for SMTP (part of the Microsoft Forefront Security product family) based on single words that I learned during my years in Catholic schools. I think the Forefront Security product family has no forebrain. It’s no wonder most content-filter-based anti-spam products have such a bad rap.
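I don’t know how Antigen’s filter is implemented, but the notification reads like a bare single-word match. Here’s a minimal Python sketch of that kind of rule, assuming a hypothetical one-word list: any hit purges the message, with no scoring and no look at context, so merely quoting the original question is enough.

```python
import re
from typing import Optional

# Hypothetical keyword list, standing in for the "profanity" filter named in the notice.
PROFANITY = frozenset({"piss"})


def naive_keyword_filter(body: str, keywords=PROFANITY) -> Optional[str]:
    """Return the first banned word found in the message body, or None.

    One hit is enough to purge: no scoring, no context, no allowance for quoted text.
    """
    for word in re.findall(r"[a-z']+", body.lower()):
        if word in keywords:
            return word
    return None


reply = "> Am I blocked? Did I piss someone off?\n\nNope, looks fine from here."
print(naive_keyword_filter(reply))  # piss
```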

US Spy Sat won’t be going on eBay

Reuters is reporting that the US Navy says it hit its failed spy sat three hours ago (at 10:26 PM EST). I guess it won’t be going on eBay as others had hoped.

Mail::DKIM v0.29 slow? Upgrade to v0.30.1

My nightly SpamAssassin (SA) mass-checks have been hanging this week on a 1MB email (not sure how a 1MB message got into my mass-check corpus, but that’s not important). It turns out that Mail::DKIM v0.29 was taking about 150 seconds to process the message, while the rest of SA was only taking about 10 seconds. Upgrading to Mail::DKIM v0.30.1 resolves the problem… the DKIM check is now fast (I didn’t time it, but it’s probably under a second).

The speed-up may be due to Mark Martinec’s optimizations in v0.30. Then again, the relevant optimization could simply be skipping the crypto on the body, since the message in question did not have a signature (the sender doesn’t sign mail).
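I haven’t checked what v0.30 actually changed, so this is only a sketch of the shortcut I’m guessing at, in Python rather than Perl, and not Mail::DKIM’s code: look for a DKIM-Signature header first, and only canonicalize and hash the body when there’s something to verify. For brevity it handles only the DKIM “simple” body canonicalization (RFC 6376) with SHA-256 and ignores the l= body length tag; `dkim_body_hash` and `simple_body_hash` are made-up names.

```python
import base64
import hashlib
import re
from typing import Optional


def simple_body_hash(body: str) -> str:
    """Base64 SHA-256 of the body under DKIM "simple" body canonicalization."""
    lines = body.replace("\r\n", "\n").split("\n")
    # Ignore all empty lines at the end of the body...
    while lines and lines[-1] == "":
        lines.pop()
    # ...then use CRLF line endings, ending with exactly one CRLF (an empty body becomes "\r\n").
    canonical = "\r\n".join(lines) + "\r\n"
    return base64.b64encode(hashlib.sha256(canonical.encode("utf-8")).digest()).decode("ascii")


def dkim_body_hash(raw_message: str) -> Optional[str]:
    """Return the body hash to compare against a signature's bh= tag, or None if unsigned."""
    raw = raw_message.replace("\r\n", "\n")
    headers, _, body = raw.partition("\n\n")
    # Cheap check first: no DKIM-Signature header means there is nothing to verify,
    # so a large unsigned message never has its body canonicalized or hashed.
    if not re.search(r"^DKIM-Signature:", headers, re.IGNORECASE | re.MULTILINE):
        return None
    return simple_body_hash(body)
```

On an unsigned 1MB message the work is a header scan, which is the kind of behaviour that would explain the drop from ~150 seconds to next to nothing.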

Make sure your Hadoop cluster nodes are registered in DNS

One thing I’ve forgotten to do, twice now, before running a job on a Hadoop cluster that I’ve set up in a hurry to demo something: make sure that all of the nodes are registered in DNS, or at least that every machine’s hosts file has entries for every node in the cluster (datanodes, the namenode, the master and all the slaves).

If you start out with the master not knowing what names the slaves’ IPs map to, the slaves won’t be able to connect to the master, even if you use IPs in the conf/slaves file. This seems silly to me, but that’s the way it is, at least as of 0.14.4. You’ll discover and fix this first. The slaves will then connect, and the first level of links will start to work in the master’s web interface.

Now the TaskTrackers on the slave nodes will successfully run tasks and will probably complete the map stage. If the nodes have varying performance levels, or your data isn’t well distributed on your HDFS file system, the map stage may appear to hang (or repeat the same percentage(s) over and over). If you make it through the map stage, the reduce stage will fail to complete (for the same reason the map stage may fail) if you didn’t also configure each of the slave nodes to know the names of all of the other slave nodes. As soon as a slave node needs data from another datanode (for a map task, a reduce task, etc.), it’ll face the same problem it initially had contacting the master node before you configured DNS for it, only this time with the other slave nodes.

So… make sure that every machine involved in the cluster knows the hostname (and its IP) of every other machine in the cluster. Configuring just the master to know all of the slaves’ names/IPs, or just the slaves to know the master’s name/IP, will not work; you need both, even if you haven’t used a single hostname in either the conf/masters or conf/slaves file.
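Since I keep forgetting, here’s a small Python sketch of a pre-flight check to run on every node before starting a job. It’s my assumption that checking both forward (name to IP) and reverse (IP to name) lookups covers what Hadoop 0.14.4 trips over; the `check_cluster_names` name and the cluster map you pass in (hostname to IP for every master and slave) are hypothetical. The lookups default to Python’s `socket.gethostbyname` and `socket.gethostbyaddr`, which typically consult both the hosts file and DNS, and can be swapped out for testing.

```python
import socket


def check_cluster_names(cluster, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that this machine can resolve every node both ways.

    cluster: dict of hostname -> expected IP for every node (master and slaves).
    Returns a list of human-readable problems; an empty list means all is well.
    """
    problems = []
    for name, ip in sorted(cluster.items()):
        # Forward lookup: does the name resolve to the expected IP?
        try:
            got = forward(name)
            if got != ip:
                problems.append(f"{name} resolves to {got}, expected {ip}")
        except OSError:  # socket.gaierror is a subclass of OSError
            problems.append(f"{name}: forward lookup failed")
        # Reverse lookup: does the IP map back to this node's name?
        try:
            host, aliases, _ = reverse(ip)
            # Compare short names so "slave1" matches "slave1.example.com".
            short_names = {n.split(".")[0] for n in [host, *aliases]}
            if name.split(".")[0] not in short_names:
                problems.append(f"{ip} maps back to {host}, expected {name}")
        except OSError:  # socket.herror is a subclass of OSError
            problems.append(f"{ip}: reverse lookup failed")
    return problems
```

Run it on each machine with the full node list, e.g. `check_cluster_names({"master": "10.0.0.1", "slave1": "10.0.0.2"})` with your own (here made-up) names and IPs; since every node needs to resolve every other node, a clean result on the master alone proves nothing about the slaves.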

Elected to be an ASF Member

In early December I was nominated and elected to become a member of the Apache Software Foundation (ASF). I’ve been contributing to SpamAssassin, an Apache project, since just after the release of 3.0.0 in October 2004 (and have been a committer since March 2005). I’ve been using and working on SpamAssassin since Justin Mason’s first release back around 2001. I guess, so far, I’ve failed to do more harm than good and have been elected a member as punishment. :)

