Thursday, March 31, 2005
The magic switch
I called another hacker over to look at it. He had never seen the switch before either. Closer examination revealed that the switch had only one wire running to it! The other end of the wire did disappear into the maze of wires inside the computer, but it's a basic fact of electricity that a switch can't do anything unless there are two wires connected to it. This switch had a wire connected on one side and no wire on its other side.
It was clear that this switch was someone's idea of a silly joke. Convinced by our reasoning that the switch was inoperative, we flipped it. The computer instantly crashed.
The machine was incommunicado anyway.
We couldn't ping it, and it didn't respond when we plugged a keyboard into the machine. Dan, the network engineer, had been wanting to move the machine anyway, since a colocated server shouldn't really be in the core router room (it was there because at the time, that was the only space for it). I figured why not? We'll be rebooting it anyway. Dan noted the phone number on the front of the computer (not a bad idea actually) and said I should notify the customer about the situation.
“Hello, this is Sean from—”
“I don't speak English,” said the woman's voice on the other end of the phone number. Perfect English, a slight hint of an accent I couldn't quite place.
Instinctively I raised my voice, because, you know, an increase in volume always brings about an increase in language comprehension. “Is there anyone there that speaks English?”
“I don't speak English.” Click.
Um …
Right.
Smirk's on vacation for the week, and only he knows his accounting system enough to plumb customer phone numbers from it. All I had to work with was the one scrawled on the front of the computer, and hey, I gave it a shot. This machine needs to be moved, then restarted. The move itself went quickly, even though the machine was an ungainly large tower-based system. Plugged in the network cable, power, monitor and keyboard from the crash cart, and hit the power switch. Power supply fan started, then stopped. Nothing. Hit the switch again. Power supply fan started, then stopped. Nothing. Went around to the front of the machine to see what might be going on. Hit the switch. Power light went green, the CD-ROM light flashed briefly, as did the floppy drive. Then all went dark.
XXXX!
Checking the machine, I found a switch on the back of the machine. So for several minutes I tried various combinations of switches. Then I unplugged the network card. Then the keyboard. The machine finally powered up.
Whew!
Um … maybe.
The machine came up with this overly large Hewlett-Packard logo and just sat there. I wanted to nervously tap the Caps Lock key, but then remembered, the keyboard wasn't plugged in. So I just stood there nervously tapping the keyboard anyway. After three or so minutes I saw the familiar “Lilo” prompt, followed quickly by Linux booting.
I then stood there watching disk partition after disk partition being scanned. Two disks, each appeared to have seven or eight partitions. After about ten minutes the screen blanker kicked in.
Okay, I unplugged the monitor and put the crash cart away. There wasn't much I could do at that point, and I figured that in another few minutes it would be up and chugging away serving whatever it was it served.
A bit later the customer called, saying his machine was down. It should have come up by now. Hooked the crash cart back up to it, keyboard still wouldn't respond and the screen was still blanked. Nothing much else to do but reboot it.
Okay, wait for three minutes looking at the Hewlett-Packard logo, then Linux is booting and Oh! fsck needed to be run manually on one of the partitions and it's asking for a root password.
I don't have a root password.
Okay, let's try booting into single user mode.
Nope, wait three minutes for the Hewlett-Packard logo, boot into single user mode and still need a password.
Okay, let's try using a rescue CD and manually check the partition.
Yes, waited three minutes and the CD worked.
Okay, take out said CD, and reboot the machine. Wait three minutes, see Linux boot, and the network interface failed to initialize.
XXXXXXXXXXXX!
Put the CD back in, then shut the machine down and try initializing the network interface using the rescue CD to see if it's a hardware or software issue.
Note, this time, instead of just resetting the machine, I power cycled.
Well, powered down. It wouldn't power back up.
And by now the customer was on the phone asking what had happened.
I told him, and asked him if there was any special procedure I needed to follow to get the machine up and running.
He wasn't aware of anything special, other than making sure the switch in the back was in the “On” position, and toggling the front button.
This was not good.
Customer suggested I unplug the CD-ROM. Which meant I had to open the computer. Which meant I had to remove the top to remove the side to remove the power from the CD-ROM. The side of the computer case had to slide all the way out. Not just back a bit then fall to the side. Nope, it had to sliiiiiiiiiiiiide all the way off. Not an easy thing to do while it was still in the rack.
Still wouldn't power up.
So, remove power to the floppy drive.
Still wouldn't power up.
By this time, I had both P and Dan crowded around the machine, and Dan noticed that this huge system, with dual hard drives and a CPU the size of Rhode Island, only had a 250W power supply.
Okay. Swap in a larger power supply.
Now, mind you, I was on the phone with the customer while all this was going on—I had a mobile phone with a headset.
Take the machine back to the rack, power up.
Hallelujah that worked!
Machine powered up, sat there at the Hewlett-Packard logo for three minutes, booted into Linux and still refused to initialize the network interface. And now, I realized that the CD I wanted to boot was stuck in the still unpowered CD-ROM (the new power supply didn't have enough connectors on it to supply power to the CD-ROM).
Next scramble, new network card, in a different slot than the old card.
Now the wait was five minutes, and it still didn't work.
Next scramble, found a network card that was the same model as the one that was originally in the machine.
Wait five minutes, boot and hallelujah it initialized the network.
Only the machine was in the middle of the floor, with its cover off.
So the customer logged in, shut it down, so we could put the covers back on and slap it into the rack.
Wait five minutes—
—and the network card didn't work.
At the customer's suggestion, I removed the covers, took out the replacement card, held it for a few moments, then put it back in. Powered on, waited five minutes, saw the system boot and—
—the network card was working.
“I'm not putting the covers back on. I don't want to touch the machine,” I said.
“I don't blame you,” said the customer.
I'm not touching that machine again.