Monday, January 26, 2004
Swapping disks
Tonight Mark and I replaced a bad disk on swift, the colocated server currently
serving up our sites. The bad disk is the system disk; the websites
themselves (along with some other services we have) all reside on another
disk.
There was much discussion before heading over there as to the best way to
approach the problem of copying the data off the bad drive. The first
method would be to install the new disk into the machine and do a
disk-to-disk copy. The downside is that swift is a 1U system with no room for a
third drive (no matter how temporary). Also, the unit is designed to run
with the cover on, and we were unsure how it would cope with running uncovered. The
other option would be a network-based copy, from swift to another
machine with the new drive in it. The problem here was speed: even though
we could hook the second machine directly to swift (on the
secondary Ethernet port) at 100Mbps, it would still take a while to copy over several gigs
worth of files. We took a second computer (the Windows box Spring and I share) along anyway
and put off the decision until we got to the colocation facility.
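For a rough sense of the "it would still take a while" worry, here is a back-of-the-envelope sketch in Python. The sizes tried and the 80% link efficiency are assumptions picked for illustration, not measurements from that night, and it ignores the read retries a failing source disk adds, which can easily dominate.

```python
def transfer_seconds(size_gb: float, link_mbps: float = 100.0,
                     efficiency: float = 0.8) -> float:
    """Estimate the seconds needed to move size_gb gigabytes over a link.

    efficiency is an assumed fraction of the raw link rate left over
    after protocol overhead; it is a guess, not a measured value.
    """
    bits = size_gb * 8 * 1000**3                 # decimal gigabytes -> bits
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second


if __name__ == "__main__":
    # Hypothetical sizes standing in for "several gigs".
    for size_gb in (2, 5, 10):
        minutes = transfer_seconds(size_gb) / 60
        print(f"{size_gb} GB: about {minutes:.1f} minutes")
```

On a healthy disk the wire time alone comes out in minutes rather than hours, so most of the waiting in a copy like this comes from the bad drive and not from the network.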
When we got there and examined swift, we decided to use the
temporary computer and do a network copy. We had some difficulty in getting
the Windows box to recognize the new SCSI disk (Mark had some extra SCSI controllers and disks); it was
certainly news to me that the BIOS setup was on the hard drive instead of in ROM (much like the very old days of
PCs). Once we straightened
that out, it was pretty straightforward to boot Gentoo from a live CD, then partition and format the new drive.
Then it was time to copy the files. It took some work to figure out how
to run rsync over the rsync protocol, and it still took us two
attempts to get everything (the first time, rsync ran without root
privileges, which limited the number of files copied). Once that finished
(and still on the temporary machine) we recompiled the kernel to support
SCSI, then set about making the drive bootable.
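For the curious, a pull over the rsync protocol looks roughly like the sketch below. This is not the command we actually typed; the host address, the module name "system" and the mount point are all hypothetical, and the source machine would need an rsync daemon exporting that module. The script only builds and prints the command, and warns when it isn't running as root, since that is what bit us on the first attempt.

```python
import os
import shlex


def build_rsync_command(host: str, module: str, dest: str) -> list[str]:
    """Build an rsync invocation that pulls a daemon module into dest."""
    return [
        "rsync",
        "--archive",           # recurse, keep permissions, owners, times, symlinks
        "--hard-links",        # keep hard links as hard links
        "--numeric-ids",       # copy uid/gid numbers instead of mapping by name
        "--one-file-system",   # don't cross onto other mounted filesystems
        f"rsync://{host}/{module}/",   # rsync-protocol URL (hypothetical module)
        dest,
    ]


def running_as_root() -> bool:
    """True when the effective user is root (always False without geteuid)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


if __name__ == "__main__":
    # Hypothetical address, module name and mount point.
    command = build_rsync_command("192.168.1.1", "system", "/mnt/newdisk/")
    if not running_as_root():
        print("warning: not root; ownership and unreadable files may be skipped")
    print(shlex.join(command))
```

Root matters on both ends: the receiving side needs it to set file ownership, and the daemon on the sending side needs it to read files that ordinary users can't.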
The problem here was that Gentoo was a bit too aggressive in
identifying hardware, and since the Linux kernel sticks USB storage devices under the
SCSI layer, the hard drive ended up with an ID that it wouldn't have in swift.
We ended up having to reboot from the Gentoo CD, unload the USB drivers, mount the SCSI drive, and then make the drive bootable. Once
that was done, the temporary system booted up without a problem.
We then removed the drive and controller, cleaned the area (so we could have room to move about) and spent a few minutes making a game plan for swapping the bad drive for the new one. The physical swap went fairly smoothly. It was reconfiguring the BIOS that proved to be rather difficult: we couldn't get into the BIOS configuration. A search for possible key sequences to get into the BIOS configuration revealed:
- DEL
- F1
- F2
- F10
- Ctrl-Alt-Esc
- Ctrl-Esc
- Alt-Esc
- INS
- Esc
- Ctrl-Alt-Ins
We ran down the entire list, and not one worked. Mark then had the brainstorm to hold down the keys as the machine was powered up. The first key he tried, DEL, got us into the BIOS.
Talk about having plenty of time to get into the BIOS configuration.
Once the BIOS was configured with the new drive, it rebooted without a problem.
All told, we spent maybe five hours doing the drive swap, with the websites unavailable for maybe fifteen minutes tops. It was a bit scary at times though, watching the copying go with numerous disk errors. But so far, nothing important seems to have been corrupted, unlike most of the files in Mark's home directory (but he had current backups of that data anyway).