Wednesday, June 14, 2006
Yeah, I kind of saw this coming
Yet more reasons I hate control panels.
I had to move a site last night from one server, hades, to another server, marion, and both servers are running Insipid, which has a backup and restore feature. So the plan was to back up the site from hades, then restore it on marion.
It took several attempts on my part to get this process to work. One has to realize that when Insipid says the operation was “sucessful,” all that was “sucessful” was your request for the operation; the operation itself is a separate process that will notify you of the actual success (or failure) via email.
Hate hate hate.
Now, one aspect of this site is that it has its own IP address. And lo, once I got the site on marion, marion was listening in on the IP address.
Good.
But hades was still listening in on the site as well.
Bad.
Not wanting to actually remove the site from hades until I knew this was working, I then decided to manually remove the IP address (not knowing how one even approaches this using Insipid).
ip addr del XXXXXXXXXXXXXX/24 dev eth0
Try to view the site, and the request is going to hades.
Okay, the switch it's on is probably still sending traffic to hades. Clear the ARP cache on the switch.
Try to view the site, and the request is going to hades.
Check hades and see that it really wants to hold onto that IP address. Use both ip and ifconfig to nuke the IP from hades.
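One way to confirm whether a host still holds an address is to filter the one-line output of `ip -o addr show`. The following is a minimal sketch, not what was actually run that night: the function name `addr_holders` is hypothetical, `192.0.2.44` is a placeholder documentation address, and the parsing assumes the usual `inet ADDRESS/PREFIX` layout of that output.

```shell
# Hypothetical helper: reads `ip -o addr show` output on stdin and prints the
# names of the interfaces that still carry the given IPv4 address.
# Exit status is 0 if the address was found, 1 if it was not.
addr_holders() {
    awk -v ip="$1" '
        # Field 3 is the address family, field 4 is ADDRESS/PREFIX.
        $3 == "inet" && index($4, ip "/") == 1 { print $2; found = 1 }
        END { exit !found }
    '
}

# Usage (placeholder address, run on the host being checked):
#   ip -o addr show | addr_holders 192.0.2.44
```

An empty result with a non-zero exit status suggests the address is no longer configured on any interface, at least until something adds it back.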
Try to view the site, and the request is going to hades.
Clear the ARP cache on the switch.
Try to view the site, and the request is going to hades.
Okay, shut down the port that hades is plugged into on the switch, clear the ARP cache.
Try to view the site, and the request is now going to marion.
Good.
Now, that was late last night (between 3:00 and 4:00 am, during which I stupidly answered the phone and took a tech support call around 3:30 am dealing with an email issue. Sigh).
This morning, requests for said site are now going to hades.
Hate hate hate hate hate.
Okay, delete the site from hades, make sure it doesn't have the IP address, shut down the port it's plugged into on the switch, clear the ARP cache on the switch and okay, requests are now going to marion.
I even double check to make absolutely sure that no other sites are on this IP address. There aren't.
Hate hate hate hate hate.
I know why I hate control panels: I don't feel in control. And when something breaks, I have no idea how to fix it. Oh, I typically know what's wrong, and how one could fix it, if one weren't running a control panel. How to fix it within the context of the control panel? That, I don't know. (Oh, I suppose one could dive into the internals of the control panel, but a) that kind of defeats the purpose of a control panel, which supposedly makes Unix administration easy, and b) we use three or four different control panels, which all work differently, which means we need to become experts in all of them, which again kind of defeats the purpose of a control panel. Either that, or I'm bitter that all my experience in administering a Linux system is no longer applicable and that I have to relearn all this crap four new times, just to administer a Linux system.)
Hate hate hate hate hate.
- Subject: No subject
- Posted-by: Sean (Staff)
- Date: 06-14-2006 3:25am EDT
Moved site to marion. It's disabled on hades and hopefully, hades won't try to reassert the IP address.
[Response to trouble ticket last night after moving the site]
I fully expect that in a few hours, I'll have to revisit this situation again.
An hour and a half later …
hades took control again. This time, we found that ntpd (the Network Time Protocol daemon, which keeps the clocks on all the servers in sync) had explicitly bound to each IP address on hades, and that Apache was apparently still configured for the site in question (then what the XXXXXXX XXXX good is Insipid if it doesn't restart Apache?).
Okay, maybe now hades won't take over the address.
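For spotting daemons like ntpd or Apache that are still bound to a particular address, one option is to filter `netstat -tulpn` output by local address. This is only a sketch under assumptions: the function name `listeners_on` is hypothetical, `192.0.2.44` is a placeholder, and it relies on the local address being the fourth column, as it is in net-tools `netstat` output.

```shell
# Hypothetical helper: reads `netstat -tulpn` style output on stdin and prints
# only the rows whose local address (column 4) is the given IP on any port.
listeners_on() {
    awk -v ip="$1" 'index($4, ip ":") == 1 { print }'
}

# Usage (placeholder address; root is needed to see other users' programs):
#   netstat -tulpn | listeners_on 192.0.2.44
```

Sockets bound to the wildcard address (`0.0.0.0`) are deliberately not matched here, so a daemon listening on everything will not show up in this filter.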
Update three hours after the previous update
It's happened twice since the last update. Short of rebooting hades the next time it happens, I can't think of what else might be causing it to respond to an IP address it's no longer programmed to respond to.
Update about half an hour later
Found the script buried in /etc that had the IP address and nuked it. That seems to have taken care of the problem.
For now.
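Hunting for a leftover file that mentions an address can be done with a recursive fixed-string grep. This is a rough sketch, not the search actually used here: the function name `find_ip_refs` is hypothetical, `192.0.2.44` is a placeholder, and it assumes a grep that supports `-r` (GNU and BSD grep both do).

```shell
# Hypothetical helper: list files under a directory (default /etc) that
# mention the given address.
#   -r  recurse into subdirectories
#   -l  print only the names of matching files
#   -F  treat the address as a literal string, so dots are not wildcards
find_ip_refs() {
    ip="$1"
    dir="${2:-/etc}"
    grep -rlF -- "$ip" "$dir" 2>/dev/null
}

# Usage (placeholder address):
#   find_ip_refs 192.0.2.44
```

Because this is a plain substring match, a search for `192.0.2.4` would also list files that only contain `192.0.2.44`, so the results still need a look by eye.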
“It was my rerouting to circuit B that saved him.”
[We've gone back and forth several times today about this issue. This is the latest message in the trouble ticket system. I'm not trying to make fun of the customer, but man, it is so tempting … —Editor]
Our clients can not update their sites and they are suffering … Can't you route the communication to that server in a different way?
One of our resellers
Dear XXXXXX,
You know, you're right. We haven't tried rerouting FTP packets to the server. Let's see … we can peel off the FTP traffic from the HTTP traffic at the Primary Intarweb Interface Matrix and feed it through the secondary sub-intranet bridge instead of through the primary LAN tokenizer substation hub … hmm … while the FTP traffic is now going to the secondary sub-intranet bridge it's not getting past the IEEE 802.3 CSMA/CD packet scrubber. Odd.
Okay, instead, let's see if I can reroute the traffic from the secondary sub-intranet bridge through the dual-Φ power transformer at … let's see … probably need to make sure the bits are phased at 60Hz at 51.5° (to induce harmonic pyramidal overtones), making sure to bypass the GFCI (dangerous, I know, but if you know what you are doing, and are very careful, it should be okay) and then tap the secondary coil on the server in question, couple it to the APCI DMA via the unused SCSI controller and then finally into the TCP/IP stack … and—
Cool!
It worked!
Try it now!
Now, off to repair the primary LAN tokenizer substation hub.
[And no, despite how tempting it might have been, I did not send this as a reply, but instead restarted the FTP server.]