Monday, Debtember 27, 2004
User expectation
Have I told you how much I hate—
There are about five different trains of thought running through my head about this right now, but it would take some serious effort and a lot of writing to explain all the thoughts (touching on such diverse subjects as data storage, data organization, configurations, RISC (or rather, a RISCesque approach to service configurations) and process management, in addition to these XXX XXXXXX XXXXXXX control panels).
But alas, I think I'll try to limit myself for now to just control panels.
Now, it may be that it's just this particular control panel we use here at The Company, but my limited exposure to two other control panel platforms hasn't left me feeling all that confident in their overall operation.
The problem, I think, is one of user expectation. As a user of this control panel, I find it doesn't meet my expectation of what the tool should do. Sure, it might be easier for some kid out of college with scant experience with Unix to get up and running, but when anything (and I mean anything) goes wrong, it's damn near impossible to clear up without resorting to a lower-level approach, like … the command line! (Oooooooooooooh!) And the problem with dropping down to the command line is that oftentimes you break the control software, or whatever you fix breaks again when the control software asserts its control over the system and resets the configurations back to how it thinks they should be.
For instance, up until late last week we'd been having an IP address conflict: two servers responding to the same IP address. This is not a good thing. Late last week I was finally able to track down which two machines were fighting over the IP address. To do this, I had to log into the managed switches, check the ARP cache to see which port each machine was plugged into (and found out that a third machine was programmed with the same IP address), then track down the two machines in question. No nice graphical user interface for this, no siree. I finally pinned down which machine was supposed to have the IP (this was a case of a website being moved from one server to another, then to yet a third). I removed the site (using the control panels) from the two machines that the site shouldn't be on, figuring that the control panel would do its magic to remove everything it should and not break more stuff, and forced the network switch to clear its ARP cache for that IP address.
Then all was right with the world, and the network, and the site. And lo, the customer was happy because now they could get to their website.
Until today.
And lo, the old server said that it has that IP address, and lo, the switch obeyed and started sending network traffic unto it, even though that server serveth not under that IP address. And lo did the customer complaineth.
Book of ARP, Chapter 2, verse 17
Yup, the switch was sending traffic for the IP address in question to the wrong server. I log into that server, A, and run ifconfig and well … the IP address does not show up. I log onto the server that is supposed to have that IP address, run ifconfig on it, and well … the IP address does not show up there either!
I log into a third machine, ping the IP address, then check to see which machine is getting the traffic—it's the old server.
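A minimal sketch of one way to make that check from the third machine, assuming it runs Linux: after the ping, the kernel's ARP table (exposed as /proc/net/arp) records which hardware address answered for the IP. The function names and the sample addresses below are made up for illustration; the real check could just as well be done by eye.

```python
def mac_for_ip(arp_table_text, ip):
    """Return the hardware address recorded for ip, or None if absent.

    arp_table_text is the content of /proc/net/arp (Linux-specific).
    """
    lines = arp_table_text.splitlines()
    for line in lines[1:]:              # first line is the column header
        fields = line.split()
        # columns: IP address, HW type, Flags, HW address, Mask, Device
        if len(fields) >= 6 and fields[0] == ip:
            return fields[3].lower()
    return None


def live_arp_table(path="/proc/net/arp"):
    """Read the live ARP table; only works where /proc/net/arp exists."""
    with open(path) as table:
        return table.read()


# Made-up sample in the /proc/net/arp layout (documentation addresses).
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.0.2.1        0x1         0x2         00:16:3e:00:00:01     *        eth0
192.0.2.44       0x1         0x2         00:16:3E:00:00:0A     *        eth0
"""
```

Calling mac_for_ip(live_arp_table(), ...) right after the ping gives a hardware address to compare against each server's network card; whichever card matches is the machine actually receiving the traffic.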
I scour up and down the old server, searching every file under /etc (where all the configuration files live) and not finding the IP address anywhere. I double check through the control panel that the IP address is not supposed to be there, and nope, according to the control panel, this server is not supposed to be listening in on that IP address.
Yet it is.
It was suggested that I reboot the old server and see if that fixes the problem.
It does.
The correct server now responds to the IP address.
This is not a Windows box! You're not supposed to fix problems by randomly rebooting the server!
Yet I scour through that server, looking for the IP address and not finding it. I can't figure out how the server was told to listen to that IP address, because ifconfig isn't showing any network interface with that address. But nonetheless, it is.
[I found out, through some research, that this control panel uses the ip command, not ifconfig like every other Unix system I've used. Even if this is a better program for what the control panel is trying to do, why, oh why, is it incompatible with ifconfig? What is up with that?]
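A likely explanation, offered as a guess rather than a diagnosis: the older ifconfig reports an interface's primary address plus any labelled aliases (eth0:1 and the like), so an address added with ip and no label can be live yet invisible to it. The sketch below parses the one-line-per-address output of "ip -o addr show" to find where an address is attached. The function names and the sample text are illustrative assumptions; nothing here comes from the control panel itself.

```python
import subprocess


def interfaces_holding(ip_output, wanted_ip):
    """Return the interface names that carry wanted_ip.

    ip_output is the text printed by `ip -o addr show`, where each line
    looks like: "2: eth0    inet 192.0.2.10/24 brd ... scope global eth0".
    """
    found = []
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("inet", "inet6"):
            continue
        address = fields[3].split("/")[0]   # drop the prefix length
        if address == wanted_ip:
            found.append(fields[1])
    return found


def live_addresses():
    """Run `ip -o addr show` and return its output (needs iproute2)."""
    return subprocess.run(
        ["ip", "-o", "addr", "show"],
        capture_output=True, text=True, check=True,
    ).stdout


# Made-up sample: 192.0.2.44 is a secondary address with no alias label.
SAMPLE = (
    "1: lo    inet 127.0.0.1/8 scope host lo\n"
    "2: eth0    inet 192.0.2.10/24 brd 192.0.2.255 scope global eth0\n"
    "2: eth0    inet 192.0.2.44/24 scope global secondary eth0\n"
)
```

Running interfaces_holding(live_addresses(), ...) on the old server before the reboot would have been one way to confirm or rule out a leftover secondary address.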
Perhaps I'm asking too much in wanting a control panel that can manage to deal with manual tweaks of the system from time to time? Or perhaps I'm just bitter that I have to learn yet another system for the umpteenth time, and one that can't even do a simple thing like change the IP address of a website (and funny, but I don't have nearly this level of problem with the other servers I admin, the ones that don't have a control panel installed).