Monday, March 14, 2005
We're drowning in emails!
I found out why the server had an abundance of emails today: the primary resolving name server crashed.
There are four name servers running at The Company—two for people outside our network to look up domains we host, and two internally just for resolving DNS queries inside The Office. And one of the resolving name servers crashed hard. Totally offline.
And that threw everything out of whack.
I'm not sure why a name server (which isn't a single point of failure) crashing should throw the email system out of whack, but when I did get that server up and running (no idea yet as to why it crashed) things seemed to return to normal.
Odd, yes.
The email server in question should have queried the name server still running. And had we been running a stock installed system I might have actually attempted to figure out the root cause. But we run Insipid on that email server, with a (possibly) custom installation of sendmail (configured, and possibly patched, to run under the control panel).
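For the record, here is roughly the behavior I expected: ask the first resolver, and if it doesn't answer in time, move on to the next one. This is only a minimal sketch in Python, not what sendmail or the system resolver actually does; the function names are made up for illustration, and the addresses in the usage note are placeholders from the documentation range, not our real servers.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def udp_send(server, packet, timeout):
    """Send one query over UDP to port 53 and wait for a single reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        reply, _ = sock.recvfrom(512)
        return reply


def resolve_with_failover(hostname, nameservers, timeout=2.0, send=udp_send):
    """Try each name server in order; return (server, raw_reply) from the
    first one that answers. Hypothetical helper, for illustration only."""
    packet = build_query(hostname)
    for server in nameservers:
        try:
            reply = send(server, packet, timeout)
        except OSError:
            # Timeouts and unreachable hosts both land here: try the next one.
            continue
        # Only accept a reply that echoes our query ID.
        if reply[:2] == packet[:2]:
            return server, reply
    raise RuntimeError("no name server answered for " + hostname)
```

Calling `resolve_with_failover("example.com", ["192.0.2.1", "192.0.2.2"])` should come back with the second address if the first one is dead, after one timeout's worth of waiting. The reply is left as raw bytes; parsing the answer section is out of scope here.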
And I'm not about to debug that (Smirk isn't paying me that much money).
And that's not counting our spam firewall (which is a separate system), which also seemed to have problems when the primary resolving name server crashed. And I'm not about to debug why that didn't bother to use the secondary name server either.
The upshot was, a bunch of mail simply queued up (thousands; we're talking thousands of emails) because of domain resolution issues.
And that caused our customers to complain that they couldn't get their email.
Now, what I can debug is why that particular server crashed. I may have to disable any hardware screen blankers, since when I plugged in a crash cart nothing was visible on the screen, and it was D-E-A-D to the point where the keyboard wouldn't register (which normally unblanks the screen). There wasn't anything obvious in the system logs, but I didn't have much of a chance to pore over them.