For the past two months or so I've been fighting a problem on one of our DNS resolvers. The server is still up, and from what I can tell, DNS is still up and running (oddly enough, it's one reason why I got
monnet working), but the system load shoots up and consequently it gets slower and slower to respond.
At first I thought it was slowing down due to logging every little lame server out there, so I dropped the logging down to only severe errors, and that didn't help one bit.
I've been having to go in and restart named every other day or so. I'm reluctant to set that up as a cron job, as that only masks the issue; it doesn't solve it.
Meanwhile, one of our clients has been experiencing some network “issues” since we reworked their network (to be fair, they were having issues before that, which is why we reworked it). Even more puzzling, our monitoring of their network hasn't shown anything abnormal, although some close scrutiny did reveal that the wireless shot there is still a bit flaky even after we re-aimed it, so we've now set the routing to prefer the T-1 over the wireless.
We then received email about possible DNS issues, and they're using our DNS resolvers.
A non-responding DNS server could be mistaken for a slow network, or a network that appears to have outages (as the DNS queries take time to time out). Our problematic resolver could be causing their problems.
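To make that concrete, here is a rough Python sketch of how one might time a name lookup and see how much of a "slow network" is really the resolver. The function name `time_lookup` and its `resolve` parameter are my own inventions for illustration, and by default it goes through whatever resolver the system is configured to use rather than one particular server.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Time a single name lookup.

    Returns (seconds_elapsed, succeeded). `resolve` is a hypothetical
    hook so a different lookup function can be swapped in; by default
    the system resolver is used via socket.getaddrinfo.
    """
    start = time.monotonic()
    try:
        resolve(hostname, None)
        succeeded = True
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        succeeded = False
    return time.monotonic() - start, succeeded
```

Calling `time_lookup("www.example.com")` a few times over the day would show whether lookups are quick or creeping toward the timeout. A lookup that takes several seconds before the connection even starts looks, from the user's chair, exactly like a slow or flaky link.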
While I was describing the issue to Wlofie, it came to light that the problematic server only has 32M of RAM.
On a rather busy network.
The other resolver, the one not having an issue, has 128M of RAM.
I think I found the problem here (and honestly, it never occurred to me to check memory on the box. Sigh).