Saturday, April 17, 2010
The monitoring of uninterruptible power supplies
I've been dealing with UPS problems for a week and a half now, and things have finally calmed down a bit. Bunny's UPS has been replaced, and I'm waiting for Smirk to order replacement batteries for my UPS, so in the meantime I'm using a spare UPS from The Company.
Bunny suspects the power situation here at Chez Boca is due to some overgrown trees interfering with the power lines, causing momentary fluctuations in the power and basically playing hell with not only the UPSes but the DVRs as well. This past Wednesday was particularly bad: the UPS would take a hit and drop power to my computers, and by the time I was back up and running, I would take another hit (three times, all within half an hour). It got so bad that I ended up climbing around underneath the desks rerunning power cables in the hope of keeping my computers powered for more than ten minutes.
It wasn't helping matters that I was fighting my syslogd replacement during each reboot (but that's another post).
So Smirk dropped off a replacement UPS, and had I just used the thing, yesterday might have been better. But nooooooooooooooooo! I want to monitor the device (because, hey, I can), but since it's not an APC, I can't use apcupsd to monitor it (Bunny's new UPS is an APC, and so is my UPS with the dead battery). While searching for software to monitor the Cyber Power 1000AVR LCD UPS, I came across NUT, which supports a whole host of UPSes and looks like it can monitor multiple UPSes from a single computer (functionality that apcupsd lacks).
It's nice, but it does have its quirks (and it caused me to have nuclear meltdowns yesterday). I did question the need for five configuration files and its own user accounting system, but on reflection, the user accounting system is probably warranted (maybe), given that you can remotely command the UPSes to shut down. And the configuration files aren't that complex; I just found them annoying. I also found one process per UPS, plus two processes for monitoring, a bit excessive, but the authors were following the Unix philosophy of small tools working together. Okay, I can deal.
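Because upsd will act on commands such as shutting a UPS down, a client has to log in before it can issue one, which is what the user accounting is for. Below is a minimal Python sketch of that exchange, assuming NUT's network protocol commands USERNAME, PASSWORD, INSTCMD and LOGOUT, upsd listening on its default port 3493, and a user who has been granted the command (via `instcmds`) in upsd.users. The function names, the user name and password, and the `beeper.disable` command used in the test are placeholders, not anything taken from this setup.

```python
import socket

NUT_PORT = 3493  # default TCP port for NUT's upsd


def instcmd_lines(ups, command, username, password):
    """Protocol lines for one authenticated instant command (ASSUMPTION: NUT net protocol)."""
    return [
        f"USERNAME {username}",
        f"PASSWORD {password}",
        f"INSTCMD {ups} {command}",
        "LOGOUT",
    ]


def send_instcmd(ups, command, username, password, host="localhost", port=NUT_PORT):
    """Send the lines one at a time; upsd answers each with 'OK...' or 'ERR ...'."""
    with socket.create_connection((host, port), timeout=10.0) as sock, \
            sock.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in instcmd_lines(ups, command, username, password):
            sock.sendall((line + "\n").encode("utf-8"))
            reply = reader.readline().rstrip("\r\n")
            if not reply.startswith("OK"):
                # Report only the verb so the password never ends up in an error message.
                raise RuntimeError(f"{line.split()[0]} rejected: {reply!r}")
```

Sending one line and waiting for its reply keeps failures easy to pin down: a bad password stops the exchange before any INSTCMD is sent.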
The one quirk that drove me towards nuclear meltdown was that the USB “driver” (the program that actually queries the UPS over the USB bus) would not work in “explore” mode (used to query the UPS for all its information) when a particular directive was present in the configuration file. So I have the following in the UPS configuration file:
    [apc1000]
        driver = usbhid-ups
        port = auto
        desc = "APC Back UPS XS 1000"
        vendorid = 051D
I try to run usbhid-ups in explore mode, and it fails. Comment out the vendorid, add it to the command line instead, and it works. But without the vendorid in the configuration file, usbhid-ups won't function normally in regular operation (and it's the interface between the monitoring processes and the UPS).
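To make the workaround concrete, here is a small Python sketch that reads a ups.conf section like the one above and builds an explore-mode command line carrying vendorid as a driver option instead. The parser is a deliberately minimal stand-in, not NUT's own. The driver flags (`-DD` for debug output, `-a <name>`, `-x explore`, `-x vendorid=...`) are my assumption about how NUT drivers are invoked, so check the usbhid-ups man page. Per the post, the vendorid line still has to be commented out of the real ups.conf while explore mode runs, and restored afterwards.

```python
import shlex

UPS_CONF = """
[apc1000]
    driver = usbhid-ups
    port = auto
    desc = "APC Back UPS XS 1000"
    vendorid = 051D
"""


def parse_ups_conf(text):
    """Minimal ups.conf reader: [name] sections holding key = value lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments (e.g. a commented-out vendorid)
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif current is not None:
            key, sep, value = line.partition("=")
            # shlex.split strips the double quotes around values such as desc;
            # a line without "=" is treated as a bare flag.
            current[key.strip()] = " ".join(shlex.split(value)) if sep else True
    return sections


def explore_argv(name, section):
    """Explore-mode command line with vendorid passed via -x (ASSUMED driver flags)."""
    argv = [section["driver"], "-DD", "-a", name, "-x", "explore"]
    if "vendorid" in section:
        argv += ["-x", f"vendorid={section['vendorid']}"]
    return argv
```

Reading vendorid from the uncommented section and then commenting it out mirrors the sequence described above: the value moves from the file to the command line just for the explore run.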
It's bad enough that you can only use explore mode when the rest of the UPS monitoring software isn't running, but this? It took me about three hours to figure out what was (or wasn't) going on.
Then there was the patch I made to keep NUT from logging to syslogd every second (I changed one line from “if result > 0 return else log error” to “if result >= 0 return else log error”, since 0 isn't an error code). Then I found this bug report in the mailing list archive, and yes, that bug was affecting me as well; after I applied the patch, I was able to get more information from the Cyber Power UPS (and it didn't affect the monitoring of the APC).
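The one-line change is a boundary fix: with `> 0`, a return value of 0 falls into the error branch. The Python paraphrase below uses hypothetical function names; the real NUT code is C and looks different. The sample poll results are also invented, chosen to show how a result of 0 on most polls turns into a stream of bogus log entries.

```python
def is_error_original(result: int) -> bool:
    # Original test: "if result > 0 return else log error", so 0 is logged as an error.
    return not (result > 0)


def is_error_patched(result: int) -> bool:
    # Patched test: "if result >= 0 return else log error"; only negative values are errors.
    return not (result >= 0)


def count_logged_errors(results, is_error) -> int:
    """How many of a sequence of poll results would produce an error log line."""
    return sum(1 for result in results if is_error(result))


# Hypothetical poll results: mostly 0 (nothing wrong), one real failure (-5).
polls = [0, 0, 12, 0, 0, -5]
```

With these sample values, `count_logged_errors(polls, is_error_original)` is 5, while `count_logged_errors(polls, is_error_patched)` is 1, the genuine failure.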
And their logging program, upslog, doesn't log to syslogd. It's not even an option. I could, however, have it output to stdout and pipe that into logger, but that's an additional four processes (two per UPS) just to log some stats to syslogd. Fortunately, the protocol used to communicate with the UPS monitoring software is well documented and easy to implement, so it was easy to write a script (in Lua, of course) that queries the information I want to log to syslogd, and to run it every five minutes via cron.
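My script is in Lua, but here is a rough equivalent in Python to show how little the protocol asks of a client. It assumes upsd on its default port 3493 and uses the `LIST VAR <ups>` command, whose reply is a `BEGIN LIST VAR` line, one `VAR <ups> <name> "<value>"` line per variable, and an `END LIST VAR` line. The variables chosen for the log line, the syslog ident `upsstats`, and the function names are all made up for illustration. Python's `syslog` module exists only on Unix.

```python
import shlex
import socket
import syslog  # Unix-only standard module

NUT_PORT = 3493  # default TCP port for NUT's upsd

# Hypothetical choice of variables to log; any names from LIST VAR would do.
LOG_KEYS = ("battery.charge", "battery.runtime", "input.voltage", "ups.load", "ups.status")


def parse_var_lines(lines, ups):
    """Collect 'VAR <ups> <name> "<value>"' reply lines into a {name: value} dict."""
    variables = {}
    for line in lines:
        parts = shlex.split(line)  # handles the double-quoted values
        if len(parts) == 4 and parts[0] == "VAR" and parts[1] == ups:
            variables[parts[2]] = parts[3]
    return variables


def list_vars(ups, host="localhost", port=NUT_PORT, timeout=10.0):
    """Send 'LIST VAR <ups>' to upsd and read replies up to 'END LIST VAR <ups>'."""
    lines = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"LIST VAR {ups}\n".encode("utf-8"))
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for raw in reader:
                line = raw.rstrip("\r\n")
                if line.startswith("ERR "):
                    raise RuntimeError(f"upsd replied: {line}")
                if line == f"END LIST VAR {ups}":
                    break
                lines.append(line)
    return parse_var_lines(lines, ups)


def format_summary(ups, variables, keys=LOG_KEYS):
    """One compact log line with whichever of the chosen keys the UPS reports."""
    fields = [f"{key}={variables[key]}" for key in keys if key in variables]
    return f"{ups}: " + " ".join(fields)


def log_ups(ups, host="localhost"):
    """Query one UPS and write a single line to syslog (ident 'upsstats' is made up)."""
    syslog.openlog("upsstats", 0, syslog.LOG_DAEMON)
    syslog.syslog(syslog.LOG_INFO, format_summary(ups, list_vars(ups, host)))
```

A script that calls `log_ups()` once per UPS name from ups.conf, run from a crontab entry such as `*/5 * * * *`, replaces the four upslog/logger processes with one short-lived process every five minutes.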
Now, the information you get is impressive. apcupsd gives out rather terse information, like this (from Bunny's system, which is still running apcupsd):
    APC      : 001,038,0997
    DATE     : Sat Apr 17 22:23:25 EDT 2010
    HOSTNAME : bunny-desktop
    VERSION  : 3.14.6 (16 May 2009) debian
    UPSNAME  : apc-xs900
    CABLE    : USB Cable
    MODEL    : Back-UPS XS 900
    UPSMODE  : Stand Alone
    STARTTIME: Thu Apr 08 23:20:10 EDT 2010
    STATUS   : ONLINE
    LINEV    : 118.0 Volts
    LOADPCT  : 16.0 Percent Load Capacity
    BCHARGE  : 084.0 Percent
    TIMELEFT : 48.4 Minutes
    MBATTCHG : 5 Percent
    MINTIMEL : 3 Minutes
    MAXTIME  : 0 Seconds
    SENSE    : Low
    LOTRANS  : 078.0 Volts
    HITRANS  : 142.0 Volts
    ALARMDEL : Always
    BATTV    : 25.9 Volts
    LASTXFER : Unacceptable line voltage changes
    NUMXFERS : 6
    XONBATT  : Fri Apr 16 00:40:37 EDT 2010
    TONBATT  : 0 seconds
    CUMONBATT: 11 seconds
    XOFFBATT : Fri Apr 16 00:40:39 EDT 2010
    SELFTEST : NO
    STATFLAG : 0x07000008 Status Flag
    MANDATE  : 2007-07-03
    SERIALNO : JB0727006727
    BATTDATE : 2143-00-36
    NOMINV   : 120 Volts
    NOMBATTV : 24.0 Volts
    NOMPOWER : 540 Watts
    FIRMWARE : 830.E6 .D USB FW:E6
    APCMODEL : Back-UPS XS 900
    END APC  : Sat Apr 17 22:24:00 EDT 2010
NUT will give back:
    battery.charge: 42
    battery.charge.low: 10
    battery.charge.warning: 50
    battery.date: 2001/09/25
    battery.mfr.date: 2003/02/18
    battery.runtime: 3330
    battery.runtime.low: 120
    battery.type: PbAc
    battery.voltage: 24.8
    battery.voltage.nominal: 24.0
    device.mfr: American Power Conversion
    device.model: Back-UPS RS 1000
    device.serial: JB0307050741
    device.type: ups
    driver.name: usbhid-ups
    driver.parameter.pollfreq: 30
    driver.parameter.pollinterval: 2
    driver.parameter.port: auto
    driver.parameter.vendorid: 051D
    driver.version: 2.4.3
    driver.version.data: APC HID 0.95
    driver.version.internal: 0.34
    input.sensitivity: high
    input.transfer.high: 138
    input.transfer.low: 97
    input.transfer.reason: input voltage out of range
    input.voltage: 121.0
    input.voltage.nominal: 120
    ups.beeper.status: disabled
    ups.delay.shutdown: 20
    ups.firmware: 7.g3 .D
    ups.firmware.aux: g3
    ups.load: 2
    ups.mfr: American Power Conversion
    ups.mfr.date: 2003/02/18
    ups.model: Back-UPS RS 1000
    ups.productid: 0002
    ups.serial: JB0307050741
    ups.status: OL CHRG
    ups.test.result: No test initiated
    ups.timer.reboot: 0
    ups.timer.shutdown: -1
    ups.vendorid: 051d
It's the same information, but with better variable names, and you can query for any number of variables. Not all UPSes support all variables, though (there are plenty of variables my UPSes don't support, such as temperature). You can also use this software to send commands to a UPS; for instance, I was able to shut off the beeper on the failing APC.
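Lining up the two listings shows how the names correspond. Note that they come from two different UPSes, so only the names match up, not the values. Below is a small Python sketch that parses apcupsd-style `KEY : value` output and renames the numeric fields to their NUT equivalents. The mapping table was read off the two listings in this post, not taken from either project's documentation. One unit differs: apcupsd's TIMELEFT is in minutes, while NUT's battery.runtime is in seconds.

```python
# apcupsd name -> NUT name, matched by comparing the two listings above.
APCUPSD_TO_NUT = {
    "BCHARGE": "battery.charge",
    "TIMELEFT": "battery.runtime",
    "LINEV": "input.voltage",
    "NOMINV": "input.voltage.nominal",
    "LOADPCT": "ups.load",
    "BATTV": "battery.voltage",
    "NOMBATTV": "battery.voltage.nominal",
    "LOTRANS": "input.transfer.low",
    "HITRANS": "input.transfer.high",
}


def parse_apcupsd(text):
    """Parse 'KEY : value' lines; only the first colon separates key from value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_nut_names(apc):
    """Rename the numeric apcupsd fields to NUT names, converting units where needed."""
    nut = {}
    for apc_key, nut_key in APCUPSD_TO_NUT.items():
        if apc_key not in apc:
            continue
        number = float(apc[apc_key].split()[0])  # "118.0 Volts" -> 118.0
        if apc_key == "TIMELEFT":
            number *= 60  # minutes (apcupsd) -> seconds (NUT battery.runtime)
        nut[nut_key] = number
    return nut
```

Splitting on the first colon only matters for lines like `FIRMWARE : 830.E6 .D USB FW:E6`, whose value contains a colon of its own.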
So yes, it's nice, but its quirky nature was something I wasn't expecting after a week of electric musical chairs.