Tuesday, February 09, 2010
Syslogintr—a syslogd replacement in C and Lua
So, why did I decide to write my own syslogd
? Well, it was
started as something to do while at Bunny's mom's house. I was curious as
to the actual protocol used by syslogd
because it's always
bugged me that the files written by syslogd
bear no information
about the facility or priority a message originally had. Sure, you can
filter by the facility and priority, but the resulting files lose
that information.
It was after reading RFC-3164 that I realized it might be possible to filter on more than just the facility and priority. Have Lua handle the logic of log analyzation and I had what I thought was a fun little project.
Heh. Little did I know.
Anyway, it's been in use for a little over three months (and 140 commits
to the source repository) and I must say, I'm finding it very
useful. Sure, I could have written a Lua script to parse the log files that
the traditional syslogd
writes, but I'm just cutting out the
middleman filesystem.
So the C code accepts, parses and hands a well-structured block of information to the Lua code, and it's the Lua code where all the features are implemented.
The information to Lua is passed in a table (which is Lua for an associative array, or hash table) with the following fields:
name | type | comments |
---|---|---|
name | type | comments |
version | number | This is the version of the syslog protocol, of which there are two—the documented version 1 which I haven't seen in the wild yet, and the conventional version which I've internally labelled as “version 0”. I currently only support version 0, so this field (as of this writing) is always 0. |
_RAW | string | This is the actual raw message as received. It was intended as a debugging aid, but I keep it enabled as it is useful when you get the odd message. |
remote | boolean | true if the message
came in over the network, false if it was generated
locally on the machine. I use this as a quick test. |
host | string | This is really the source of the
message. For a remote syslog message, this will contain the
IPv4 or
IPv6 address.
For messages generated locally, this will be the “local socket”
(formerly known as “Unix sockets”) the message was written to
(which on most systems is /dev/log ). |
relay | string | Usually this will be the same as
the host field, but when syslog messages are being
relayed, this is the intermediate system that relayed the message
on. If system A generated a message to B, and B relayed it to C,
then the host field will contain “A”, but the
relay field will contain “B”. If, however, it goes A
→ B → C → D, then host will contain
“A” and relay will have “C” (in other words—the
relay field will only contain the last relay
machine). |
port | number | The UDP port number the syslog message was sent from. If the message was sent locally, this will be -1. |
timestamp | number | This is the timestamp the
message as it was received on the accepting host. This value is
suitable for use in the Lua functions os.date() and
os.difftime() . |
logtimestamp | number | This is the timestamp as
found in the syslog message (if there was one). Otherwise, it will
be the same as timestamp . |
program | string | This is really the tag portion,
but since it's mostly the name of the program that generated the
syslog message, that's what I called this field. If it wasn't
given, the value of this field will be “”' (an empty string, not
nil ). |
pid | number | The process ID of the program that
sent the message. Most Unix syslogd implementations
have this as part of the tag, and if found, is set here. Otherwise,
it will be 0. |
facility | string | The facility, which will be one
of the following:
|
level | string | This is the priority of the
message, but the term “priority” never made much sense, so I call
it “level.” This will be one of eight values:
|
msg | string | The actual message, or basically, whatever is left after parsing everything else. This is pretty much a free-format string. |
This table is passed to a function called “log()” to handle however. On my development system, this function is simply:
function log(msg) writelog(msg) sshd(msg) end
writelog()
just logs the message to a single huge logfile
(that records the host
, program
,
facility
, level
and msg
fields).
It's the sshd()
function that's interesting:
if blocked == nil then blocked = {} end function sshd(msg) if msg.remote == true then return end if msg.program ~= "sshd" then return end if msg.facility ~= "auth2" then return end if msg.level ~= "info" then return end local ip = string.match(msg.msg,"^Failed password for .* from ::ffff:([%d%.]+) .*"); if ip == nil then return end I_log("debug","Found IP:" .. ip) if blocked[ip] == nil then blocked[ip] = 1 else blocked[ip] = blocked[ip] + 1 end if blocked[ip] == 5 then local cmd = "iptables --table filter --append INPUT --source " .. ip .. " --proto tcp --dport 22 --jump REJECT" I_log("debug","Command to block: " .. cmd) os.execute(cmd) I_log("info","Blocked " .. ip .. " from SSH") table.insert(blocked,{ ip = ip , when = msg.timestamp} ) end end
This checks for local messages from sshd
and if it finds
five (or more) consecutive failed attempts, adds the IP address of the
offending party to the firewall, and logs the attempt (via
I_log()
). Nothing that a lot of intrusion scripts don't
already do, but this is in the syslog daemon, not another process reading
this second hand through a file (and without the issues that come up with
log rotation).
Now, in order to keep the firewall from filling up, I added yet another feature—the ability to periodically run a Lua function. So, elsewhere in the script I have:
alarm("60m") function alarm_handler() I_log("debug","Alarm clock"); if #blocked == 0 then I_log("debug","Alarm clock---snooze button!") return end local now = os.time() I_log("debug",string.format("About to remove blocks (%d left)",#blocked)) while #blocked > 0 do if now - blocked[1].when < 3600 then return end local ip = blocked[1].ip I_log("info","Removing IP block: " .. ip) blocked[ip] = nil table.remove(blocked,1) os.execute("iptables --table filter -D INPUT 1") end end
alarm()
informs the C code to call the function
alarm_handler()
every 60 minutes (if you give
alarm()
a numeric value, it takes that as a number of seconds
between calling the function, if a string, it expects a numeric value
followed immediately by “m” for minutes, “h” for hours (so I could have
specified “1h” in this case) and “d” for days). The function
alarm_handler()
then removes the blocks one at a time (once per
hour) which is long enough for the scanner to move on.
Now, you may have noticed this bit of odd code:
if blocked == nil then blocked = {} end
One feature of the C code is that if it receives a SIGUSR1
,
the Lua script will be reloaded. This is to ensure that if there is any
IP blocks defined, I don't lose
any when the script is reloaded (and yes, that is a handy feature—change
the Lua code a bit, and have the daemon reload the script without having to
restart the entire daemon).
Along those lines, if the C code receives a SIGHUP
, calls
the Lua function reload_signal()
, which allows the script the
ability to close and reopen any logfiles it might be using (thus being
compatible with the current syslogd
behavior) which is useful
for rotating the logs and what not.
I'm also running this on my own server (brevard.conman.org
,
which runs this blog) and also one of our monitoring servers at The Company
(this one not only runs Nagios, Cacti and
snmptrapd
, but all our routers send their syslog information to
this server). The Lua code I have running on the monitoring server not only
keeps logging in the same files as the old syslogd
(in the same
format as well), but also this little bit of code:
function check_ospf(msg) if msg.facility ~= 'local1' then return end if string.match(msg.msg,".*(OSPF%-5%-ADJCHG.*Neighbor Down).*") then send_emergency_email("sean@conman.org",msg) send_emergency_email("XXXXXXXXXXXXXXXXX",msg) elseif string.match(msg.msg,".*(OSPF%-5%-ADJCHG.*LOADING to FULL).*") then send_okay_email("sean@conman.org",msg) send_okay_email("XXXXXXXXXXXXXXXXX",msg) end end
If the routing on our network changes, I get email notification of the
event. I also have some code that processes the logs from Postfix. Postfix
generates “thin
entries,” five entries per email (from various subsystems) when what I
really want logged are “fat entries” (which summarize the status of the
email in one log line). So, I wrote some code to catch all five lines per
email, and then just log a one line summary, thus changing a bunch of “thin
entries” to one “fat entry” (what I really want to do is send all mail
related logs on all our servers to one central location, so we no longer
have to check two or three servers everytime one of our customers complains
about email being “too slow” but that's for another show
entry).
The Lua code on my personal server also does the Postfix “thin-to-fat” conversion, but it also logs the webserver status every hour (just because I can).
And if you're wondering why I've blathered on and on about a piece of code I've yet to release, it's because I've yet to write any documentation on the darned thing, and I figure this would be a decent first pass at some documentation. And maybe to gauge interest in the project.