The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Sunday, February 07, 2010

Insanity

From
Mark Grosberg <XXXXXXXXXXXXXXXXX>
To
Sean Conner <sean@conman.org>
Subject
Re: Password updated
Date
Tue, 5 Jan 2010 11:24:10 -0500 (EST)

On Tue, 5 Jan 2010, Sean Conner wrote:

What's the cookie2 header for?

I'm so glad you asked. This is almost so good it may cause you to blog about it (actually I figured by the time we were done discussing the insanity of cookies you may have had an insightful blog post anyhow).

I guess after the third cookie spec they figured they kinda sucked at this so they built in an escape. So after much re-re-re-reading of the RFC I think what happens is if you have received a cookie with a $Version that you don't understand you are supposed to just send back a Set-Cookie2: header with $Version="version_this_thing_understands"

It's for future expandability so when we have 10 cookie specs clients and servers will “just work” (at this point I think we both know that statement is about as truthful as “the check is in the mail.”).

Well Mark (and yes, I know, it's been a month), the cookie specs are a paragon of clarity compared to the laughable mess that is the syslog protocol specification. Had I been aware of the “informational nature” of RFC-3164, I might not have even started my own homebrew syslogd replacement (network stuff in C, high level logic in Lua).

How loose is the spec?

A program that wishes to use syslog() may select a “facility” the message will be logged under. Think of “facility” as a subsystem, like “mail” or “cron” (under Unix, cron runs scheduled tasks on a periodic basis) or “auth” (authorization, or login credentials). Also, each message has a priority (kind of), one of “debug”, “info”, “notice”, “warn”, “err”, “crit” (for critical errors), “alert” (even more critical errors) and “emerg” (basically, the machine is on fire, abandon all hope, etc.). The program using syslog() can also tag each message, usually with its name, and the message itself has no real structure, originally being meant for human consumption.
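RFC 3164 (which calls the priority “severity”) gives each facility and severity a number and packs the pair into one value as facility * 8 + severity. The Python sketch below shows just that arithmetic. The helpers `encode_pri()` and `decode_pri()` are hypothetical, not part of any library, and the names used for facilities 9 and 10 (“cron”, “authpriv”) follow common Unix usage rather than the RFC's own descriptions.

```python
# Severity codes from RFC 3164 (0 is the most severe). Names follow the
# list above; the C constant for "warn" is LOG_WARNING.
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warn": 4, "notice": 5, "info": 6, "debug": 7,
}

# A subset of the RFC 3164 facility codes. Names for 9 and 10 follow
# common Unix practice; 12-15 are omitted and shown as "facilityN".
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
    "syslog": 5, "lpr": 6, "news": 7, "uucp": 8, "cron": 9,
    "authpriv": 10, "ftp": 11,
    **{f"local{n}": 16 + n for n in range(8)},
}


def encode_pri(facility: str, severity: str) -> int:
    """Pack a facility and severity name into the single PRI value."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def decode_pri(pri: int) -> tuple:
    """Split a PRI value back into (facility name, severity name)."""
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    fac, sev = divmod(pri, 8)
    fac_names = {v: k for k, v in FACILITIES.items()}
    sev_names = {v: k for k, v in SEVERITIES.items()}
    return fac_names.get(fac, f"facility{fac}"), sev_names[sev]
```

For example, `encode_pri("user", "notice")` gives 13, and `decode_pri(14)` gives `("user", "info")`.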

Now, the syslog protocol, which is used to send messages to the program that handles them (usually named syslogd under Unix), is a text-based protocol, and a full RFC-3164 message would look something like:

<87>Feb 07 04:30:00 brevard crond: (root) CMD (run-parts /etc/cron.hourly)

You have the facility and priority (combined into a single number: facility times 8, plus priority) in angle brackets, immediately followed by the timestamp, a space, then the name of the machine sending the message, a space, the tag (usually the name of the program on the machine sending the message), a colon, and then the message.
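To make that layout concrete, here is a deliberately naive Python sketch (not the C and Lua code of the syslogd mentioned above) that assumes every field is present and separated exactly as just described. The regular expression and the `naive_parse()` name are illustrative assumptions.

```python
import re
from typing import Optional

# Naive RFC 3164 layout: <PRI>Mmm dd hh:mm:ss HOST TAG: MESSAGE
# Every field is assumed present; for illustration only.
NAIVE_RFC3164 = re.compile(
    r"<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:]+): "
    r"(?P<msg>.*)"
)


def naive_parse(line: str) -> Optional[dict]:
    """Return the fields as strings, or None if the line doesn't fit."""
    m = NAIVE_RFC3164.fullmatch(line)
    return m.groupdict() if m else None
```

It handles the crond line fine, but fed the gconfd message below it happily reports a host of “gconfd” and a tag of “(spc-25469)”, and it rejects a bare “Use the BFG!” outright.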

And technically, every field is optional! That makes parsing this a real challenge. Not only that, but since there never really was a spec, it's easy to find ambiguous messages, such as:

<14>Jan 14 05:53:37 gconfd (spc-25469): Received signal 15, shutting down cleanly

which (per the spec) was sent from the program “(spc-25469)” on machine “gconfd”. Funny thing is, I have no machine called “gconfd”, but there does exist a program called gconfd that runs on my machine, running as me, with a process ID of 25469 (fancy that).

I don't even want to talk about /Applications/Windows Media Player/Windows Media Player.app/Contents/MacOS/WindowsMediaPlayer.

It gets even worse. RFC-3164 makes a point of saying that the following is a legal syslog message that has to be processed:

Use the BFG!

Just writing the code to parse this mess took most of the time, as I kept coming across syslog messages that really weren't.

To work my way out of this mess, if I don't find a proper facility/priority field, I log the raw message (using facility “user” and level “notice”, which is what RFC 3164 says to use in the absence of such information). If there's no timestamp, okay, but if there is one and it's malformed, I log the raw message. I then check for an IPv4 or IPv6 address in the host position, as I feel that's really the only sane value to use there; everything else up to a ':' is accepted as the tag value (more or less).
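The fallback rules just described can be sketched roughly like this. It is a hypothetical Python illustration of those decisions, not the actual C and Lua code; the function and field names, the use of Python's `ipaddress` module to recognize addresses, and the heuristic that a line “has a timestamp” when it starts with a month abbreviation are all assumptions.

```python
import ipaddress
import re

# Hypothetical sketch of the fallback rules; not the actual C/Lua code.
PRI_RE = re.compile(r"<(\d{1,3})>")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
TIMESTAMP_RE = re.compile(r"[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")
USER_NOTICE = 13  # facility user (1) * 8 + severity notice (5)


def raw_record(line: str) -> dict:
    """Log the whole line as the message, as user.notice."""
    return {"pri": USER_NOTICE, "timestamp": None, "host": None,
            "tag": None, "msg": line, "raw": line}


def is_ip_address(token: str) -> bool:
    """True if token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def parse_syslog(line: str) -> dict:
    # 1. No proper <PRI>: log the raw message as user.notice.
    m = PRI_RE.match(line)
    if not m or int(m.group(1)) > 191:
        return raw_record(line)
    pri, rest = int(m.group(1)), line[m.end():]

    # 2. The timestamp is optional, but text that starts like one (a
    #    month abbreviation) and then fails to parse means raw message.
    timestamp = None
    if rest[:3] in MONTHS:
        t = TIMESTAMP_RE.match(rest)
        if not t:
            return raw_record(line)
        timestamp, rest = t.group(0).rstrip(), rest[t.end():]

    # 3. Only an IPv4 or IPv6 address is accepted as the host field.
    host = None
    token, sep, after = rest.partition(" ")
    if sep and is_ip_address(token):
        host, rest = token, after

    # 4. Everything up to the first ':' is the tag (more or less).
    tag, colon, msg = rest.partition(":")
    if not colon:
        tag, msg = None, rest
    return {"pri": pri, "timestamp": timestamp, "host": host,
            "tag": tag.strip() if tag is not None else None,
            "msg": msg.lstrip(), "raw": line}
```

With these rules, “Use the BFG!” becomes a user.notice message containing the whole line, the gconfd message ends up with no host and a tag of “gconfd (spc-25469)”, and the rsyslogd test case quoted below misparses without crashing. Note that a plain hostname such as “brevard” is not an address, so it gets folded into the tag. Every record keeps the raw line, so later code can second-guess the parse.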

Is it perfect?

No.

But so far, it covers everything I've personally encountered. It will misparse, but not crash on, the following (a test case I pulled from rsyslogd):

<130> [ERROR] host.example.net 2008-09-23 11-40-22 PST iapp_socket_task.c 399: iappSocketTask: iappRecvPkt returned error

Garbage in, garbage out (also, stuff like this can be checked in the Lua code, as the raw message is available in addition to the parsed message).

Cookies? Insane? Not really. Not when compared to the syslog protocol.

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.