The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, April 27, 2005

Taking away the spam filter's Little Orphan Annie Secret Decoder Ring

A few months ago I wrote about some character encoding problems I was having, namely that it was a real mess on the web. But apparently, it's not a mess with email.

We have a dedicated computer that does nothing but filter spam (and the statistics from that are depressing), and you can add additional filtering via regular expressions. Smirk has been receiving quite a bit of foreign spam, stuff in Russian, Korean and Chinese, which he can't even read since it's in Cyrillic, Wansung and Hangul. But (for instance) some (if not most) of the email had subject lines like:

Subject: =?Windows-1251?B?amFlQGxlZWhvbS5uZXQg?=

where the character set is encoded within the subject line. So Smirk thought a regular expression like ^Subject: .*Windows-1251.* would work and filter out the spam in Cyrillic (with appropriate regular expressions for Wansung and Hangul).
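That odd-looking subject is a MIME “encoded-word” (RFC 2047): =?charset?encoding?text?=, where B means the text is Base64. Here's a minimal sketch of decoding it with Python's standard email.header module; decode_subject() is a hypothetical helper name, and I'm assuming unencoded chunks are plain ASCII. Decoding this particular subject shows it's just an email address, with no Cyrillic in sight.

```python
from email.header import decode_header


def decode_subject(raw: str) -> str:
    """Decode any RFC 2047 encoded-words in a header value into a Unicode string."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset;
            # unencoded runs mixed in with them come back as bytes with no
            # charset (assumed ASCII here).
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A header with no encoded-words at all is returned untouched as str.
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    print(repr(decode_subject("=?Windows-1251?B?amFlQGxlZWhvbS5uZXQg?=")))
    # 'jae@leehom.net '
```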

Only it didn't work.

It caught subject lines that had “Windows-1251” as part of the legitimate subject line (I sent him a test message with the subject of “Did you get Windows-1251 yet?”) but not if it was part of an encoding. Which meant only one thing: the spam filtering system was applying the regular expressions to the decoded characters!
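To make the difference concrete, here's a rough sketch (my guess at the behavior, not the vendor's actual code) of applying Smirk's rule to the raw header line versus the decoded one. The names RULE, decode_value() and matches() are made up for illustration. Matched against the raw line, the rule catches the encoded spam; matched against the decoded line, it only catches my test message.

```python
import re
from email.header import decode_header

# Smirk's rule, written as a Python regular expression.
RULE = re.compile(r"^Subject: .*Windows-1251.*")


def decode_value(raw: str) -> str:
    """Decode RFC 2047 encoded-words (unencoded chunks assumed ASCII)."""
    out = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            out.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            out.append(chunk)
    return "".join(out)


def matches(header_line: str, decode_first: bool) -> bool:
    """Apply RULE to a 'Name: value' header line, optionally decoding the value first."""
    name, _, value = header_line.partition(": ")
    if decode_first:
        value = decode_value(value)
    return RULE.search(f"{name}: {value}") is not None


if __name__ == "__main__":
    spam = "Subject: =?Windows-1251?B?amFlQGxlZWhvbS5uZXQg?="
    test = "Subject: Did you get Windows-1251 yet?"
    for line in (spam, test):
        print(matches(line, decode_first=False), matches(line, decode_first=True))
    # True False
    # True True
```

The second column is what the spam filter appears to be doing, which is exactly why the rule never fired on the spam.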

Well … that's certainly a surprise.

But it doesn't help the current problem. We're now waiting to hear back from the company about whether that “feature” can be turned off.



It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.