The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Thursday, August 17, 2006

Notes on a program

This entry is less a polished piece and more a set of notes on a project I've been given. If it seems somewhat random and hard to fathom, that's why.

Smirk is drowning in email. As such, he's looking for a filtering solution whereby he can run a daily job that scans his email (using IMAP) and shuffles it into different folders. The criteria are something like “If I've read it and it's older than seven days, move it to this folder. If it's unread and older than three days, move it to this other folder. If I've read it and replied to it, move it to yet another folder.”

procmail won't really handle that, as it's meant more for initial delivery and filtering of email. He also rejected sieve, as it apparently doesn't handle date parsing that well (or something like that). So he asked me if I could write such a program, preferably in PHP since he knows that language (and since I equally hate Perl and PHP, it's six of one, half a dozen of the other; I would prefer C, but that's me).

So, on to the design of the program. Start with an input file describing the filtering to do on the email, something like:

account imap://alice:zahg34!@mail.example.net/
{
  mailbox INBOX
  {
    foreach message
    {
      if (header.subject =~ /[Vv][Ii1][Aa@][Gg][Rr][Aa]/)
        moveto Trash;
      if (status = REPLIED) moveto Replied;
      if (header.date ~ "3 days ago" && status = UNREAD)
        moveto Archive;
      if (header.date ~ "7 days ago" && status != UNREAD)
        moveto ReadArchive;
    }
  }

  mailbox Archive
  {
    if (messages > 5000)
      sendmail("Yo!  There are too many messages in the archive!");

    if ((messages > 3000) 
    || (message[1].header.date >~ "6 months ago"))
      sendmail("Yo!  Check your archive!");
  }
}

Okay, maybe nothing quite so grandiose, but some file that describes the rule sets for moving messages from one mailbox to another, with the program run periodically as a cron job.
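The rules file leaves the exact meaning of a condition like header.date ~ "3 days ago" open. Reading it as “the message is older than three days” (an assumption drawn from Smirk's criteria), a minimal sketch, in Python rather than PHP, might turn the phrase into a cutoff time and compare the message date against it. The names parse_ago and older_than are hypothetical, only the “N units ago” form is handled (not “less than 5 hours”), and a month is approximated as 30 days.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Seconds per unit; a "month" is approximated as 30 days (an assumption).
UNITS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
}

# Matches phrases of the form "N unit(s) ago", e.g. "3 days ago".
AGO_RE = re.compile(
    r"^\s*(\d+)\s+(minute|hour|day|week|month)s?\s+ago\s*$", re.IGNORECASE
)


def parse_ago(text, now):
    """Turn a phrase like "3 days ago" into an absolute cutoff time."""
    m = AGO_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized relative date: {text!r}")
    count, unit = int(m.group(1)), m.group(2).lower()
    return now - timedelta(seconds=count * UNITS[unit])


def older_than(msg_date, phrase, now):
    """True if msg_date is earlier than the cutoff the phrase describes."""
    return msg_date < parse_ago(phrase, now)


now = datetime(2006, 8, 17, 12, 0, tzinfo=timezone.utc)
# A real Date: header is parsed into a timezone-aware datetime first.
sent = parsedate_to_datetime("Sat, 12 Aug 2006 09:30:00 +0000")
print(older_than(sent, "3 days ago", now))  # True
print(older_than(sent, "7 days ago", now))  # False
```

Turning the phrase into an absolute cutoff once, rather than computing each message's age, keeps the per-message check to a single comparison.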

We need to retrieve information via IMAP. We need to parse the email headers. We'll need regular expressions, as well as date-processing utilities (“3 days ago,” “less than 5 hours,” etc.). We'll need to read and parse the rules file (using whatever syntax I come up with). Oh, and I would like to convert all the text to a single intermediate character set so we can filter consistently, which means using iconv (and parsing MIME-specific headers and MIME-encoded headers).
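To see why the character-set step matters: a subject that arrives as an RFC 2047 encoded-word won't match the spam pattern from the sample rules until it is decoded. This sketch uses Python's standard email.header module where a PHP version would presumably lean on iconv; header_text is a hypothetical helper name, and Python's Unicode str stands in for the “consistent character set” (encode it to UTF-8 if bytes are needed).

```python
import re
from email.header import decode_header, make_header

# The subject pattern from the sample rules file.
SPAM_RE = re.compile(r"[Vv][Ii1][Aa@][Gg][Rr][Aa]")


def header_text(raw):
    """Decode any RFC 2047 encoded-words (=?charset?B|Q?...?=) in a raw
    header into a single Unicode string, whatever charset they used."""
    return str(make_header(decode_header(raw)))


raw_subject = "=?UTF-8?B?VmlhZ3Jh?="
print(SPAM_RE.search(raw_subject) is not None)               # False
print(SPAM_RE.search(header_text(raw_subject)) is not None)  # True
print(header_text("=?ISO-8859-1?Q?Caf=E9_ouvert?="))        # Café ouvert
```

Headers without encoded-words pass through unchanged, so the helper can be applied to every header before any rule sees it.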

So the main program flow for processing each message would look something like:

get headers for next message
convert to consistent character set (probably UTF-8)
for each rule to check against
	check conditions of rule against message
	if all conditions apply, apply action
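The loop above could be sketched in Python roughly as follows. The message representation (a dict with a uid, an already-decoded subject, a date, and a set of IMAP flags) and the rule list are assumptions that mirror the INBOX block of the sample rules file; REPLIED is taken to be the standard IMAP \Answered flag and UNREAD the absence of \Seen. One design choice differs from the pseudocode: evaluation stops at the first rule that fires, since a message that has been moved is no longer in the mailbox to be checked.

```python
import re
from datetime import datetime, timedelta, timezone

SPAM_RE = re.compile(r"[Vv][Ii1][Aa@][Gg][Rr][Aa]")

# Each rule pairs a condition (a function of the message and the current
# time) with the folder to move matching messages to. This representation
# is a hypothetical stand-in for the parsed rules file.
RULES = [
    (lambda m, now: SPAM_RE.search(m["subject"]) is not None, "Trash"),
    (lambda m, now: "\\Answered" in m["flags"], "Replied"),
    (lambda m, now: "\\Seen" not in m["flags"]
        and m["date"] < now - timedelta(days=3), "Archive"),
    (lambda m, now: "\\Seen" in m["flags"]
        and m["date"] < now - timedelta(days=7), "ReadArchive"),
]


def plan_moves(messages, rules, now):
    """Return (uid, folder) pairs; the first matching rule wins."""
    moves = []
    for msg in messages:
        for condition, folder in rules:
            if condition(msg, now):
                moves.append((msg["uid"], folder))
                break  # the message has left this mailbox
    return moves


now = datetime(2006, 8, 17, tzinfo=timezone.utc)
inbox = [
    {"uid": 1, "subject": "V1agra!!", "flags": set(), "date": now},
    {"uid": 2, "subject": "Re: lunch", "flags": {"\\Seen", "\\Answered"}, "date": now},
    {"uid": 3, "subject": "status report", "flags": set(), "date": now - timedelta(days=4)},
    {"uid": 4, "subject": "old news", "flags": {"\\Seen"}, "date": now - timedelta(days=10)},
    {"uid": 5, "subject": "today", "flags": {"\\Seen"}, "date": now},
]
print(plan_moves(inbox, RULES, now))
# [(1, 'Trash'), (2, 'Replied'), (3, 'Archive'), (4, 'ReadArchive')]
```

Planning the moves first and performing them afterwards also makes a dry run trivial: print the plan instead of executing it.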

The hardest parts appear to be getting a version of PHP with all the required extensions installed. Next would be defining the input file and parsing that into some internal format for processing. The rest pretty much just falls into place.

Most of the time will be spent building the required version of PHP and playing with the various modules to figure out how they work and what exactly one gets. I would also need to set up a play IMAP account to test the program against (there's no way I want to run this on my email account, or on Smirk's for that matter).
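For the IMAP side, a rough sketch using Python's standard imaplib (standing in for PHP's IMAP functions) might look like this. filter_mailbox, parse_fetch and the decide callback are hypothetical names. It fetches one message at a time for simplicity, which would be slow on a 5,000-message archive, and it moves messages with COPY plus \Deleted because plain IMAP4rev1 has no MOVE command.

```python
import email
import imaplib


def parse_fetch(data):
    """Pull the flags and header block out of one message's UID FETCH
    reply. imaplib returns a list mixing (metadata, literal) tuples and
    bare bytes; FLAGS may land in either, depending on the server."""
    flags, header = set(), b""
    for part in data:
        meta = part[0] if isinstance(part, tuple) else part
        if b"FLAGS" in meta:
            flags |= {f.decode() for f in imaplib.ParseFlags(meta)}
        if isinstance(part, tuple):
            header = part[1]
    return flags, email.message_from_bytes(header)


def filter_mailbox(host, user, password, decide, mailbox="INBOX"):
    """Call decide(flags, headers) for every message; when it returns a
    folder name, COPY the message there, mark it \\Deleted, and EXPUNGE
    once at the end."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        conn.select(mailbox)
        typ, data = conn.uid("SEARCH", None, "ALL")
        for uid in data[0].split():
            # BODY.PEEK fetches the headers without setting \Seen.
            typ, reply = conn.uid(
                "FETCH", uid, "(FLAGS BODY.PEEK[HEADER.FIELDS (DATE SUBJECT)])")
            flags, headers = parse_fetch(reply)
            folder = decide(flags, headers)
            if folder is not None:
                typ, _ = conn.uid("COPY", uid, folder)
                if typ == "OK":  # only delete once the copy succeeded
                    conn.uid("STORE", uid, "+FLAGS", "(\\Deleted)")
        conn.expunge()
    finally:
        conn.logout()
```

Note that the final EXPUNGE also removes anything already marked \Deleted in that mailbox, which is one more reason to try it against a throwaway test account first.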

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so the final URL would be:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2019 by Sean Conner. All Rights Reserved.