Thursday, August 17, 2006
Hypertext editing, Part III
Nearly three years later and it still hasn't gotten any easier.
Yesterday at dinner Bunny and I had a conversation about the difficulty I have in writing these entries, mostly about the time it takes and how the writing process gets interrupted by the editing process of adding links, tags, acronym expansions and whatnot.
Like today, which has been “catch-up” day. Sunday's entry not only required the usual hand-made markup, it also required transferring the images to my laptop, where I have the image manipulation program (my normal desktop Linux system chokes on the large images, and the Mac mini doesn't have any image manipulation programs at all), for the cropping and resizing, and then sending the results to the web server.
In other words, the workflow for producing entries here sucks, which partially explains why I fall behind on the entries as much as I do.
Notes on a program
This entry is less a polished piece and more a set of notes on a project I've been given. If it seems somewhat random and hard to fathom, that's why.
Smirk is drowning in email. As such, he's looking for a filtering solution whereby he can run a job daily that scans his email (using IMAP) and shuffles email to different folders. The criteria are something like “If I've read it and it's older than seven days, move it to this folder. If it's unread and older than three days, move it to this other folder. If I've read it and replied to it, move it to yet another folder.”
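Much of that could be pushed onto the IMAP server itself, since IMAP SEARCH (RFC 3501) has keys for read (`SEEN`/`UNSEEN`), replied (`ANSWERED`) and the date in the `Date:` header (`SENTBEFORE`). A minimal sketch, in Python rather than PHP, that turns each of Smirk's criteria into a search string; the function names are made up for illustration, and “older than N days” is only approximated to whole days because `SENTBEFORE` ignores the time of day.

```python
from datetime import date, timedelta

# IMAP dates are day-Mon-year with English month names (RFC 3501), so the
# names are spelled out rather than taken from the locale-dependent %b.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def imap_date(d):
    """Format a date the way IMAP SEARCH expects, e.g. 14-Aug-2006."""
    return f"{d.day}-{MONTHS[d.month - 1]}-{d.year}"

def search_criteria(today, seen=None, answered=None, older_than_days=None):
    """Build an IMAP SEARCH string; keys listed together are ANDed by the server."""
    keys = []
    if seen is not None:
        keys.append("SEEN" if seen else "UNSEEN")
    if answered is not None:
        keys.append("ANSWERED" if answered else "UNANSWERED")
    if older_than_days is not None:
        # SENTBEFORE compares against the Date: header, ignoring time and timezone
        cutoff = today - timedelta(days=older_than_days)
        keys.append("SENTBEFORE " + imap_date(cutoff))
    return " ".join(keys) or "ALL"

today = date(2006, 8, 17)
read_and_old = search_criteria(today, seen=True, older_than_days=7)
unread_and_old = search_criteria(today, seen=False, older_than_days=3)
replied = search_criteria(today, answered=True)
```

The resulting strings (for example `UNSEEN SENTBEFORE 14-Aug-2006` on the date of this entry) would be handed to the server's SEARCH command, which returns the matching message numbers.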
procmail won't really handle that, as it's meant more for initial delivery and filtering of email. He also rejected sieve, as it apparently doesn't handle date parsing that well (or something like that). So he asked me if I could write such a program, preferably in PHP since he knows that language (and since I equally hate Perl and PHP, it's six of one, half a dozen of the other; I would prefer C, but that's me).
So, the design of the program. Given some input file describing the filtering to do on email:
    account imap://alice:zahg34!@mail.example.net/
    {
      mailbox INBOX
      {
        foreach message
        {
          if (header.subject =~ /[Vv][Ii1][Aa@][Gg][Rr][Aa]/) moveto Trash;
          if (status = REPLIED) moveto Replied;
          if (header.date ~ "3 days ago" && status = UNREAD) moveto Archive;
          if (header.date ~ "7 days ago" && status != UNREAD) moveto ReadArchive;
        }
      }

      mailbox Archive
      {
        if (messages > 5000)
          sendmail("Yo! There are too many messages in the archive!");
        if ((messages > 3000) || (message[1].header.date >~ "6 months ago"))
          sendmail("Yo! Check your archive!");
      }
    }
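Whatever the final syntax, reading such a file starts with breaking it into tokens. A minimal Python sketch of a regular-expression tokenizer for the statements inside a mailbox block; the token kinds are hypothetical names, and it does not try to handle the `account imap://...` line, whose slashes it would mistake for a regex literal.

```python
import re

# One alternative per token kind. Within OP the two-character operators are
# listed first so "=~" is not split into "=" and "~".
TOKEN_RE = re.compile(r'''
    (?P<WS>\s+)
  | (?P<STRING>"[^"]*")
  | (?P<REGEX>/(?:\\.|[^/\\])*/)
  | (?P<OP>=~|>~|!=|&&|\|\||[=~<>])
  | (?P<NUMBER>\d+)
  | (?P<IDENT>[A-Za-z_][\w.\[\]]*)
  | (?P<PUNCT>[{}();,])
''', re.VERBOSE)

def tokenize(text):
    """Split one rule statement into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for tok in tokenize('if (header.subject =~ /[Vv][Ii1][Aa@][Gg][Rr][Aa]/) moveto Trash;'):
    print(tok)
```

A recursive-descent parser would then consume these pairs and build the internal form of each rule.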
Okay, maybe nothing quite so grandiose, but some file to explain the rule sets for moving messages from one box to another, run periodically as a cron job.
We need to retrieve information via IMAP. We need to parse the email headers. We'll need regular expressions, as well as date processing utilities (“3 days ago,” “less than 5 hours,” etc.). We'll need to read and parse the rules file (using whatever syntax I come up with). Oh, and I would like to translate all the text to some intermediary character set so we can filter consistently, which means using iconv (and parsing MIME-specific headers and MIME-encoded headers).
So the main program flow for processing each message would look something like:
    get headers for next message
    convert to consistent character set (probably UTF-8)
    for each rule
      check conditions of rule against message
      if all conditions apply, apply action
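The character-set step is mostly about RFC 2047 encoded words such as `=?ISO-8859-1?Q?Caf=E9_menu?=` in `Subject:` and `From:` headers. In Python the standard `email.header` module decodes them and converts each piece from its declared charset, doing the job iconv would do in PHP or C. A minimal sketch; `header_to_text` is a hypothetical helper name.

```python
from email.header import decode_header, make_header

def header_to_text(raw):
    """Decode a possibly RFC 2047-encoded header into one Unicode string.

    Each encoded word is converted from its declared charset, so every rule
    sees the same representation (encode the result to UTF-8 if bytes are needed).
    """
    return str(make_header(decode_header(raw)))

print(header_to_text("=?ISO-8859-1?Q?Caf=E9_menu?="))
print(header_to_text("Plain ASCII subject"))
```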
The hardest part appears to be getting a version of PHP with all the required extensions installed. Next would be defining the input file and parsing that into some internal format for processing. The rest pretty much just falls into place.
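For the internal format, one simple choice is to compile each rule into a list of predicate functions plus an action, so the per-message loop above reduces to “return the action of the first rule whose conditions all hold”. A minimal Python sketch; the `Message` and `Rule` types and the helper names are hypothetical, and stopping at the first match is an assumption (a message can only be moved once per run).

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional

@dataclass
class Message:
    headers: dict   # lowercase header name -> decoded text
    flags: set      # IMAP flags such as "\\Seen", "\\Answered"
    sent: datetime  # parsed Date: header

@dataclass
class Rule:
    conditions: List[Callable[[Message], bool]]
    action: str     # destination mailbox for a "moveto"

def subject_matches(pattern):
    rx = re.compile(pattern)
    return lambda msg: rx.search(msg.headers.get("subject", "")) is not None

def has_flag(flag):
    return lambda msg: flag in msg.flags

def lacks_flag(flag):
    return lambda msg: flag not in msg.flags

def older_than(delta, now):
    return lambda msg: now - msg.sent > delta

def first_action(msg, rules) -> Optional[str]:
    """Return the action of the first rule whose conditions all hold, else None."""
    for rule in rules:
        if all(cond(msg) for cond in rule.conditions):
            return rule.action
    return None

# The INBOX rules from the example file, built by hand instead of parsed.
now = datetime(2006, 8, 17)
rules = [
    Rule([subject_matches(r"[Vv][Ii1][Aa@][Gg][Rr][Aa]")], "Trash"),
    Rule([has_flag("\\Answered")], "Replied"),
    Rule([older_than(timedelta(days=3), now), lacks_flag("\\Seen")], "Archive"),
    Rule([older_than(timedelta(days=7), now), has_flag("\\Seen")], "ReadArchive"),
]
```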
Most of the time will be spent in building the required version of PHP, and in playing with the various modules to figure out how they work and what exactly one gets. I would also need to set up a play IMAP account to test the program against (there's no way I want to run this on my email account, or on Smirk's for that matter).
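For the play account, the moving itself is the piece most worth testing. A minimal sketch using Python's `imaplib`: search for matching UIDs, `COPY` them to the destination, flag the originals `\Deleted` and `EXPUNGE`. That is the classic way to move messages in IMAP; the single-command `MOVE` is a later extension that servers may not support. `move_messages` is a hypothetical helper, and the connection lines in the comment use a made-up host and credentials.

```python
import imaplib

def move_messages(conn, criteria, dest):
    """Move every message in the selected mailbox matching `criteria` to `dest`.

    Note that EXPUNGE also removes any other messages already flagged
    \\Deleted in the mailbox. Returns the number of messages moved.
    """
    typ, data = conn.uid("SEARCH", None, criteria)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"SEARCH failed: {data!r}")
    uids = data[0].split()
    if not uids:
        return 0
    uid_set = b",".join(uids).decode()
    typ, data = conn.uid("COPY", uid_set, dest)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"COPY to {dest} failed: {data!r}")
    conn.uid("STORE", uid_set, "+FLAGS", r"(\Deleted)")
    conn.expunge()
    return len(uids)

# Usage against a throwaway test account (hypothetical host and credentials):
#   conn = imaplib.IMAP4_SSL("imap.test.example")
#   conn.login("testuser", "testpassword")
#   conn.select("INBOX")
#   move_messages(conn, "ANSWERED", "Replied")
#   conn.logout()
```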