The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Monday, November 09, 2009

Perhaps an 80M script is a bit excessive …

Every so often I'll do a bit of work on an unimportant project, just to keep myself sane from working in PHP and Drupal.

About a month ago I decided to save the data from my email indexer program as a Lua program, something like:

emails = 
{
  filelist = 
  {
    { 
      file = "/home/spc/Mail/sent" ,
      size = 902273,
      time = "Tue, 10 Nov 2009 09:35:10 GMT",
    },
    {
      file = "/home/spc/LINUS/Archive.mail/20060607/cctalk",
      size = 230140,
      time = "Tue, 02 May 2006 18:22:28 GMT",
    },
    -- and so on ... 
  },

  mbox = 
  {
    {
      info = { mboxfile = 1, oh={45, 322}, ob={368, 15}},
      ['Message-ID'] = "<20081021051331.GA30804@lucy.localdomain>",
      ['From'] = { "Sean Conner <sean@conman.org>",},
      ['To'] = { "sean@conman.org",},
      ['Subject'] = "This is a test",
      ['Date'] = "Tue, 21 Oct 2008 01:13:31 -0400",
      ['MIME-Version'] = "1.0",
      ['Content-Type'] = 
      { 
        "text/plain",
        "charset=us-ascii",
      },
      mimeheaders = 
      {
        ['Content-Disposition'] = "inline",
      },
      ['Lines'] =  1,
      extraheaders =  
      {
        ['User-Agent'] = "Mutt/1.4.1i",
        ['Status'] = "RO",
      },
    },
    -- and so on ... 
  }
}

That way, I could load it into the Lua interpreter and work with the data in Lua, instead of writing a bunch of C code. I debugged the output to make sure it was valid Lua and everything was fine.
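As a rough sketch of what "working with the data in Lua" can look like, the snippet below counts messages per sender. The table is a cut-down stand-in for the generated file (which would normally be pulled in with dofile()), and count_by_sender is a hypothetical helper, not part of the indexer.

```lua
-- Cut-down stand-in for the generated data file, using the sample entry
-- from above. In practice this would come from dofile() on the real output.
emails =
{
  filelist =
  {
    {
      file = "/home/spc/Mail/sent" ,
      size = 902273,
      time = "Tue, 10 Nov 2009 09:35:10 GMT",
    },
  },

  mbox =
  {
    {
      info = { mboxfile = 1, oh={45, 322}, ob={368, 15}},
      ['From'] = { "Sean Conner <sean@conman.org>",},
      ['Subject'] = "This is a test",
    },
  }
}

-- Hypothetical helper: tally how many messages each From address sent.
-- 'From' is a list in the generated data, so every entry is counted.
local function count_by_sender(mbox)
  local counts = {}
  for _, msg in ipairs(mbox) do
    for _, from in ipairs(msg['From'] or {}) do
      counts[from] = (counts[from] or 0) + 1
    end
  end
  return counts
end

local counts = count_by_sender(emails.mbox)
for from, n in pairs(counts) do
  print(n, from)
end
```

A query like this is a handful of lines in Lua, where the C equivalent would need its own hash table and string handling.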

Until I threw 80,919 messages from 2,360 email files I had lying around (going back to 1991) at it. Then all I got from Lua was:

lua: constant table overflow

Hmmm … okay, maybe throwing an 80MB script into the Lua interpreter wasn't such a good idea.

But then tonight I decided to give it one more try. The source code to Lua didn't reveal any immediate settings to tweak, so I did a bit of searching. And yes, I'm not the only one with that problem. Reading further, I learned that while there isn't a limit to the size a Lua table can get, there is a limit to the number of constants in a single Lua function.

But the code isn't in a Lua function.

Or is it?

It is. When you load Lua code from an external source, it gets compiled into an anonymous function that needs to be run. So, the solution is to break the initialization into several functions, and from some experimenting, I found that things would work (with this particular data set) if I only initialized 16,384 items per function.
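Here is a minimal sketch of that idea, not the actual generator: emit_chunked is a hypothetical function that writes out Lua source in which every batch of items is initialized inside its own small function, so each function gets its own constant table. The batch size of 4 and the message subjects are made up to keep the demo short, and the right batch size for real data depends on how many distinct constants each item contributes.

```lua
-- loadstring() is the Lua 5.1 name; later versions fold it into load().
local compile = loadstring or load

-- Hypothetical generator: returns Lua source that fills emails.mbox,
-- with at most per_function items initialized in any one function.
local function emit_chunked(items, per_function)
  local out = { "emails = { mbox = {} }" }
  for first = 1, #items, per_function do
    local last = math.min(first + per_function - 1, #items)
    -- Each batch lives in its own function, hence its own constant table.
    out[#out + 1] = "do"
    out[#out + 1] = "  local function init()"
    out[#out + 1] = "    local mbox = emails.mbox"
    for i = first, last do
      -- %q writes the string back out as a valid Lua string literal.
      out[#out + 1] = string.format("    mbox[%d] = { ['Subject'] = %q }", i, items[i])
    end
    out[#out + 1] = "  end"
    out[#out + 1] = "  init()"
    out[#out + 1] = "end"
  end
  return table.concat(out, "\n")
end

-- Made-up data for the demonstration.
local subjects = {}
for i = 1, 10 do
  subjects[i] = "message " .. i
end

local source = emit_chunked(subjects, 4)

-- Loading only compiles the text into an anonymous function;
-- nothing is initialized until that function is called.
local chunk = assert(compile(source))
print(type(chunk))             --> function
chunk()

print(#emails.mbox)            --> 10
print(emails.mbox[7].Subject)  --> message 7
```

The do ... end wrapper keeps each init local to its own block, so the top-level chunk doesn't pile up thousands of local variables either.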

But there's a difference between “it worked” and “this is a usable solution.”

Generating the Lua code? 30 seconds.

Loading the Lua code into the interpreter? Six minutes and an overheated CPU.

Interesting …

Update Tuesday, November 10th, 2009

I managed to hit the worst-case run time with the code. Change the order of things, and it runs in about 15 seconds. Go figure …

Update Wednesday, February 3rd, 2010

It was a bug in Lua that has since been fixed.

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.