The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Wednesday, October 14, 2009

Lua's long strings

I'm back from mucking with R's project for now …

One of the things I had to do on “Project: DoogieHowser” was figure out which tables needed updating before the questionnaire could start, in an attempt to get the thing as standalone as possible (since as written, it's almost, but not quite separate from the old undocumented hugely overwrought PHP framework) and one of the ways to test my findings was to populate the database and try it out.

And in order to test it over and over again, I wrote the code in PHP (this was just before I start learning Lua):

$now = time();
$id  = mysql_insert_id($conn);

$query = "INSERT INTO program ("
              . "order_number,"
              . "orderid,"
              . "clientid,"
              . "editable,"
              . "purchasedate,"
              . "reactive_date"
         . ") "
         . "VALUES ("
              . "'119195450271',"
              . "25278,"
              . $id . ","
              . "1,"
              . $now . ","
              . $now . ")";

$result = mysql_query($query,$conn);
$pid    = mysql_insert_id($conn);

… and what can I say, it's PHP. It's messy. It's nasty. It's got a pediliction for punctuation.

Now, one of the neatest features of Lua is the use of “long brackets”, which make the construction of huge strings trivial:

a_long_string = [[
`Twas brillig, and the slithy toves
  Did gyre and gimble in the wabe:
All mimsy were the borogoves,
  And the mome raths outgrabe.


"Beware the Jabberwock, my son!
  The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
  The frumious Bandersnatch!"
]]

It's a brilliant solution to such a problem (and even neater, there's a method for embedding such long strings within a long string). It even extends to comments. In Lua, the line comment starts with a --:

-- frob the widget a few times, just to make sure
for i=1,10 do 
  frob(widget)
end

But a longer block comment starts with --[[ and ends with ]]:

--[[

We need to frob the widget multiple times, since the bitbucket sometimes get
stuck and needs to be forced a few times to clear up the issue.  Through
empirical testing we found that frobbing the widget 10 times will always
work, doesn't effect performance *all* that much, and much more importantly,
will allow me to get at least two hours sleep under the desk before da boss
comes in and starts screaming about deadlines ... 

]]

for i=1,10 do
  frob(widget)
end

So, had I used Lua, the above code would probably look something like:

now   = os.time()
id    = mysql.insert_id(conn)

query = string.format([[
	INSERT INTO program 
	(
		order_number,
		orderid,
		clientid,
		editable,
		purchasedate,
		reactive_date
	)
	VALUES 
	(
		'119195450271',
		25278,
		%s,
		1,
		%d,
		%d
	)]],id,now,now)

res = mysql.query(query,conn)
pid = mysql.insert_id(conn)

Not much different, but I didn't have to build up that huge string piecemeal, and it was certainly easier to type up.

And now that I think about it, I don't think it would be all that much code to fill in an SQL template with variables:

function sql_template(sql,text,vars)
  local function cmd(tag)
    local word = string.sub(tag,2,-2)
    
    if type(vars[word] == "string" then
      return sql.escape(vars[word])
    elseif type(vars[word] == "function" then
      return sql.escape(vars[word]())
    else
      return sql.escape(tostring(var[word]))
    end
  end

  local s = string.gsub(text,"{%w+}",cmd)
  return s
end

query = sql_template(mysql,[[
	INSERT INTO program 
	(
		order_number,
		orderid,
		clientid,
		editable,
		purchasedate,
		reactive_date
	)
	VALUES 
	(
		'119195450271',
		25278,
		{id}
		1,
		{now},
		{now}

	)]] , { id = mysql.insert_id(conn), now = os.time() });

Hmmm … I might have to see if I can do something like that in PHP; it would certainly help in “Project: Leaflet” …


A day in the life of the Googlenet

The following ticket comes in:

The warning message that my web site might be an attack site is showing up again. … The fact that I can get around the warning and get to the site doesn't alter the fact that it comes up sometimes instead of my home page. Business is bad enough without having customers scared away. It wasn't there yesterday. It is there again today.

This, and the phone conversation, lead me to believe that when he opens his browser, it immediately loads up the Google home page, and that he types his domain into the search box to get to his own site.

I say that because when I went to Google, and typed in his domain:

XXXX XXXXXX

This site may harm your computer.
Merchants, if you are interested in selling your wares in this market, call XXXXXXXXXXXX between 9-5 EST, Mon-Fri for information …
XXXXXXXXXXXXXXXXXXX - Similar

That, and I doubt he really understands how all this Intarweb stuff works, because that only happened on one of his three browsers.

Heck, for all I know, it could be the anti-virus software on the one computer gets some clues from Google (I'm not even sure how all this stuff works anymore, especially on the Microsoft Windows front).

Of course, his site uses quite a large amount of PHP … which meant I had to clean out the malicious PHP code that had been added …

I'm really of two minds about PHP. Personally, I hate the language (being designed by rabid wombats will do that for a language) and if I have a choice, I'd rather use C to write web applications than PHP (or maybe Lua). But I'll admit up front I'm a computer language snob who prefers early binding and static typing (comes from my Assembly Language background I'm sure), because PHP has allowed people who would otherwise be unable to create web applications not only the ability to create web applications, but find inexpensive hosting for said applications. Then again, it has allowed people who should otherwise be prevented from creating web applications the ability to create web applications more insecure than a balsa wood bank vault.

I think I'm digressing (and there's more I could say about insecure web applications, but that's for another post), but I'm not sure what point I want to make, but I think it has something to do with Google taking over the Internet. That, and the fact that because of the way they scale, they attempt to automate as much of the company as possible, and as a consequence, they have horrible customer service (and no, the irony of using Google to search for evidence of its own bad customer service isn't lost on me, nor the fact that I'm probably using the term “irony” incorrectly), so our customer will probably face an uphill battle to get “This site may harm your computer” off his Google search results.


A mass of stupid benchmarks. You have been warned.

A few weeks ago Wlofie and I were talking about computer languages and I mentioned the language I wrote in college (it was more than the simple DSL shown there), so I cleaned up the code so it would compile (most of it was written in 1993; last date of modification was March 5th, 1995) and just because I was curious, I decided to run a stupid benchmark on it.

What's more stupid than just adding up the first 1,000,000 integers?

: main 
  0 0 
  begin 
    swap over + swap 1 + 
    dup 1000000 < while 
  repeat 
  drop "%d#n" string print ;
  
main 0 exit

(Oh, did I mention the language was based off Forth?)

Running that code on my modern 2.6GHz machine takes a full 2.1 seconds (and that's after I removed a huge bottle neck in the runtime engine). The equivalent C code:

#include <stdio.h>
#include <stdlib.h>

int main(int argc,char *argv[])
{
  int i;
  int total;
  
  for (i = 0 , total = 0 ; i < 1000000 ; i++)
    total += i;
  
  printf("%d\n",total);
  return EXIT_SUCCESS;
}

takes a mere 0.004 seconds to run.

Okay, that's expected—it's C. But what about Perl? 0.47 seconds.

Heh. A hundred times slower.

Hey! What about Python? It's not as fast as Perl, but it comes in at around 0.58 seconds.

I'm not even going to attempt a comparrison between my language and Ruby—it'd be too depressing to lose out to a known slow language.

But Wlofie did mention one other language he was playing around with—Lua. Okay, how fast is Lua? Same benchmark as above: 0.15 seconds.

Wow.

It blew Perl right out of the water.

This required a bit more investigation. And a better test than just summing up the first 1,000,000 integers. I settled upon the Jumble program. It involves I/O and has one non-trivial operation (sorting the letters in a word). And on my current system, the C version can check 483,523 words in 0.18 seconds (the timings I gave back in 2008 were for my older, 120MHz development machine).

The Lua code is straightforward:

#!/usr/local/bin/lua

WORDS = "/usr/share/dict/words"

-- ******************************************************

function sortword(word)
  local t = {}
  for i=1,string.len(word) do t[i] = string.byte(word,i) end
  table.sort(t)
  return string.char(unpack(t))
end

-- ********************************************************

if #arg < 1 then
  io.stderr:write("usage: %s word\n",arg[0])
  os.exit(1)
end

word = sortword(string.upper(arg[1]))

dict = io.open(WORDS,"r")
if dict == nil then
  io.stderr:write("Houston, we have a problem ... \n")
  os.exit(1)
end

for line in dict:lines() do
  dictsort = sortword(string.upper(line))
  if dictsort == word then
    print(line)
  end
end

dict:close()
os.exit(0)

sortword() has quite a bit of work to do—it has to go through the string, placing each letter into an array, sort the array, and then convert the array back into a string. Unlike C, which treats strings as an array of characters (and thus can be sorted without the data gymnastics), Lua (and pretty much all other modern scripting languages) treat strings at a single unit.

So it takes 7.9 seconds for Lua to run through the same 483,523 words.

Ouch. And I think it's pretty obvious where the time is being spent—in sortword(), pulling apart strings, sorting arrays, making new strings. But Lua can be easily extended using C. I rewrite sortword() in C:

#include <stdlib.h>
#include <string.h>

#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

/*******************************************/

static int cmp(const void *l,const void *r)
{
  int cl;
  int cr;
  
  cl = *(const char *)l;
  cr = *(const char *)r;
  
  return cl - cr;
}

/************************************/

static int sortword(lua_State *L)
{
  const char *word;
  size_t      len;
  
  word = luaL_checkstring(L,1);
  len = strlen(word);
  
  char buffer[len];
  memcpy(buffer,word,len);
  buffer[len] = '\0';
  qsort(buffer,len,1,cmp);
  lua_pushlstring(L,buffer,len);
  return 1;
}

/***********************************/

int luaopen_sortword(lua_State *L)
{
  lua_register(L,"sortword",sortword);
  return 0;
}

And now it only takes 2.2 seconds. Not a bad return upon investment.

Now we turn to Perl:

#!/usr/bin/perl
use strict;

#***************************

sub sortword
{
  my $str = shift;
  my @t   = unpack("(C)*",$str);
  
  @t = sort(@t);
  return pack("(C)*",@t);
}

#*******************************

if (scalar(@ARGV) == 0)
{
  printf(STDERR "usage: %s word\n",$0);
  exit(1);
}

my $word = sortword(uc($ARGV[0]));

open(DICT,"/usr/share/dict/words") 
  or die("Houston, we have a problem ... \n");

while(my $line = <DICT>)
{
  chop $line;

  my $dictsort = sortword(uc($line));
  
  if ($dictsort eq $word)
  {
    print $line . "\n";
  }
}

close(DICT);
exit(0);

Again, we need to pull apart the string into an array, sort that, and recreate the string. And when we run this lovely bit of Perl code it only takes 10.5 seconds.

Wow.

And here I thought Perl was the text crunching Master of the Universe.

Guess not.

Well, the Perl version of sortword() could be tightened up a bit …

sub sortword
{
  return pack("(C)*",sort(unpack("(C)*",shift)));
}

That takes us down to 8.6 seconds—nearly two full seconds to shuffle data between variables. Perl isn't looking all that good. Well, there is the function overhead, and given that we got sortword() down to one line, maybe if we inline it we'll get an improvement.

And we do—it's not 7.2 seconds. We finally beat unoptimized Lua code, but now with a more cryptic Perl version.

Way to go, Perl!

But given that Lua was easy to extend with C, can we do the same with Perl? A bit of searching lead me to the Inline::C Perl Module. Ten minutes to download and install (much better than I expected) and if anything, it was easier to embed C in Perl than in Lua:

#!/usr/bin/perl
use Inline C;
use strict;

# same code as above, minus the sortword() subroutine

exit(0);

__END__
__C__

static int cmp(const void *l,const void *r)
{
  int cl;
  int cr;
  
  cl = *(const char *)l;
  cr = *(const char *)r;
  
  return cl - cr;
}

char *sortword(char *word)
{
  size_t len;

  len = strlen(word);
  qsort(word,len,1,cmp);
  return word;
}

And we're down to 2.1 seconds. Just a bit quicker than the Lua/C version.

Well, that's after the 4.1 second initial run where it dumped a bunch of directories and files (lovely that).

Lua appears to be a bit faster than Perl, or at the very least, as fast as Perl.

And everything (even, quite possibly, Ruby) is faster than my own homebrewed language … sigh.

Obligatory Picture

Trying to get into the festive mood this year

Obligatory Contact Info

Obligatory Feeds

Obligatory Links

Obligatory Miscellaneous

Obligatory AI Disclaimer

No AI was used in the making of this site, unless otherwise noted.

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: https://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

https://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2024 by Sean Conner. All Rights Reserved.