The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Monday, January 14, 2013

The significance of this is that you can build parsing expressions on the fly …

I found Meta II to be an interesting approach to parsing, and the closest modern equivilent to that are parsing expression grammars (PEGs), and the easiest one to use I've found is the Lua implementation LPeg.

What's interesting about LPeg is that it isn't compiled into Lua, but into a specialized parsing VM, which makes it quite fast. Maybe not as fast as lex and yacc but certain easier to understand and vastly easier to use.

Let me amend that: I find the re module to be easier to use (which is build on LPeg), as I find this:

local re = require "re"

parser = re.compile [[
	expr		<- term (termop term)*
	term		<- factor (factorop factor)*
	factor		<- number
			/  open expr close

	number		<- space '-'? [0-9]+ space	
	termop		<- space [+-] space
	factorop	<- space [*/] space
	open		<- space '(' space
	close		<- space ')' space
	space		<- ' '?
]]

to be way easier to read and understand than

local lpeg = require "lpeg"

local space    = lpeg.P" "^0
local close    = space * lpeg.P")" * space
local open     = space * lpeg.P"(" * space
local factorop = space * lpeg.S"*/" * space
local termop   = space * lpeg.S"+-" * space
local number   = space * lpeg.P"-"^-1 * lpeg.R"09"^1 * space

local factor , term , expr = lpeg.V"factor" , lpeg.V"term" , lpeg.V"expr"

parser = lpeg.P {
  "expr",
  factor = number
         + open * expr * close,
  term   = factor * (factorop * factor)^0,
  expr   = term   * (termop   * term)^0
}

As such, I've been concentrating on using the re module to brush up on my parsing skills to the point that I've been ignoring a key compent of LPeg—expressions!

Sure, raw LPeg isn't pretty, but as you can see from the above example, it is built up out of expressions. And that's a powerful abstraction right there.

For instance, in mod_blog, I have code that will parse text, converting certain sequences of characters like --- (three dashes) into an HTML entity &mcode;. So, I type the following:


``The name of our act is---The Aristocrats! ... Um ... hello?''

which is turned into

&ldquo;The name of our act is&mdash;The Aristocrats! &hellip; Um &hellip;
hello?&rdquo;

to be rendered on your screen as:

“The name of our act is—The Aristocrats! … Um … hello?”

Now, I only support a few character sequences (six) and that takes 160 lines of C code. Adding support for more is a daunting task, and one that I've been reluctant to take on. But in LPeg, the code looks like:

local lpeg  = require "lpeg"

local base =
{
  [ [[``]] ] = "&ldquo;" ,
  [ [['']] ] = "&rdquo;" ,
  [ "---"  ] = "&mdash;" ,
  [ "--"   ] = "&ndash;" ,
  [ "..."  ] = "&hellip;",
  [ ".."   ] = "&#8229;" ,
}

function mktranslate(tab)
  local tab   = tab or {}
  local chars = lpeg.C(lpeg.P(1))
  
  for target,replacement in pairs(tab) do
    chars = lpeg.P(target) / replacement + chars
  end

  for target,replacement in pairs(base) do
    chars = lpeg.P(target) / replacement + chars
  end
  
  return lpeg.Ct(chars^0) / function(c) return table.concat(c) end
end

Now, I could do this with the re module:

local re   = require "re"
local R    = { concat = table.concat }
local G    = --[[ lpeg/re ]] [[

text    <- chars* -> {} -> concat

chars   <- '`'   -> '&ldquo;'
        /  "''"   -> '&rdquo;'
        /  '---'  -> '&mdash;'
        /  '--'   -> '&ndash;'
        /  '...'  -> '&helip;'
        /  '..'   -> '&#8229;'
        /  { . }

]]

filter = re.compile(G,R)

But the former allows me to pass in an additional table of translations to do in addition to the “standard set” programmed in, for example:

translate = mktranslate {
  ["RAM"]  = '<acronym title="Random Access Memory">RAM</acronym>',
  ["CPU"]  = '<acronym title="Central Processing Unit">CPU</acronym>',
  ["(tm)"] = '&trade;'
}

And I would want this why? Well, I have Lua embedded in mod_blog, so using Lua to do the translations is straightforward. But, now when I make an entry, I could include a table of custom translations for that entry. Doing it this way solves a problem I saw nearly a decade ago.

Obligatory Picture

[Don't hate me for my sock monkey headphones.]

Obligatory Links

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links are simple: Start with the base link for this site: http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2014 by Sean Conner. All Rights Reserved.

Listed on BlogShares