Monday, January 14, 2013
The significance of this is that you can build parsing expressions on the fly …
I found Meta II to be an interesting approach to parsing. The closest modern equivalent to it is the parsing expression grammar (PEG), and the easiest PEG implementation I've found to use is LPeg, for Lua.
What's interesting about LPeg is that patterns aren't compiled into Lua code, but into instructions for a specialized parsing VM, which makes it quite fast. Maybe not as fast as lex and yacc, but certainly easier to understand and vastly easier to use.
Let me amend that: I find the re module (which is built on top of LPeg) to be even easier to use, as I find this:
local re = require "re"
parser = re.compile [[
expr <- term (termop term)*
term <- factor (factorop factor)*
factor <- number
/ open expr close
number <- space '-'? [0-9]+ space
termop <- space [+-] space
factorop <- space [*/] space
open <- space '(' space
close <- space ')' space
space <- ' '*
]]
to be way easier to read and understand than
local lpeg = require "lpeg"
local space = lpeg.P" "^0
local close = space * lpeg.P")" * space
local open = space * lpeg.P"(" * space
local factorop = space * lpeg.S"*/" * space
local termop = space * lpeg.S"+-" * space
local number = space * lpeg.P"-"^-1 * lpeg.R"09"^1 * space
local factor , term , expr = lpeg.V"factor" , lpeg.V"term" , lpeg.V"expr"
parser = lpeg.P {
"expr",
factor = number
+ open * expr * close,
term = factor * (factorop * factor)^0,
expr = term * (termop * term)^0
}
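Since re.compile() returns an ordinary LPeg pattern, the two styles can be mixed freely. One practical detail: match() only requires a prefix of the subject to match, returning the position just past what it consumed (or nil). The sketch below uses a cut-down grammar (numbers with + and - only, an assumption to keep it short) rather than the full one above, and anchors it with LPeg's "not" operator so trailing text is rejected.

```lua
local lpeg = require "lpeg"
local re   = require "re"

-- A cut-down version of the grammar above (numbers with + and - only).
-- match() returns the position just past the text it consumed, or nil.
local loose = re.compile [[
  expr   <- number (termop number)*
  number <- ' '* '-'? [0-9]+ ' '*
  termop <- ' '* [+-] ' '*
]]

-- re.compile() returns an ordinary LPeg pattern, so LPeg operators apply
-- to it directly.  "-lpeg.P(1)" (no character may follow) anchors it to
-- the end of the subject, the same as appending !. in re syntax.
local strict = loose * -lpeg.P(1)

print(loose:match("1 + 2 )"))   --> 7   (stops before the stray ')')
print(strict:match("1 + 2 )"))  --> nil (trailing text is rejected)
print(strict:match("1 + 2"))    --> 6
```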
As such, I've been concentrating on the re module to brush up on my parsing skills, to the point that I've been ignoring a key component of LPeg: expressions!
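Sure, raw LPeg isn't pretty, but as you can see from the above example, it is built up out of expressions: patterns are ordinary Lua values that can be stored in variables, combined with operators like * and +, and created while the program runs. And that's a powerful abstraction right there.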
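To make "built up out of expressions" concrete, here is a minimal sketch in which a plain Lua function assembles a grammar rule from data. The helper name level() is hypothetical (it is not part of LPeg), and parentheses are left out of the grammar to keep it short.

```lua
local lpeg = require "lpeg"

-- Patterns are ordinary Lua values, so a function can assemble one from
-- data at run time.  Hypothetical helper (not part of LPeg): given an
-- operand pattern and a string of one-character operators, return
--   operand (op operand)*
-- allowing blanks around each operator.
local function level(operand,ops)
  local space = lpeg.P" "^0
  local op    = space * lpeg.S(ops) * space
  return operand * (op * operand)^0
end

local number = lpeg.P"-"^-1 * lpeg.R"09"^1

-- Two precedence levels from the same helper.
local term = level(number,"*/")
local expr = level(term,"+-")

print(expr:match("1 + 2 * 3"))  --> 10 (the whole 9-character string)
print(expr:match("1 + "))       --> 2  (the dangling "+ " is not consumed)
```

Adding another precedence level is one more call to level(), with no grammar text to edit.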
For instance, in mod_blog, I have code that parses text, converting certain sequences of characters like --- (three dashes) into HTML entities like &mdash;. So, I type the following:
``The name of our act is---The Aristocrats! ... Um ... hello?''
which is turned into
&ldquo;The name of our act is&mdash;The Aristocrats! &hellip; Um &hellip; hello?&rdquo;
to be rendered on your screen as:
“The name of our act is—The Aristocrats! … Um … hello?”
Now, I only support six character sequences, and that takes 160 lines of C code. Adding support for more is a daunting task, and one that I've been reluctant to take on. But in LPeg, the code looks like this:
local lpeg = require "lpeg"
local base =
{
[ [[``]] ] = "&ldquo;" ,
[ [['']] ] = "&rdquo;" ,
[ "---" ] = "&mdash;" ,
[ "--" ] = "&ndash;" ,
[ "..." ] = "&hellip;",
[ ".." ] = "&nldr;" ,
}
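function mktranslate(tab)
  -- merge the standard set with the caller's table (caller's entries win)
  local all = {}
  for target,replacement in pairs(base) do all[target] = replacement end
  for target,replacement in pairs(tab or {}) do all[target] = replacement end
  -- pairs() visits keys in no defined order, so sort the targets longest
  -- first; otherwise "--" might be tried before "---" and win
  local targets = {}
  for target in pairs(all) do targets[#targets + 1] = target end
  table.sort(targets,function(a,b) return #a > #b end)
  -- build the ordered choice; any other character is captured unchanged
  local chars = lpeg.C(lpeg.P(1))
  for i = #targets,1,-1 do
    chars = lpeg.P(targets[i]) / all[targets[i]] + chars
  end
  return lpeg.Ct(chars^0) / function(c) return table.concat(c) end
end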
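LPeg also has a substitution capture, lpeg.Cs, which returns the matched text with the value of any nested capture spliced in. It can stand in for the lpeg.Ct plus table.concat step. Here is a sketch of that variant. The name mksubst is hypothetical and this is not how mod_blog does it. Like the function above, it sorts the targets longest first because pairs() has no defined order.

```lua
local lpeg = require "lpeg"

-- Hypothetical variant of mktranslate() using lpeg.Cs (substitution
-- capture): matched text is copied to the result, except where a nested
-- capture replaces it.  Note: in "patt / string", % in the string is
-- special, so replacements must not contain a bare %.
local function mksubst(tab)
  -- try longer targets first so "---" is not split into "--" plus "-"
  local targets = {}
  for target in pairs(tab) do targets[#targets + 1] = target end
  table.sort(targets,function(a,b) return #a > #b end)

  local choice = lpeg.P(false)   -- always fails; the start of the fold
  for _,target in ipairs(targets) do
    choice = choice + lpeg.P(target) / tab[target]
  end
  -- anything not in the table is copied through unchanged
  return lpeg.Cs((choice + 1)^0)
end

local subst = mksubst {
  ["---"] = "&mdash;",
  ["--"]  = "&ndash;",
  ["..."] = "&hellip;",
}

print(subst:match("a---b--c...d"))  --> a&mdash;b&ndash;c&hellip;d
```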
Now, I could do this with the re module:
local re = require "re"
local R = { concat = table.concat }
local G = --[[ lpeg/re ]] [[
text <- chars* -> {} -> concat
chars <- '``' -> '&ldquo;'
/ "''" -> '&rdquo;'
/ '---' -> '&mdash;'
/ '--' -> '&ndash;'
/ '...' -> '&hellip;'
/ '..' -> '&nldr;'
/ { . }
]]
filter = re.compile(G,R)
But the LPeg version lets me pass in a table of extra translations to apply on top of the “standard set” built into the code, for example:
translate = mktranslate {
["RAM"] = '<abbr title="Random Access Memory">RAM</abbr>',
["CPU"] = '<abbr title="Central Processing Unit">CPU</abbr>',
["(tm)"] = '&trade;'
}
And why would I want this? Well, I have Lua embedded in mod_blog, so using Lua to do the translations is straightforward. And now, when I make an entry, I could include a table of custom translations just for that entry. Doing it this way solves a problem I saw nearly a decade ago.
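Because patterns are values, a per-entry translator doesn't have to rebuild the standard set each time. It can be built once and combined with whatever extras an entry supplies. Here is a minimal sketch under that assumption. The names standard and with_extras are hypothetical, as is the idea of an entry carrying its own table. This does not describe mod_blog's actual code.

```lua
local lpeg = require "lpeg"

-- Hypothetical: the standard replacements, built once as an ordered
-- choice (longest alternatives listed first).
local standard = lpeg.P"---" / "&mdash;"
               + lpeg.P"--"  / "&ndash;"
               + lpeg.P"..." / "&hellip;"

-- Hypothetical per-entry translator: the entry's extras are tried before
-- the prebuilt standard choice, so only the small extras table is turned
-- into patterns per entry.  Extras are added in pairs() order, so extras
-- that are prefixes of one another would need sorting as well.
local function with_extras(extras)
  local choice = lpeg.P(false)
  for target,replacement in pairs(extras) do
    choice = choice + lpeg.P(target) / replacement
  end
  return lpeg.Cs((choice + standard + 1)^0)
end

local translate = with_extras { ["(tm)"] = "&trade;" }
print(translate:match("LPeg(tm)---fast..."))  --> LPeg&trade;&mdash;fast&hellip;
```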