Monday, January 14, 2013
The significance of this is that you can build parsing expressions on the fly …
I found Meta II to be an interesting approach to parsing. The closest modern equivalent is parsing expression grammars (PEGs), and the easiest implementation I've found to use is the Lua one, LPeg.
What's interesting about LPeg is that it isn't compiled into Lua, but into a specialized parsing VM, which makes it quite fast. Maybe not as fast as lex and yacc, but certainly easier to understand and vastly easier to use.
Let me amend that: I find the re module (which is built on LPeg) to be easier to use, as I find this:
    local re = require "re"

    parser = re.compile [[
            expr     <- term (termop term)*
            term     <- factor (factorop factor)*
            factor   <- number / open expr close
            number   <- space '-'? [0-9]+ space
            termop   <- space [+-] space
            factorop <- space [*/] space
            open     <- space '(' space
            close    <- space ')' space
            space    <- ' '?
    ]]
to be way easier to read and understand than
    local lpeg = require "lpeg"

    local space    = lpeg.P" "^0
    local close    = space * lpeg.P")" * space
    local open     = space * lpeg.P"(" * space
    local factorop = space * lpeg.S"*/" * space
    local termop   = space * lpeg.S"+-" * space
    local number   = space * lpeg.P"-"^-1 * lpeg.R"09"^1 * space
    local factor , term , expr = lpeg.V"factor" , lpeg.V"term" , lpeg.V"expr"

    parser = lpeg.P {
            "expr",
            factor = number + open * expr * close,
            term   = factor * (factorop * factor)^0,
            expr   = term   * (termop   * term)^0
    }
As such, I've been concentrating on using the re module to brush up on my parsing skills, to the point that I've been ignoring a key component of LPeg: expressions!
Sure, raw LPeg isn't pretty, but as you can see from the above example, it is built up out of expressions. And that's a powerful abstraction right there.
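To make "built up out of expressions" concrete, here is a small sketch. It assumes the `lpeg` and `re` modules are installed, and the name `whole` is mine, not something from mod_blog. A grammar compiled with re is still an ordinary LPeg pattern, so it can be combined with other patterns using the normal operators:

```lua
local lpeg = require "lpeg"
local re   = require "re"

-- The same arithmetic grammar as above, compiled with re.
local expr = re.compile [[
        expr     <- term (termop term)*
        term     <- factor (factorop factor)*
        factor   <- number / open expr close
        number   <- space '-'? [0-9]+ space
        termop   <- space [+-] space
        factorop <- space [*/] space
        open     <- space '(' space
        close    <- space ')' space
        space    <- ' '?
]]

-- re.compile returns a plain LPeg pattern, so it composes like any other.
-- lpeg.P(-1) succeeds only at the end of the input, so `whole` insists
-- that the entire string is an expression.
local whole = expr * lpeg.P(-1)

print(expr:match("1 + 2 * (3 - 4)"))   --> 16 (the position just past the match)
print(expr:match("1 + 2 )"))           --> 7  (stops before the stray ")")
print(whole:match("1 + 2 )"))          --> nil
```

With no captures in the grammar, `match` returns the position just past the matched text, which is why the first two calls print numbers.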
For instance, in mod_blog, I have code that will parse text, converting certain sequences of characters like --- (three dashes) into an HTML entity (&mdash;). So, I type the following:
``The name of our act is---The Aristocrats! ... Um ... hello?''
which is turned into
&ldquo;The name of our act is&mdash;The Aristocrats! &hellip; Um &hellip; hello?&rdquo;
to be rendered on your screen as:
“The name of our act is—The Aristocrats! … Um … hello?”
Now, I only support a few character sequences (six) and that takes 160 lines of C code. Adding support for more is a daunting task, and one that I've been reluctant to take on. But in LPeg, the code looks like:
    local lpeg = require "lpeg"

    local base =
    {
      [ [[``]] ] = "“" ,
      [ [['']] ] = "”" ,
      [ "---"  ] = "—" ,
      [ "--"   ] = "–" ,
      [ "..."  ] = "…",
      [ ".."   ] = "‥" ,
    }

    function mktranslate(tab)
      local tab   = tab or {}
      local chars = lpeg.C(lpeg.P(1))

      for target,replacement in pairs(tab) do
        chars = lpeg.P(target) / replacement + chars
      end

      for target,replacement in pairs(base) do
        chars = lpeg.P(target) / replacement + chars
      end

      return lpeg.Ct(chars^0) / function(c) return table.concat(c) end
    end
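One caveat when building the choice in a loop: LPeg's `+` is an ordered choice, and Lua does not specify the order in which `pairs` visits keys, so "--" could end up being tried before "---". The following is a sketch of one way to pin the order down, not the code mod_blog uses. The name `mktranslate_sorted` is hypothetical, and unlike the version above it lets a custom entry override a base entry with the same key:

```lua
local lpeg = require "lpeg"

local base = {
  ["``"]  = "“", ["''"] = "”",
  ["---"] = "—", ["--"] = "–",
  ["..."] = "…", [".."] = "‥",
}

-- Hypothetical variant of mktranslate with a deterministic order.
local function mktranslate_sorted(tab)
  -- Merge the two tables; custom entries win over base entries (an assumption).
  local all = {}
  for target, replacement in pairs(base)      do all[target] = replacement end
  for target, replacement in pairs(tab or {}) do all[target] = replacement end

  local targets = {}
  for target in pairs(all) do targets[#targets + 1] = target end

  -- Shortest first: each alternative is prepended to the choice, so the
  -- longest targets end up at the front and are tried first.
  table.sort(targets, function(a, b)
    if #a ~= #b then return #a < #b end
    return a < b
  end)

  local chars = lpeg.C(lpeg.P(1))
  for _, target in ipairs(targets) do
    chars = lpeg.P(target) / all[target] + chars
  end

  return lpeg.Ct(chars^0) / table.concat
end

local translate = mktranslate_sorted()
print(translate:match("a---b--c...d..e"))   --> a—b–c…d‥e
```

Because "---" is always tried before "--", three dashes always become one em dash rather than an en dash followed by a hyphen.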
Now, I could do this with the re module:
    local re = require "re"

    local R = { concat = table.concat }

    local G = --[[ lpeg/re ]] [[
    text  <- chars* -> {} -> concat
    chars <- '``'  -> '“'
           / "''"  -> '”'
           / '---' -> '—'
           / '--'  -> '–'
           / '...' -> '&hellip;'
           / '..'  -> '‥'
           / { . }
    ]]

    filter = re.compile(G,R)
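For reference, here is how a filter like this might be run. This is a minimal sketch that assumes the `re` module is installed; it keeps only the quote and dash rules and writes the opening quote as two backticks. The alternatives are tried in the order written, which is why '---' has to come before '--':

```lua
local re = require "re"

local R = { concat = table.concat }

local G = [[
  text  <- chars* -> {} -> concat
  chars <- '``'  -> '“'
         / "''"  -> '”'
         / '---' -> '—'
         / '--'  -> '–'
         / { . }
]]

local filter = re.compile(G, R)

print(filter:match("``The name of our act is---The Aristocrats!''"))
--> “The name of our act is—The Aristocrats!”
```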
But the former allows me to pass in an additional table of translations to do in addition to the “standard set” programmed in, for example:
    translate = mktranslate {
      ["RAM"]  = '<abbr title="Random Access Memory">RAM</abbr>',
      ["CPU"]  = '<abbr title="Central Processing Unit">CPU</abbr>',
      ["(tm)"] = '™'
    }
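As a rough illustration of what the resulting pattern does, the sketch below repeats `mktranslate` (made local, with a shortened `base` table) so that it runs on its own; it assumes `lpeg` is installed. The sample sentence is made up:

```lua
local lpeg = require "lpeg"

-- Shortened copy of the base table, just enough for this example.
local base = {
  ["``"] = "“",
  ["''"] = "”",
}

local function mktranslate(tab)
  local tab   = tab or {}
  local chars = lpeg.C(lpeg.P(1))

  for target, replacement in pairs(tab) do
    chars = lpeg.P(target) / replacement + chars
  end

  for target, replacement in pairs(base) do
    chars = lpeg.P(target) / replacement + chars
  end

  return lpeg.Ct(chars^0) / function(c) return table.concat(c) end
end

local translate = mktranslate {
  ["RAM"]  = '<abbr title="Random Access Memory">RAM</abbr>',
  ["CPU"]  = '<abbr title="Central Processing Unit">CPU</abbr>',
  ["(tm)"] = '™'
}

-- The custom entries and the base entries are applied in one pass.
print(translate:match("``CPU and RAM(tm)''"))
```

The custom table sits alongside the built-in sequences in a single pattern, so the text only has to be scanned once.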
And I would want this why? Well, I have Lua embedded in mod_blog, so using Lua to do the translations is straightforward. But now, when I make an entry, I could include a table of custom translations for that entry. Doing it this way solves a problem I saw nearly a decade ago.