Greg Beaver has blogged today with more about the port he's been wokring on of the Lemon parser generator to PHP5, this time discussion the creation of two packages - PHP_ParserGenerator and PHP_LexerGenerator.
Last week, I blogged about completing a port of the Lemon parser generator to PHP 5, which I thought was pretty cool. However, in an email, Alex Merz pointed out that without a lexer generator to accompany lemon, it's pretty difficult to write a decent parser.
After Alex's email, I started thinking about what it would take to write a lexer generator. Basically, a lexer generator requires parsing and compiling regular expressions, then scanning the source one character at a time to find matches. So, it occurred to me that perhaps simply combining regular expressions with sub-patterns could accomplish this task quite easily.
He goes on to explain this process, showing how a simple regular expresion call (and a look at its return arguments) could create a simple, easy solution. Since the re2c format is still unsupported in PHP (without a goto to go to), he opts to stick with the regular expressions and creates a "lex2php" format instead.
He's packaged up both halves of this setup and has already posted proposals for them to the PEAR site: