PHPDeveloper: PHP News, Views and Community

Anthony Ferrara: Tries and Lexers

by Chris Cornutt May 18, 2015 @ 14:47:32

Anthony Ferrara has an interesting new post on his site about tries and lexers, two pieces of the puzzle involved in turning raw source text, such as a script or a template, into something a parser can work with. In this case, he tried his hand at writing a parser which, naturally, led to needing a lexer.

Lately I have been playing around with a few experimental projects. The current one started when I tried to make a templating engine. Not just an ordinary one, but one that understood the context of a variable so it could encode/escape it properly. [...] So, while working on the templating engine, I needed to build a parser. Well, actually, I needed to build 4 parsers. [...] I decided to hand write this dual-mode parser. It went a lot easier than I expected. In a few hours, I had the prototype built which could fully parse Twig-style syntax (or a subset of it) including a more-or-less standards-compliant HTML parser. [...] But I ran into a problem. I didn't have a lexer...

He starts with a brief description of what a lexer is and a simple example of an expression being broken into its tokens. He then introduces the trie, a tree structure that stores strings one character per level, so the input can be "walked" character by character to find the tokens it contains. He shows a simple PHP implementation that builds a trie from a set of tokens, along with the array structure it produces, and then expands it into a "lex" function that walks an input string and collects the tokens it finds.
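To make the trie-plus-lexer idea concrete, here is a minimal sketch in Python rather than the post's PHP, so it is not Ferrara's implementation. Nested dicts stand in for PHP's nested arrays, and the token list, the `END` marker and the `build_trie`/`lex` names are assumptions made for illustration. This version of `lex` uses longest match: from each position it walks the trie as far as the input allows and keeps the last complete token it passed, so `===` wins over `==` and `=`.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical token set for illustration: a few JavaScript-style operators.
TOKENS = ["=", "==", "===", "!", "!=", "!==", "+", "++", "+="]

# Key marking "a complete token ends at this node". The empty string can never
# be a single input character, so it cannot collide with a real child key.
END = ""

Trie = Dict[str, Any]


def build_trie(tokens: List[str]) -> Trie:
    """Build a trie as nested dicts, one level per character."""
    root: Trie = {}
    for token in tokens:
        node = root
        for ch in token:
            node = node.setdefault(ch, {})
        node[END] = token
    return root


def lex(source: str, trie: Trie) -> List[str]:
    """Split source into tokens using longest-match lookup in the trie.

    Whitespace between tokens is skipped; any character that does not
    start a known token raises ValueError.
    """
    tokens: List[str] = []
    i = 0
    while i < len(source):
        if source[i].isspace():
            i += 1
            continue
        node = trie
        j = i
        longest: Optional[Tuple[str, int]] = None  # (token, end offset)
        # Walk down the trie while the next character has a child node,
        # remembering the most recent node that completes a token.
        while j < len(source) and source[j] in node:
            node = node[source[j]]
            j += 1
            if END in node:
                longest = (node[END], j)
        if longest is None:
            raise ValueError(f"unexpected character {source[i]!r} at offset {i}")
        token, i = longest
        tokens.append(token)
    return tokens


if __name__ == "__main__":
    trie = build_trie(TOKENS)
    print(build_trie(["=", "=="]))    # {'=': {'': '=', '=': {'': '=='}}}
    print(lex("=== != ++ +=", trie))  # ['===', '!=', '++', '+=']
```

Because the walk falls back to the last complete token it passed, an input like `!===` lexes as `!==` followed by `=`. Anything outside the token set, including identifiers and numbers, raises `ValueError` in this sketch.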

From there he turns to JavaScript, pointing out that it is much looser than PHP even in something as basic as how numbers can be written. His testing turned up a major issue, though: memory consumption. He found that a regular expression approach consumed too much memory, so he tried compiling out to classes instead (and found it much faster once the process was running).
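Number literals show where a plain trie runs out of road: a trie built from a fixed token list only recognises the strings put into it, while the set of numeric literals is unbounded, so lexers typically match numbers with a pattern or a hand-written state machine instead. The following is a minimal Python sketch (not code from the post, which is PHP) of a regular expression covering a handful of JavaScript numeric forms. The names `NUMBER` and `match_number` are hypothetical, the supported forms are a deliberate subset, and it does not reproduce the memory measurements Ferrara describes.

```python
import re
from typing import Optional

# Sketch covering a few JavaScript numeric literal forms. It deliberately
# omits numeric separators (1_000), BigInt suffixes (10n) and legacy octal (017).
NUMBER = re.compile(
    r"""
      0[xX][0-9a-fA-F]+               # hexadecimal, e.g. 0x1F
    | 0[oO][0-7]+                     # octal, e.g. 0o17
    | 0[bB][01]+                      # binary, e.g. 0b101
    | (?:[0-9]+\.?[0-9]*|\.[0-9]+)    # decimal: 42, 5., 3.14 or .5
      (?:[eE][+-]?[0-9]+)?            # optional exponent, e.g. 1e10, 2.5E-3
    """,
    re.VERBOSE,
)


def match_number(source: str, pos: int = 0) -> Optional[str]:
    """Return the numeric literal starting exactly at pos, or None."""
    m = NUMBER.match(source, pos)
    return m.group(0) if m else None


if __name__ == "__main__":
    for text in ["0x1F", "0b101", ".5", "2.5E-3", "42px"]:
        print(f"{text!r} -> {match_number(text)!r}")
```

Branch order matters: Python's `re` takes the first alternative that matches, not the longest, so the hex, octal and binary branches come first to keep the decimal branch from stopping after the leading `0` of `0x1F`. The matcher also does not validate what follows, so `42px` yields `42` and leaves `px` for the next token.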