Nikita Popov has a new (language agnostic) post to his blog today about one of the most powerful things you can use in your development - something that a lot of developers don't understand the true power of - regular expressions.
As someone who frequents the PHP tag on StackOverflow I pretty often see questions about how to parse some particular aspect of HTML using regular expressions. A common reply to such a question is: "You cannot parse HTML with regular expressions, because HTML isn’t regular. Use an XML parser instead." This statement - in the context of the question - is somewhere between very misleading and outright wrong. What I’ll try to demonstrate in this article is how powerful modern regular expressions really are.
He starts with the basics, defining the "regular" part of "regular expression" (hint: it has to do with predictability) and the grammar of the expressions. He talks about the Chomsky hierarchy and how it relates to the "regular" as well as a more complex mapping of expression to language rules. He talks about matching context-free and context-sensitive languages and unrestricted grammars as well.