On the Blue Parabola blog, Matthew Turland has written a post about CDC, the Ceres Document Checker, a tool he developed to check documents written in the Ceres document format.
If you've written or done editing for php|architect before, you're probably familiar with the custom markup format they use called Ceres, which looks a bit like Markdown. Both articles and books use it, though each has slightly different formatting requirements. Some of these requirements can be tedious to check for and easy to miss. As much as I've been working with documents in the format, I decided to write a tool to help me out.
He outlines the requirements he set for the tool: it had to run from the command line, detect code blocks, perform lint checks on the code samples, and give a rough word count that excludes code. He also describes the three ways it selects documents for processing: checking a single file, recursing through a directory, and finding files that match a regular expression.
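To make the three selection modes concrete, here is a minimal sketch in Python. It is not taken from the tool itself. The function name `find_documents`, its parameters, and the choice to match the regular expression against the file name only are all assumptions made for illustration.

```python
import os
import re


def find_documents(path, pattern=None):
    """Yield document paths to check (hypothetical helper, not CDC's API).

    - If `path` is a file, yield just that file.
    - If `path` is a directory, recurse through it.
    - If `pattern` is given, keep only files whose name matches it
      (matching on the name is an assumption of this sketch).
    """
    regex = re.compile(pattern) if pattern else None

    if os.path.isfile(path):
        yield path
        return

    for root, dirs, files in os.walk(path):
        dirs.sort()  # visit subdirectories in a predictable order
        for name in sorted(files):
            if regex is None or regex.search(name):
                yield os.path.join(root, name)
```

Calling `find_documents("articles", r"\.txt$")` would then walk a hypothetical `articles` directory and return only the `.txt` files found in it.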
Files are read and processed line by line until a code block is reached. The code block is handled with a regular expression, and the script then continues with the rest of the file. If you're interested in the code, you can check out the latest version from the project's GitHub page. There's also a TextMate bundle (written by Davey Shafik) for users of the TextMate editor.
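The line-by-line approach can be sketched roughly as follows, again in Python and not in the tool's own code. The `~~~` delimiter is a placeholder and is not real Ceres syntax, which the post summary does not spell out. The function name `scan_document` and the way the regular expression is applied (to spot the delimiter line) are likewise assumptions.

```python
import re

# Hypothetical code-block delimiter; the real Ceres marker may differ.
FENCE = re.compile(r"^~~~")


def scan_document(lines):
    """Return (word_count, code_blocks) for an iterable of lines.

    Words inside code blocks are excluded from the count, and each code
    block is collected as a string so it could later be passed to a linter.
    """
    word_count = 0
    code_blocks = []
    current = None  # list of lines while inside a code block, else None

    for line in lines:
        if FENCE.match(line):
            if current is None:
                current = []  # opening delimiter
            else:
                code_blocks.append("\n".join(current))  # closing delimiter
                current = None
        elif current is not None:
            current.append(line.rstrip("\n"))
        else:
            word_count += len(line.split())

    return word_count, code_blocks
```

Collecting the code blocks separately keeps the prose word count rough but code-free, and leaves the lint step as an independent pass over the collected samples.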