The Genius Engineering blog has a new post sharing a library they've created to help filter out potentially malicious content coming from the user, including HTML that may or may not be valid.
Some time ago, Genius Engineering decided to unify the manner in which we encode values that contain user input. We previously depended upon the PHP built-in htmlentities() and some simple wrappers around it for our encoding needs, but this function alone can’t safely sanitize tainted data in all contexts. [...] While there is plenty of information about these issues and what must be done to fix them, there is a distinct dearth of libraries in PHP to properly encode strings for all of the situations.
They include a few code examples showing how to use their sanitizing library [tar.gz] to filter HTML in general, HTML attributes and strings for use in JavaScript.
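To make the "different contexts" point concrete, here is a minimal sketch that uses only PHP built-ins. It is not the Genius Engineering library or its API: the function names `encodeForHtml` and `encodeForJsString` are hypothetical, and the approach (`htmlspecialchars()` for HTML text and quoted attributes, `json_encode()` with the `JSON_HEX_*` flags for JavaScript string literals) is just one common way to handle these cases. It assumes PHP 7.3 or later for `JSON_THROW_ON_ERROR`.

```php
<?php
// Hypothetical helpers for illustration only; not the API of the library
// described in the post.

// Encode a value for HTML text or a quoted HTML attribute.
// ENT_QUOTES escapes both double and single quotes.
function encodeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Encode a value as a JavaScript string literal (surrounding quotes included).
// The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes, so the
// result cannot close an enclosing <script> element or quoted attribute.
function encodeForJsString(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$input = "</script><script>alert('x')</script>";

echo '<p>', encodeForHtml($input), "</p>\n";
echo '<input value="', encodeForHtml($input), "\">\n";
echo '<script>var s = ', encodeForJsString($input), ";</script>\n";
```

The same tainted string gets a different encoding depending on where it lands, which is why a single call to `htmlentities()` is not enough for every situation.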