Juozas Kaziukenas has an example of how to keep you and your application's data safe from prying eyes by filtering input with the HTML_Purifier package.
It’s really hard to decide what data is acceptable, especially when user has permission to insert HTML content through form. [...] However, problem can be solved, and quite easily. Almost a year ago I was reading some random blog when I find out about HTML Purifier. Basically, it’s library which can filter and fix any HTML.
He gives an example - running a web scraping tool against a site with malformed HTML. By running it through the HTML_Purifier package first, the errors were corrected and the "more correct" HTML source could be parsed easily. The package also helps to protect from XSS attacks via a whole set of filters included by default.