
Stefan Koopmanschap's Blog:
PHP Hidden Gem similar_text()
July 13, 2009 @ 09:37:50

Stefan Koopmanschap has written about a hidden gem he discovered in PHP for spotting blocks of text, from one or more sources, that are nearly identical: similar_text.

I am working on a hobby project where I aggregate feeds from several different sources. With the blogs I work with right now, it often happens that an author posts the same post to a few different sites. However, because of site formats and sometimes also quick edits an author makes on one site but not on the other, the article contents are usually not identical strings. So I needed something that would help me figure out whether or not two strings are nearly identical.

After Googling around and finding things like the xdiff extension and soundex, he discovered the two functions he needed - levenshtein and similar_text.
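
To make the difference between the two functions concrete, here is a minimal sketch that runs both on a pair of nearly identical strings. The sample strings are invented for illustration and are not taken from Stefan's post. Note that before PHP 8.0, levenshtein() only accepted strings of up to 255 characters, so it may not suit full article bodies on older versions.

```php
<?php
// Two hypothetical versions of the same sentence, differing by a small edit.
$a = 'PHP has a hidden gem called similar_text.';
$b = 'PHP has a hidden gem named similar_text!';

// similar_text() returns the number of matching characters and, through the
// optional third argument (passed by reference), a similarity percentage.
$matching = similar_text($a, $b, $percent);

// levenshtein() returns the minimum number of single-character insertions,
// replacements and deletions needed to turn $a into $b.
$distance = levenshtein($a, $b);

printf(
    "matching chars: %d, similarity: %.1f%%, edit distance: %d\n",
    $matching,
    $percent,
    $distance
);
```

A higher percentage from similar_text() means the strings are closer, while a lower number from levenshtein() means the same thing, so the two results read in opposite directions.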

I am still trying to figure out which percentage will catch the duplicates without also catching posts that are only similar and not actually duplicates, but with a threshold of 75% I seem to catch quite a few duplicates so far.
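
The threshold check described in the quote could look roughly like the sketch below. The helper name isProbableDuplicate(), its $threshold parameter and the sample strings are assumptions made for illustration; the original code from Stefan's post is not reproduced here.

```php
<?php
/**
 * Hypothetical helper: treats two texts as duplicates when similar_text()
 * reports a similarity above $threshold percent (75 by default).
 */
function isProbableDuplicate(string $a, string $b, float $threshold = 75.0): bool
{
    // Only the percentage is needed, so the return value is ignored.
    similar_text($a, $b, $percent);

    return $percent > $threshold;
}

// Invented sample data: a post, a cross-post with a one-character edit,
// and an unrelated text.
$original  = 'I aggregate feeds from several different sources.';
$crosspost = 'I aggregate feeds from several different sources!';
$unrelated = 'Completely different text about databases.';

var_dump(isProbableDuplicate($original, $crosspost));
var_dump(isProbableDuplicate($original, $unrelated));
```

Keeping the threshold as a parameter makes it easy to experiment with different percentages. The PHP manual describes the algorithm behind similar_text() as cubic in the length of the input, so comparing many long articles pairwise may be slow.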




All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework