
Stefan Koopmanschap's Blog: PHP Hidden Gem: similar_text()
July 13, 2009 @ 09:37:50

Stefan Koopmanschap has written about a hidden gem he discovered in PHP, similar_text(), which helps spot blocks of text from one or more sources that are nearly the same.

I am working on a hobby project where I aggregate feeds from several different sources. With the blogs I work with right now, it often happens that an author posts the same post to a few different sites. However, because of site formats and sometimes also quick edits an author makes on one site but not on the other, the article contents are usually not identical strings. So I needed something that would help me figure out whether or not two strings are nearly identical.

After Googling around and finding things like the xdiff extension and soundex, he discovered the two functions he needed: levenshtein() and similar_text().
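To show what each of the two functions returns, here is a small sketch comparing two made-up strings (the sample text is an illustration, not from the original post). similar_text() returns the number of matching characters and, through an optional third argument passed by reference, a similarity percentage. levenshtein() returns an edit distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. Note that before PHP 8.0, levenshtein() returned -1 when either string was longer than 255 characters, which limits its usefulness on whole blog posts.

```php
<?php
// Two near-identical strings (made-up sample data).
$a = 'Hello World';
$b = 'Hello World!';

// similar_text() returns the count of matching characters and writes
// the similarity percentage into the by-reference third argument.
$common = similar_text($a, $b, $percent);

// levenshtein() returns the number of single-character edits
// (insertions, deletions, substitutions) separating the two strings.
$distance = levenshtein($a, $b);

printf("common=%d percent=%.2f distance=%d\n", $common, $percent, $distance);
// prints "common=11 percent=95.65 distance=1"
```

The percentage is computed as twice the number of matching characters divided by the combined length of both strings, so here it is 2 × 11 / 23 ≈ 95.65%.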

I am still trying to figure out which percentage will catch the duplicates without catching too many posts that are merely similar but not actually duplicates, but with a threshold of 75% I seem to catch quite a few duplicates so far.
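A duplicate check built on that idea might look like the following minimal sketch. The helper name isProbableDuplicate() is hypothetical, and the normalization step (stripping HTML tags and collapsing whitespace so that differing site formats matter less) is an assumption on our part, not something the original post describes. The 75% default simply mirrors the threshold mentioned above and is meant to be tuned.

```php
<?php
/**
 * Hypothetical helper (not from the original post): guess whether two
 * post bodies are the same article published on different sites.
 */
function isProbableDuplicate(string $a, string $b, float $threshold = 75.0): bool
{
    // Assumption: format differences between sites are mostly markup and
    // whitespace, so strip tags and collapse runs of whitespace first.
    $normalize = static function (string $text): string {
        $text = strip_tags($text);
        $text = preg_replace('/\s+/', ' ', $text);
        return trim($text);
    };

    // Only the percentage (third, by-reference argument) is needed here.
    similar_text($normalize($a), $normalize($b), $percent);

    // Treat anything above the threshold as a probable duplicate.
    return $percent > $threshold;
}

$siteA = '<p>PHP has a hidden gem:   similar_text().</p>';
$siteB = '<div>PHP has a hidden gem: similar_text()!</div>';
$other = '<p>Composer 1.0 has been released.</p>';

var_dump(isProbableDuplicate($siteA, $siteB)); // bool(true)
var_dump(isProbableDuplicate($siteA, $other)); // bool(false)
```

Two caveats from the PHP manual are worth keeping in mind: similar_text() runs in O(N³) time in the length of the longer string, so comparing many long posts can be slow, and swapping the two arguments may yield a different result.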

Similar Posts

Johannes Schluter's Blog: A hidden gem in PHP 5.3: fileinfo

Derick Rethans: Contributing Advent 1: Xdebug and hidden properties

Zend Developer Zone: Zend Framework Hidden Gems: Introduction

Zend Developer Zone: Zend Framework Hidden Gems: Zend_Cache

Stefan Koopmanschap's Blog: PHP Hidden Gem: similar_text()



All content copyright, 2015 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework