Stefan Koopmanschap's Blog:
PHP Hidden Gem similar_text()
July 13, 2009 @ 09:37:50

Stefan Koopmanschap has written about a hidden gem he discovered in PHP for spotting blocks of text from different sources that are nearly identical: similar_text.

I am working on a hobby project where I aggregate feeds from several different sources. With the blogs I work with right now, it often happens that an author posts the same post to a few different sites. However, because of site formats and sometimes also quick edits an author makes on one site but not on the other, the article contents are usually not identical strings. So I needed something that would help me figure out whether or not two strings are nearly identical.

After Googling around and finding things like the xdiff extension and soundex, he discovered the two functions he needed - levenshtein and similar_text.
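
For readers who have not used these two functions, here is a minimal sketch of how they differ. The sample strings are illustrative and are not taken from the original post.

```php
<?php
// Edit distance: the number of single-character insertions, deletions,
// or substitutions needed to turn one string into the other (lower = closer).
$distance = levenshtein('kitten', 'sitting');

// similar_text() returns the number of matching characters and, through its
// optional third argument (passed by reference), a similarity percentage.
$matching = similar_text('World', 'Word', $percent);

printf("levenshtein: %d\n", $distance);
printf("similar_text: %d matching characters, %.2f%% similar\n", $matching, $percent);
```

levenshtein gives a distance, so smaller values mean closer strings, while similar_text gives a percentage that is easier to compare against a fixed cut-off when the strings vary widely in length.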

I am still trying to figure out which percentage will catch the duplicates but not catch too many posts which are only similar but not actually duplicates, but with the above 75% I seem to catch quite a few duplicates so far.
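
The code the quote refers to is not reproduced in this summary, so the following is only a sketch of how such a threshold check might look. The helper name isProbableDuplicate, the sample strings, and the "strictly greater than 75" comparison are assumptions made for illustration.

```php
<?php
// Hypothetical helper: the name and the 75.0 default are illustrative,
// based on the threshold mentioned in the quote, not on the original code.
function isProbableDuplicate(string $a, string $b, float $threshold = 75.0): bool
{
    // similar_text() writes the similarity as a percentage (0 to 100)
    // into its third argument, which is passed by reference.
    similar_text($a, $b, $percent);

    return $percent > $threshold;
}

// Illustrative inputs: a post, a lightly edited copy, and an unrelated post.
$original = 'PHP has a hidden gem called similar_text for comparing strings.';
$edited   = 'PHP has a hidden gem called similar_text() for comparing two strings.';
$other    = 'Zend Framework ships a flexible caching component.';

var_dump(isProbableDuplicate($original, $edited));
var_dump(isProbableDuplicate($original, $other));
```

Keeping the threshold as a parameter makes it easy to tune, which matches the author's remark that the right percentage is still being worked out.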



All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework