PHPDeveloper: PHP News, Views and Community

SitePoint PHP Blog:
OCR in PHP: Read Text from Images with Tesseract

by Chris Cornutt, Oct 23, 2015 @ 17:14:27

The SitePoint PHP blog has posted a tutorial by Lukas White showing how to implement OCR in a PHP application and read text directly from images with the help of Tesseract.

Optical Character Recognition (OCR) is the process of converting printed text into a digital representation. It has all sorts of practical applications — from digitizing printed books, creating electronic records of receipts, to number-plate recognition and even circumventing image-based CAPTCHAs. [...] Tesseract is an open source program for performing OCR. You can run it on *Nix systems, Mac OSX and Windows, but using a library we can utilize it in PHP applications. This tutorial is designed to show you how.

The tutorial walks through installing the Tesseract software locally (well, inside of a VM) and verifying the install by running it against a sample image. With that up and working, it shows how to call Tesseract from PHP through a library and wires it into a simple Silex application: an endpoint accepts a POSTed image file and passes it to Tesseract. Full code for the sample application is included, along with the results from a second sample image. The tutorial also includes some additional functionality you could use to detect phone numbers in the recognized text.
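
To make the flow described above concrete, here is a minimal sketch of the two core steps: running Tesseract on an image and picking phone-number-like strings out of the result. It is not the tutorial's code. It assumes the `tesseract` binary is on the PATH of a *nix system and calls it directly through the shell instead of going through the PHP library the tutorial uses. The function names (`buildTesseractCommand`, `ocrImage`, `findPhoneNumbers`) and the phone-number pattern are hypothetical, chosen for illustration only, and the Silex endpoint is left out.

```php
<?php
declare(strict_types=1);

/**
 * Build the shell command for the Tesseract CLI.
 * Passing "stdout" as the output base makes tesseract print the
 * recognized text instead of writing it to a file.
 */
function buildTesseractCommand(string $imagePath, string $lang = 'eng'): string
{
    return sprintf(
        'tesseract %s stdout -l %s',
        escapeshellarg($imagePath),
        escapeshellarg($lang)
    );
}

/** Run Tesseract on an image file and return the recognized text. */
function ocrImage(string $imagePath): string
{
    if (!is_readable($imagePath)) {
        throw new InvalidArgumentException("Cannot read image: $imagePath");
    }

    // Assumption: *nix shell, so diagnostics can be discarded via /dev/null.
    exec(buildTesseractCommand($imagePath) . ' 2>/dev/null', $lines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("tesseract exited with code $exitCode");
    }

    return implode("\n", $lines);
}

/**
 * Return phone-number-like strings found in OCR text.
 * Heuristic only: a run of digits, spaces, dots, dashes and parentheses
 * that contains between 7 and 15 digits. It does not validate numbers.
 */
function findPhoneNumbers(string $text): array
{
    preg_match_all('/\+?\d[\d ().-]{5,}\d/', $text, $matches);

    $found = [];
    foreach ($matches[0] as $candidate) {
        $digits = preg_replace('/\D/', '', $candidate);
        if (strlen($digits) >= 7 && strlen($digits) <= 15) {
            $found[] = trim($candidate);
        }
    }

    return array_values(array_unique($found));
}
```

In an application, the uploaded file's temporary path would be handed to `ocrImage()` and the returned text to `findPhoneNumbers()`. A dedicated phone-number parsing library would be a more robust choice than a regular expression for anything beyond a demo.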