
Sameer Borate's Blog:
Web scraping tutorial
March 09, 2009 @ 07:52:43

In a new tutorial on his blog today, Sameer shows a library that you can use (simplehtmldom) to parse remote sites and pull out just the information you need (aka "web scraping").

There are three ways to access a website's data. One is through a browser, another is using an API (if the site provides one), and the last is by parsing the web pages through code. The last one, also known as web scraping, is a technique for extracting information from websites using specially coded programs. In this post we will take a quick look at writing a simple scraper using the simplehtmldom library.

His three-step (really more) process guides you through installing the library, installing Firebug, and writing some example code to create your first scraper - an example that pulls some of the "Featured Links" from the Google search results sidebar. The second example illustrates grabbing the table of contents from the most recent issue of Wired.
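
To give a rough idea of what such a scraper looks like, here is a minimal sketch. It is not the code from Sameer's tutorial: instead of simplehtmldom it uses PHP's built-in DOM extension (DOMDocument and DOMXPath) so that it runs without installing anything. The markup, the `toc` id and the `scrapeLinks()` helper are all made up for illustration, and the HTML is an inline string rather than a page fetched from a real site.

```php
<?php
// Hypothetical markup standing in for a fetched page; a real scraper would
// download the HTML first (for example with file_get_contents($url)).
$html = <<<HTML
<html><body>
  <ul id="toc">
    <li><a href="/features/one">First feature</a></li>
    <li><a href="/features/two">Second feature</a></li>
  </ul>
  <a href="/about">About</a>
</body></html>
HTML;

/**
 * Return the text and href of every element matched by an XPath query.
 * Illustrative helper only; it assumes the query selects <a> elements.
 */
function scrapeLinks(string $html, string $query): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parse warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query($query) as $node) {
        $links[] = [
            'text' => trim($node->textContent),
            'href' => $node->getAttribute('href'),
        ];
    }
    return $links;
}

// Only links inside the list with id "toc" are selected; "About" is skipped.
$links = scrapeLinks($html, '//ul[@id="toc"]//a');
foreach ($links as $link) {
    echo $link['text'], ' => ', $link['href'], PHP_EOL;
}
```

Against the sample markup this prints the two links from the list and ignores the "About" link. Working out the right query for a real page is where a tool like Firebug comes in, since it lets you inspect the markup around the elements you want.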



All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework