News Feed

SitePoint PHP Blog:
Image Scraping with Symfony's DomCrawler
March 31, 2014 @ 09:06:43

On the SitePoint PHP blog today there's a new post showing you how to use the Symfony DomCrawler component to scrape content, mostly images, from a remote website. DomCrawler is one of the standalone components of the Symfony framework.

A photographer friend of mine implored me to find and download images of picture frames from the internet. I eventually landed on a web page that had a number of them available for free but there was a problem: a link to download all the images together wasn't present. I didn't want to go through the stress of downloading the images individually, so I wrote this PHP class to find, download and zip all images found on the website.

He talks briefly about how the class works and then gets into its contents. He walks through all the code, explaining what each part does at its stage of the request. The end result is a Zip archive of all the images from the remote website, packaged up for easy transport.
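The post's full class does the downloading and zipping as well, but the core DomCrawler step can be sketched in a few lines. The HTML fragment, variable names, and use of filterXPath() here are illustrative assumptions, not the author's code:

```php
<?php
// Minimal sketch: collect <img> src values from an HTML string with
// Symfony's DomCrawler (composer require symfony/dom-crawler).
// The HTML fragment is a stand-in for a fetched page.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body><img src="/frames/a.jpg"><img src="/frames/b.jpg"></body></html>';

$crawler = new Crawler($html);

// extract() pulls the named attribute from every matched node;
// filterXPath() avoids pulling in the extra CssSelector dependency.
$images = $crawler->filterXPath('//img')->extract(['src']);

print_r($images);
```

From there, each collected URL can be fetched and added to a ZipArchive.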

domcrawler symfony framework component tutorial image scrape

Link: http://www.sitepoint.com/image-scraping-symfonys-domcrawler/

Matthew Turland's Blog:
Gotcha on Scraping .NET Applications with PHP and cURL
July 01, 2010 @ 08:51:36

New on his blog today Matthew Turland has posted about a "gotcha" he came across when working with cURL to pull down information (scrape content) from a remote .NET application.

I recently wrote a PHP script to scrape data from a .NET application. In the process of developing this script, I noticed something interesting that I thought I'd share. In this case, I was using the cURL extension, but the tip isn't necessarily specific to that. One thing my script did was submit a POST request to simulate a form submission. [...] The issue I ran into had to do with a behavior of the CURLOPT_POSTFIELDS setting that's easy to overlook.

The problem was something cURL does automatically: when CURLOPT_POSTFIELDS is given an array, cURL switches the request's Content-Type header to multipart/form-data. Thankfully, a call to http_build_query to encode the data as a string first makes the request go out with the usual application/x-www-form-urlencoded header.
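A condensed sketch of the gotcha; the URL and form field names here are made up for illustration:

```php
<?php
// Passing an array straight to CURLOPT_POSTFIELDS makes cURL send the
// request as multipart/form-data; encoding it first keeps the usual
// application/x-www-form-urlencoded content type.
$fields = ['__VIEWSTATE' => 'abc123', 'username' => 'demo'];

$ch = curl_init('http://example.com/form.aspx');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);

// Array form - Content-Type becomes multipart/form-data:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);

// String form - Content-Type stays application/x-www-form-urlencoded:
$encoded = http_build_query($fields);
curl_setopt($ch, CURLOPT_POSTFIELDS, $encoded);

// curl_exec($ch) would now submit the form as a normal POST.
curl_close($ch);
```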

net application scrape content gotcha curl


Juozas Kaziukenas' Blog:
Scraping login requiring websites with cURL
February 24, 2009 @ 08:44:43

Many sites keep some of their content protected behind a login, making it difficult to pull into a script. Juozas Kaziukenas has created an option to help you past this hurdle: a PHP class (using cURL) that POSTs the login data to the site and holds on to the resulting session ID.

But how you are going to do all this work with cookies and session id? Luckily, PHP has cURL extension which simplifies connecting to remote addresses, using cookies, staying in one session, POSTing data, etc. It's really powerful library, which basically allows you to use all HTTP headers functionality. For secure pages crawling, I've created very simple Secure_Crawler class.

The class uses the built-in cURL functionality to send the POST information (in this case the username and password, but it can be easily changed for whatever the form requires) and provides a get() method to use for fetching other pages once you're connected.
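A rough sketch of the idea behind such a class; the class name, method names, and URLs here are assumptions, not the original Secure_Crawler code:

```php
<?php
// Sketch: log in once via POST, keep the session cookie in a jar,
// then reuse that jar for later GET requests.
class SecureCrawler
{
    private $cookieFile;

    public function __construct()
    {
        // The cookie jar is what keeps the session ID between requests.
        $this->cookieFile = tempnam(sys_get_temp_dir(), 'crawl');
    }

    public function login($url, array $credentials)
    {
        return $this->request($url, http_build_query($credentials));
    }

    public function get($url)
    {
        return $this->request($url);
    }

    private function request($url, $postBody = null)
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookieFile);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $this->cookieFile);
        if ($postBody !== null) {
            curl_setopt($ch, CURLOPT_POST, true);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
        }
        $body = curl_exec($ch);
        curl_close($ch);
        return $body;
    }
}

// Usage (hypothetical URLs):
// $crawler = new SecureCrawler();
// $crawler->login('http://example.com/login', ['user' => 'u', 'pass' => 'p']);
// $page = $crawler->get('http://example.com/members');
```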

login require scrape curl secure crawler tutorial username password


Hasin Hayder's Blog:
Making a jobsite using PHP
January 24, 2008 @ 14:41:38

Hasin Hayder has started up a new project that he's documented in a new blog entry - the creation of a new jobs website in PHP.

I was involved in making a job site few days ago. During the development, I have studied how easily anyone can develop a job site using PHP (language independent in true sense) . So I decide to write a blog post about my experience and here it goes. But note that this article is not about scaling or balancing the load on your site during heavy traffic, heh heh.

He comments on the startup process surrounding this type of site and suggests something to consider for your own careers site: pulling in job content from other sites in two ways, screen scraping and the job search APIs that are out there.

job website scrape content popular api


Jonathan Street's Blog:
When scraping content from the web don't make it obvious
November 07, 2007 @ 11:26:00

Jonathan Street has a tip for those developers out there who have no choice but to scrape content from a remote site: don't make it obvious. He also includes a suggestion on how to make it a little less so.

A couple of hours ago I was playing around scraping some content from a website. All was going well until suddenly I couldn't get my script to fetch meaningful content. [...] The first thing I did was stop visiting the site for 15 minutes or so and then increase the time between requests. It briefly worked again but quickly stopped.

One simple change to the user agent string in his php.ini made the problem evaporate, pointing to user agent filtering on the remote side. His helpful hint covers two methods for changing the user agent your scripts send: one in plain PHP and the other in cURL. An even better solution might be a rotating array that alternates between four or five strings to make things look more varied.
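Both methods, plus the rotating-array idea, can be sketched like this; the agent strings are arbitrary examples:

```php
<?php
// Pick a user agent at random from a small pool.
$agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
    'Mozilla/5.0 (X11; Linux x86_64)',
];
$agent = $agents[array_rand($agents)];

// Method 1 - plain PHP: affects file_get_contents() and the other
// stream wrappers for the rest of the script.
ini_set('user_agent', $agent);

// Method 2 - cURL: set per handle.
$ch = curl_init('http://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
// $html = curl_exec($ch); // would fetch with the chosen agent
curl_close($ch);
```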

scrape content remote server useragent filter modify phpini


MakeBeta Blog:
Scraping Links With PHP
August 15, 2007 @ 12:08:00

From Justin Laing over at Merchant OS there's a new tutorial on creating a simple link scraper with the help of PHP and the cURL extension.

In this tutorial you will learn how to build a PHP script that scrapes links from any web page. You learn how to use cURL, call PHP DOM functions, use XPath and store the links in MySQL.

You'll need PHP5 and the cURL extension enabled on your web server to make it all work, but the code is all there, ready for you to cut and paste. The application grabs the page with cURL (including the option to fake your user agent), parses the HTML with the DOM and XPath functionality to grab the links, and uses the MySQL functions to store them in your database.
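The parsing step can be sketched with the DOM and XPath pieces alone; the HTML here is a stand-in for what cURL would return, and the MySQL insert is omitted:

```php
<?php
$html = '<html><body>
    <a href="http://example.com/one">One</a>
    <a href="http://example.com/two">Two</a>
</body></html>';

$doc = new DOMDocument();
// Suppress warnings from real-world, badly formed markup.
@$doc->loadHTML($html);

// XPath finds every anchor that actually has an href attribute.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//a[@href]') as $node) {
    $links[] = $node->getAttribute('href');
}

print_r($links);
```

Each entry in $links could then be written to the database, or fed back into the fetch step to crawl further.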

scrape link curl dom xpath mysql tutorial


WaxJelly Blog:
The easiest way to scrape details from a MySpace profile page with PHP
March 20, 2007 @ 10:41:00

From the WaxJelly blog today comes a handy bit of code for anyone looking for a quick and easy way to scrape details from just about any MySpace page out there.

It's amazing how just a little optimization on the part of myspace makes crawling their site so much easier. We're going to scrape the user detail (name, age, sex, etc..) from a profile, using the header info...

The script grabs the contents of the given URL, loops through, pulls out the meta tag information and uses that as a key to grab the rest of the user's information (including name, age, city, state, etc).
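The meta-tag trick can be sketched like this; the description format is a guess at what such a profile page exposed, since the original script worked against live MySpace markup:

```php
<?php
// Stand-in for a fetched profile page; the user's details ride along
// in the <meta> description.
$html = '<html><head>
    <meta name="description" content="28 years old, Female, Austin, TEXAS, US">
    <title>Example Profile</title>
</head><body></body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);

// Index the meta tags by their name attribute.
$meta = [];
foreach ($doc->getElementsByTagName('meta') as $tag) {
    $meta[$tag->getAttribute('name')] = $tag->getAttribute('content');
}

// Split the description into the individual fields (age, sex, city...).
$details = array_map('trim', explode(',', $meta['description']));

print_r($details);
```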

scrape myspace details meta city state country name




All content copyright, 2014 PHPDeveloper.org :: info@phpdeveloper.org - Powered by the Solar PHP Framework