<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>PHPDeveloper.org</title>
    <link>http://www.phpdeveloper.org</link>
    <description>Up-to-the-Minute PHP News, Views and Community</description>
    <language>en-us</language>
    <pubDate>Sat, 25 May 2013 02:17:04 -0500</pubDate>
    <ttl>30</ttl>
    <item>
      <title><![CDATA[Christian Schaefer's Blog: Using PHP Web Scraper Goutte in a Console Task in a Silex project]]></title>
      <guid>http://www.phpdeveloper.org/news/16969</guid>
      <link>http://www.phpdeveloper.org/news/16969</link>
      <description><![CDATA[<p>
In a recent post to his blog, <i>Christian Schaefer</i> shows how to use <a href="https://github.com/fabpot/Goutte">Goutte</a> (a web scraper) to pull information from one site and use it in a <a href="http://silex.sensiolabs.org/">Silex</a>-powered application. <a href="http://test.ical.ly/2011/09/30/using-php-web-scraper-goutte-in-a-console-task-in-a-silex-project/">His tutorial</a> uses a custom service provider for the integration.
</p>
<blockquote>
Since I discovered the <a href="http://test.ical.ly/2011/09/29/deploy-your-silex-and-twig-powered-facebook-app-using-git-onto-free-heroku-cloud-hosting/">free Facebook App hosting by heroku</a> I keep wanting to make something useful out of it. So I thought about a small service app. Without going into details yet about its nature there was one immediate problem to be solved. How to get hold of the data? So I thought to scrape it off some website. I know this isn't very nice but unfortunately there is no feed I can use.. And how to best scrape a website? Use Goutte!
</blockquote>
<p>
All you'll need are two phar files: <a href="https://raw.github.com/fabpot/Goutte/master/goutte.phar">goutte.phar</a> and <a href="http://silex.sensiolabs.org/get/silex.phar">Silex</a>. The service provider itself does little more than register the required namespaces. Once it's integrated, scraping a page comes down to creating a client object and calling it with a URL.
</p>]]></description>
      <pubDate>Mon, 10 Oct 2011 08:26:24 -0500</pubDate>
    </item>
  </channel>
</rss>
