Sameer Borate:
PHP Simple HTML DOM Parser Script
Jun 21, 2018 @ 09:26:38

Scraping content from other sites (while slightly controversial) can be a helpful way to pull information into your application without the overhead of manual interaction. In this new post to his site Sameer Borate shows how to use a DOM parser to extract data from a remote site.

In this post I have explained some elements to scrape data from external websites. Simple HTML DOM parser is a PHP 5+ class which is useful to manipulate HTML elements. This class can work with both valid HTML and HTML pages that do not pass W3C validation. You can find elements by ids, classes, tags and many more. You can also add, delete or alter DOM elements. The only one thing you should care about is memory leaks – but you can avoid memory leaks as explained later.

He starts by walking through some of the basics of creating a new instance of the class and loading the content (either as a string or as a file) to be parsed. He then gives several examples of how to query the contents of the document and locate multiple or single elements (including the use of CSS-type selectors for fuzzy attribute matching). He finishes out the post showing how to access element attributes and append content back to the original HTML.

tagged: simpledom parser script tutorial introduction html dom

Link: https://www.codediesel.com/php/php-simple-html-dom-parser-script/

Sergey Zhuk:
Fast Web Scraping With ReactPHP
Feb 12, 2018 @ 10:55:42

Sergey Zhuk has a new ReactPHP-related post to his site today showing you how to use the library to scrape content from the web quickly, making use of the asynchronous abilities the package provides.

Almost every PHP developer has ever parsed some data from the Web. Often we need some data, which is available only on some website and we want to pull this data and save it somewhere. It looks like we open a browser, walk through the links and copy data that we need. But the same thing can be automated via script. In this tutorial, I will show you the way how you can increase the speed of your parser making requests asynchronously.

In his example he creates a scraper that goes to a movie's page on the IMDB website and extracts the title, description, release date and the list of genres it falls into. Instead of a blocking script that can only fetch a single page at a time, he uses ReactPHP to issue the requests asynchronously, handing it a list of pages to fetch concurrently. He starts by walking through the setup of the package and the creation of the browser instance. He then includes the code to make the request and crawl the contents of the result for the data. The post ends with the full code for the client and a way to add in a timeout in case the request fails.

tagged: scraping reactphp tutorial imdb movie crawl dom

Link: http://sergeyzhuk.me/2018/02/12/fast-webscraping-with-reactphp/

Zend Framework Blog:
Scrape Screens with zend-dom
Feb 28, 2017 @ 16:46:27

The Zend Framework blog has posted another tutorial focusing on the use of one of the components that make up the framework. In this latest tutorial Matthew Weier O'Phinney focuses on the zend-dom component and how to use it for scraping content from remote sources.

Even in this day-and-age of readily available APIs and RSS/Atom feeds, many sites offer none of them. How do you get at the data in those cases? Through the ancient internet art of screen scraping.

The problem then becomes: how do you get at the data you need in a pile of HTML soup? You could use regular expressions or any of the various string functions in PHP. All of these are easily subject to error, though, and often require some convoluted code to get at the data of interest.

[...] zend-dom provides CSS selector capabilities for PHP, via the Zend\Dom\Query class. [...] While it does not implement the full spectrum of CSS selectors, it does provide enough to generally allow you to get at the information you need within a page.

He gives an example of it in use, showing how to grab a navigation list from the Zend Framework documentation site (a list of items in a <ul> tag). He also suggests some other uses of the tool including use in testing of your application, checking content in the page without having to hard-code specific strings.

tagged: zendframework zenddom scrape content html dom xml tutorial

Link: https://framework.zend.com/blog/2017-02-28-zend-dom.html

James Morris' Blog:
Parsing HTML with DOMDocument and DOMXPath::Query
Jun 27, 2012 @ 10:19:35

In the latest post to his blog James Morris looks at using DOMDocument together with DOMXPath's query() method to locate pieces of data in an HTML document.

The other day I needed to do some html scraping to trim out some repeated data stuck inside nested divs and produce a simplified array of said data. My first port of call was SimpleXML which I have used many times. However this time, the son of a bitch just wouldn’t work with me and kept on throwing up parsing errors. I lost my patience with it and decided to give DomDocument and DOMXpath a go which I’d heard of but never used.

He includes example code (and a sample document) showing how to extract content from an HTML structure - grabbing each of the images from inside a div and associating them with their description content.
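
As a rough illustration of that approach (not the code from the post), here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The markup, the "gallery" id and the "item" and "desc" class names are all made up for the example.

```php
<?php
// Hypothetical markup: the ids, class names and structure are assumptions.
$html = <<<HTML
<div id="gallery">
  <div class="item"><img src="a.jpg"><p class="desc">First image</p></div>
  <div class="item"><img src="b.jpg"><p class="desc">Second image</p></div>
</div>
HTML;

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$images = [];

// Walk each item and pair the image source with its description text.
foreach ($xpath->query('//div[@id="gallery"]/div[@class="item"]') as $item) {
    $img  = $xpath->query('./img', $item)->item(0);
    $desc = $xpath->query('./p[@class="desc"]', $item)->item(0);
    if ($img === null || $desc === null) {
        continue; // skip items missing either piece
    }
    $images[$img->getAttribute('src')] = trim($desc->textContent);
}

print_r($images);
```

Passing the current node as the second argument to query() keeps the inner lookups relative to each item, which is what ties an image to its own description.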

tagged: dom domdocument domxpath xpath tutorial html

Link:

PHPMaster.com:
PHP DOM: Using XPath
Jun 26, 2012 @ 08:16:08

On PHPMaster.com today there's a new tutorial showing you how to use the XPath support that's built into PHP's DOM extension to query your XML.

In a recent article I discussed PHP’s implementation of the DOM and introduced various functions to pull data from and manipulate an XML structure. I also briefly mentioned XPath, but didn’t have much space to discuss it. In this article, we’ll look closer at XPath, how it functions, and how it is implemented in PHP. You’ll find that XPath can greatly reduce the amount of code you have to write to query and filter XML data, and will often yield better performance as well.

They start with some basic XPath queries that follow a simple path and locate the record for a specific book. There's also an example of using XPath versus the "find" functions in the DOM functionality (like getElementsByTagName). Close to the end there's a bit about using functions in XPath, including how to call native PHP functions from inside your XPath queries.
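
To make the comparison concrete, here is a small sketch (not taken from the tutorial) that runs the same lookup both ways and then calls a native PHP function from an XPath expression. The library document and its element names are assumptions for illustration.

```php
<?php
// Hypothetical library document; element and attribute names are assumptions.
$xml = <<<XML
<library>
  <book isbn="111"><title>PHP Basics</title><author>Ann Lee</author></book>
  <book isbn="222"><title>Advanced XPath</title><author>Bob Ray</author></book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// One XPath expression locates the record for a specific book.
$title = $xpath->evaluate('string(/library/book[@isbn="222"]/title)');

// The same lookup with getElementsByTagName needs a manual loop.
$titleViaLoop = null;
foreach ($doc->getElementsByTagName('book') as $book) {
    if ($book->getAttribute('isbn') === '222') {
        $titleViaLoop = $book->getElementsByTagName('title')->item(0)->textContent;
    }
}

// Native PHP functions become callable inside XPath once registered.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('strtoupper');
$matches = $xpath->query('//book[php:functionString("strtoupper", author) = "ANN LEE"]');
$annIsbn = $matches->item(0)->getAttribute('isbn');

echo $title, PHP_EOL, $titleViaLoop, PHP_EOL, $annIsbn, PHP_EOL;
```

Restricting registerPhpFunctions() to the functions a query actually needs is a cautious default, since it limits what an expression can call.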

tagged: xpath tutorial dom introduction

Link:

PHPMaster.com:
PHP DOM: Working with XML
Jun 08, 2012 @ 08:27:45

On PHPMaster.com there's a new tutorial posted about using XML in PHP, an introduction to using the DOM functionality in PHP to work with your XML content.

SimpleXML allows you to quickly and easily work with XML documents, and in the majority of cases SimpleXML is sufficient. But if you’re working with XML in any serious capacity, you’ll eventually need a feature that isn’t supported by SimpleXML, and that’s where the PHP DOM (Document Object Model) comes in.

He starts with a brief introduction to XML and DTDs including an example of each (defining the sample book information he'll use in the rest of the tutorial). He helps you create a simple class that takes in the XML content, handles construction/destruction of the object and uses it to find, add and delete a book by things like ISBN or genre.
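
A class along those lines might look something like the sketch below. The BookList name, its methods and the XML layout are hypothetical and are not the tutorial's own code; it also assumes PHP 7.4+ for typed properties.

```php
<?php
// Hypothetical BookList class; the name, methods and XML layout are assumptions.
final class BookList
{
    private DOMDocument $doc;
    private DOMXPath $xpath;

    public function __construct(string $xml)
    {
        $this->doc = new DOMDocument();
        $this->doc->loadXML($xml);
        $this->xpath = new DOMXPath($this->doc);
    }

    // Compare the attribute in PHP so the ISBN is never spliced into the query.
    public function findByIsbn(string $isbn): ?DOMElement
    {
        foreach ($this->xpath->query('/library/book') as $book) {
            if ($book->getAttribute('isbn') === $isbn) {
                return $book;
            }
        }
        return null;
    }

    public function add(string $isbn, string $title): void
    {
        $book = $this->doc->createElement('book');
        $book->setAttribute('isbn', $isbn);
        $titleNode = $this->doc->createElement('title');
        $titleNode->appendChild($this->doc->createTextNode($title));
        $book->appendChild($titleNode);
        $this->doc->documentElement->appendChild($book);
    }

    public function delete(string $isbn): bool
    {
        $book = $this->findByIsbn($isbn);
        if ($book === null) {
            return false;
        }
        $book->parentNode->removeChild($book);
        return true;
    }

    public function count(): int
    {
        return $this->xpath->query('/library/book')->length;
    }
}

$books = new BookList('<library><book isbn="111"><title>PHP Basics</title></book></library>');
$books->add('222', 'Working with DOM');
$found = $books->findByIsbn('222');
$deleted = $books->delete('111');
echo $books->count(), PHP_EOL;
```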

tagged: dom tutorial introduction xml

Link:

PHPBuilder.com:
PHP Simple HTML DOM Parser: Editing HTML Elements in PHP
Sep 08, 2011 @ 10:06:07

On PHPBuilder.com today there's a new tutorial from Vojislav Janjic about using a simple DOM parser in PHP to edit markup even if it doesn't pass W3C validation - the Simple HTML DOM Parser.

Simple HTML DOM parser is a PHP 5+ class which helps you manipulate HTML elements. The class is not limited to valid HTML; it can also work with HTML code that did not pass W3C validation. Document objects can be found using selectors, similar to those in jQuery. You can find elements by ids, classes, tags, and much more. DOM elements can also be added, deleted or altered.

They help you get started using the parser, passing in the HTML content to be handled (either directly via a string or loading a file) and locating elements in the document either by ID, class or tag. Selectors similar to those in CSS are available. Finally, they show how to find an object and update its contents, either by adding more HTML inside or by appending a new object after it.

tagged: simple html dom parse tutorial selector find replace edit

Link:

PHPBuilder.com:
Parsing XML with the DOM Extension for PHP 5
Oct 28, 2010 @ 14:47:56

On PHPBuilder.com there's a new tutorial from Octavia Anghel about using the DOM extension to parse XML in a PHP 5 application. The DOM extension makes working with XML messages and documents simpler than the older PHP 4 DOM functionality did.

DOM (Document Object Model) is a W3C standard based on a set of interfaces, which can be used to represent an XML or HTML document as a tree of objects. A DOM tree defines the logical structure of documents and the way a document is accessed and manipulated. Using DOM, developers create and build XML or HTML documents, navigate their structures, and add, modify, or delete elements and content. The DOM can be used with any programming language, but in this article we will use the DOM extension for PHP 5. This extension is part of the PHP core and doesn't need any installation.

They include both a sample XML file to parse and the code you'll need to pull it in and make a basic DOM object out of it. Also included is some code showing how to pull out certain pieces of information, recurse through a set of XML values, add new nodes to the structure, remove a node and more.
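
The sketch below gives a feel for two of those steps, recursing through a set of values and removing a node, using only the built-in DOM extension. The catalog document and the collectLeaves() helper are invented for illustration and do not come from the tutorial.

```php
<?php
// Hypothetical sample document; element names are assumptions.
$xml = '<catalog>'
     . '<cd><title>Blue</title><year>1971</year></cd>'
     . '<cd><title>Kind of Blue</title><year>1959</year></cd>'
     . '</catalog>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Recursively collect "path = text" strings for elements with no child elements.
function collectLeaves(DOMNode $node, string $path, array &$out): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // ignore text and comment nodes at this level
        }
        $childPath = $path . '/' . $child->nodeName;
        if ($child->getElementsByTagName('*')->length === 0) {
            $out[] = $childPath . ' = ' . trim($child->textContent);
        } else {
            collectLeaves($child, $childPath, $out);
        }
    }
}

$leaves = [];
collectLeaves($doc->documentElement, '/catalog', $leaves);
print_r($leaves);

// Remove the first <cd> and count what is left.
$first = $doc->getElementsByTagName('cd')->item(0);
$first->parentNode->removeChild($first);
$remaining = $doc->getElementsByTagName('cd')->length;
echo $remaining, PHP_EOL;
```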

tagged: parse xml dom extension tutorial

Link:

Qafoo.com:
Practical PHPUnit: Testing XML generation
Sep 17, 2010 @ 13:51:02

On the Qafoo blog today there's a new post from Tobias Schlitt about a method you can use to unit test methods that generate XML without messing with a lot of extra overhead just to test the results.

Testing classes which generate XML can be a cumbersome work. At least, if you don't know the right tricks to make your life easier. In this article, I will throw some light upon different approaches and show you, how XML generation can be tested quite easily using XPath.

He includes a sample class, qaPersonVisitor, that has methods to create a simple XML document from first and last name data as a DOM element. He sets up a basic test case that creates a simple person - including gender and date of birth - and offers a few different suggestions on handling the check (in PHPUnit tests):

  • the naive way of rebuilding the DOM object and asserting that the two are equal
  • testing the resulting XML from the DOM object against a pre-generated XML document
  • matching the contents via CSS selectors
  • using the tag matching assertions
  • using XPath in a custom assertion (with short and long uses of it included)
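
The XPath-based option can be sketched without any test framework, as below. Both buildPersonXml() and xpathMatches() are hypothetical helpers written for this example; they stand in for the class under test and for the custom assertion, and are not the post's code.

```php
<?php
// Hypothetical generator standing in for the class under test.
function buildPersonXml(string $first, string $last): DOMDocument
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $person = $doc->createElement('person');
    $doc->appendChild($person);

    $firstNode = $doc->createElement('firstName');
    $firstNode->appendChild($doc->createTextNode($first));
    $person->appendChild($firstNode);

    $lastNode = $doc->createElement('lastName');
    $lastNode->appendChild($doc->createTextNode($last));
    $person->appendChild($lastNode);

    return $doc;
}

// Hypothetical helper: true when the XPath expression selects at least one node.
function xpathMatches(DOMDocument $doc, string $expression): bool
{
    $xpath = new DOMXPath($doc);
    return (bool) $xpath->evaluate("boolean($expression)");
}

$doc = buildPersonXml('Jane', 'Doe');
$hasFirst = xpathMatches($doc, '/person/firstName[text() = "Jane"]');
$hasLast  = xpathMatches($doc, '/person/lastName[text() = "Doe"]');
$hasWrong = xpathMatches($doc, '/person/firstName[text() = "John"]');

var_dump($hasFirst, $hasLast, $hasWrong);
```

Inside a PHPUnit test case the boolean result could be passed to assertTrue(), which keeps each check to a single line and avoids comparing whole documents.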

tagged: unittest phpunit xml generation xpath dom

Link:

Thomas Weinert's Blog:
Using PHP DOM With XPath
Apr 13, 2010 @ 13:18:32

Thomas Weinert has a recent post to his blog showing how to use one of the more powerful XML-handling features that PHP's DOM extension includes - XPath.

Often I hear people say "We use SimpleXML, because DOM is so noisy and complex". Well, I don't think so. This article explains how you can parse a XML (an Atom feed) using the PHP DOM extension. No other libraries are involved.

In his example he loads an external feed (his own) into a DOM object, blocks any errors with a few handy functions and creates a DOMXPath object on the DOM object to get ready for his queries. He shows how to make searches for titles, subtitles, looping over attributes and an element list returned from one of the first queries. A full code listing is also provided to show how it all fits together.
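
A condensed sketch of that flow follows. It uses an inline, made-up Atom document in place of the remote feed, and the "atom" prefix is simply a name chosen here for the registered namespace; it is not the post's full listing.

```php
<?php
// Inline stand-in for a remote Atom feed; the content is made up.
$feed = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <subtitle>Notes on PHP</subtitle>
  <entry><title>First post</title><link rel="alternate" href="http://example.com/1"/></entry>
  <entry><title>Second post</title><link rel="alternate" href="http://example.com/2"/></entry>
</feed>
XML;

// Keep libxml errors internal while loading, then restore the previous setting.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadXML($feed);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Atom elements live in a namespace, so a prefix must be registered for XPath.
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$feedTitle = $xpath->evaluate('string(/atom:feed/atom:title)');
$subtitle  = $xpath->evaluate('string(/atom:feed/atom:subtitle)');

// Loop over the entry list and read values relative to each entry node.
$entries = [];
foreach ($xpath->evaluate('/atom:feed/atom:entry') as $entry) {
    $entries[] = [
        'title' => $xpath->evaluate('string(atom:title)', $entry),
        'href'  => $xpath->evaluate('string(atom:link[@rel="alternate"]/@href)', $entry),
    ];
}

echo $feedTitle, ' - ', $subtitle, PHP_EOL;
print_r($entries);
```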

tagged: dom xpath domxpath tutorial search atom

Link: