PHP Web Scraping of JavaScript-Generated Content

Scrape web page data generated by JavaScript

You need to look at PhantomJS.

From their site:

PhantomJS is a headless WebKit with JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

Using the API, you can script the "browser" to interact with the page and scrape the data you need. You can then do whatever you need with it, including passing it to a PHP script if necessary.
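One common way to hand the rendered page back to PHP is to run PhantomJS as a command-line program and read what it prints. The sketch below assumes a `phantomjs` binary is on the `PATH` and that the content you want exists once PhantomJS's `page.open` callback fires (pages that load data later may need an extra delay). The function names `buildPhantomCommand` and `renderWithPhantom` are hypothetical.

```php
<?php
// Minimal PhantomJS script: open the URL passed as the first argument and
// print the rendered DOM (after the page's own JavaScript has run) to stdout.
const PHANTOM_SCRIPT = <<<'JS'
var page = require('webpage').create();
var system = require('system');
page.open(system.args[1], function (status) {
    if (status !== 'success') {
        phantom.exit(1);
        return;
    }
    console.log(page.content);
    phantom.exit(0);
});
JS;

// Build a shell-safe command line; every argument is quoted.
function buildPhantomCommand(string $bin, string $script, string $url): string
{
    return implode(' ', array_map('escapeshellarg', [$bin, $script, $url]));
}

// Render $url with PhantomJS and return the resulting HTML.
// Assumes a `phantomjs` binary is installed (the default name is an assumption).
function renderWithPhantom(string $url, string $bin = 'phantomjs'): string
{
    $script = sys_get_temp_dir() . '/render_' . bin2hex(random_bytes(8)) . '.js';
    file_put_contents($script, PHANTOM_SCRIPT);
    $lines = [];
    $exitCode = 0;
    try {
        exec(buildPhantomCommand($bin, $script, $url), $lines, $exitCode);
    } finally {
        unlink($script); // always clean up the temporary script
    }
    if ($exitCode !== 0) {
        throw new RuntimeException("PhantomJS failed with exit code $exitCode");
    }
    return implode("\n", $lines);
}
```

The returned HTML can then be parsed with `DOMDocument` like any static page, e.g. `$html = renderWithPhantom('https://example.com/');`.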


That being said, if at all possible, try not to "scrape" the data. If the page gets its data through an AJAX call, there may be an API (or at least a JSON endpoint) you can use directly instead. If not, maybe you can convince the site owners to provide one. That would of course be much easier and more maintainable than screen scraping.
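If you do find the AJAX request (for example in the Network tab of the browser's developer tools), calling it directly from PHP avoids rendering the page at all. The sketch below assumes the endpoint returns JSON shaped like `{"items": [{"name": ...}, ...]}`; the endpoint, the field names, and the functions `parseItemNames` and `fetchItemNames` are hypothetical.

```php
<?php
// Parse a JSON response and pull out one field per item.
// The "items"/"name" structure is an assumption about the endpoint.
function parseItemNames(string $json): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        return [];
    }
    $names = [];
    foreach ($data['items'] ?? [] as $item) {
        if (is_array($item) && isset($item['name'])) {
            $names[] = (string) $item['name'];
        }
    }
    return $names;
}

// Fetch the (hypothetical) JSON endpoint that the page calls via AJAX.
function fetchItemNames(string $endpoint): array
{
    $context = stream_context_create([
        'http' => [
            'header'  => "Accept: application/json\r\n",
            'timeout' => 10,
        ],
    ]);
    $json = file_get_contents($endpoint, false, $context);
    if ($json === false) {
        throw new RuntimeException("Request to $endpoint failed");
    }
    return parseItemNames($json);
}
```

Keeping the parsing separate from the HTTP call makes it easy to test against a saved response before pointing it at the live endpoint.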

PHP: How to scrape website content generated by JavaScript

I must use PHP in this case, so I need to simulate a JavaScript-capable browser.

I'd recommend two approaches:

  1. Use the V8Js PHP extension to execute the site's JavaScript while scraping. See here for a usage example.
  2. Simulate a JavaScript-capable browser using Selenium, iMacros, or the webRobots.io Chrome extension. In this case, though, you are no longer scripting in PHP.
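V8Js runs JavaScript in the V8 engine without a browser environment: there is no DOM, so `document` and `window` do not exist. It therefore suits pages that embed their data in an inline script (for example `var data = {...};`) better than pages that build their content through DOM calls. The sketch below collects the inline scripts with `DOMDocument`, runs them in V8Js, and reads back one global variable as JSON. It assumes the `v8js` extension is installed; `inlineScripts`, `readPageVariable`, and the variable name you pass are hypothetical.

```php
<?php
// Return the bodies of all inline <script> elements (those without a src).
function inlineScripts(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; silence libxml's parse warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $scripts = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        if (!$script->hasAttribute('src') && trim($script->textContent) !== '') {
            $scripts[] = $script->textContent;
        }
    }
    return $scripts;
}

// Execute the inline scripts in V8Js and return a global variable's value.
// Requires the v8js extension; scripts that touch the DOM will throw,
// because V8Js provides no browser environment.
function readPageVariable(array $scripts, string $varName)
{
    if (!preg_match('/^[A-Za-z_$][A-Za-z0-9_$]*$/', $varName)) {
        throw new InvalidArgumentException("Invalid variable name: $varName");
    }
    $v8 = new V8Js();
    $js = implode(";\n", $scripts) . ";\nJSON.stringify($varName);";
    $json = $v8->executeString($js);
    // JSON.stringify(undefined) yields undefined, which arrives as null.
    return is_string($json) ? json_decode($json, true) : null;
}
```

For example, `readPageVariable(inlineScripts($html), 'data')` would return the decoded value of a page-level `var data = {...}`, provided none of the page's scripts fail for lack of a DOM.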

Scraping JavaScript-generated content in PHP

It's probably impossible to do that on shared hosting. Fortunately, my provider installed PhantomJS on request, which solved the problem.