PhantomJS Not Waiting for "Full" Page Load

PhantomJS does not wait for the page to load or for all jQuery code to execute

According to this Stack Overflow post, this is a problem with PhantomJS, not Java. However, if you are just trying to save a web page as a PDF, there are other options. After a bit of Google-fu, I found wkhtmltopdf, which provides a flexible command-line tool with ample documentation, as well as a C library. Since you are already using a command-line tool to fetch your page, swapping this one into your code should be no problem. Here is a short example:

    // Requires: import java.io.File; import java.io.IOException;
    String wkhtmltopdf_path = "/usr/local/bin/wkhtmltopdf";
    String paperSize = "A4";
    String url = "https://www.google.com/";

    // This is where your output file is/was defined, now with error handling
    File outputFile = null;
    try {
        // File in which you want to save your PDF
        outputFile = File.createTempFile("sample", ".pdf");
        System.out.println(outputFile.getAbsolutePath()); // Show output file path, remove this in production
    } catch (IOException e1) {
        e1.printStackTrace();
    }

    // This is where your process runs, with error handling
    Process process = null;
    if (outputFile != null) { // Skip the run if the temp file could not be created
        try {
            // Passing the arguments as an array avoids problems with spaces in paths
            process = Runtime.getRuntime().exec(new String[] {
                wkhtmltopdf_path, "-s", paperSize, url, outputFile.getAbsolutePath()});
        } catch (IOException e) {
            e.printStackTrace(); // Do your error handling here
        }
    }

    // This is where your exitStatus and waitFor() was/is, with error handling
    int exitStatus = -1; // Stays non-zero if the process never started
    if (process != null) {
        try {
            // Block until wkhtmltopdf exits
            exitStatus = process.waitFor();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt flag
            e.printStackTrace(); // Do your error handling here
        }
    }
    if (exitStatus != 0) {
        // Do error handling here
    }

How can I wait for the page to be ready in PhantomJS?

It seems that the only way to do this is to use callbacks from the DOM to PhantomJS: the page calls window.callPhantom(), which triggers page.onCallback in the PhantomJS script.

    var page = require('webpage').create();
    var system = require('system');

    page.onInitialized = function() {
        page.onCallback = function(data) {
            console.log('Main page is loaded and ready');
            // Do whatever here
        };

        page.evaluate(function() {
            document.addEventListener('DOMContentLoaded', function() {
                window.callPhantom();
            }, false);
            console.log("Added listener to wait for page ready");
        });
    };

    page.open('https://www.google.com', function(status) {});

PhantomJS doesn't render a page

It doesn't render a screenshot, because the page has no <body> initially and therefore nothing to render. Everything, including the body, is loaded through JavaScript after PhantomJS' onLoadFinished event fires.

You need to wait a little for the page to load fully. A simple 5-second wait was sufficient for me:

    page.open('http://www.telegraaf.nl/', function(status) {
        setTimeout(function(){
            page.render("screenshot.png");
            phantom.exit();
        }, 5000);
    });

You can of course wait in a more sophisticated way to make it more robust and to avoid waiting longer than necessary; see "phantomjs not waiting for “full” page load".
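One way to wait more selectively is to poll for a condition instead of sleeping for a fixed time. The following is a minimal sketch, not code from the linked answer: waitFor is a hypothetical helper, and the readiness condition (a <body> with at least one child element), the 5000 ms limit, and the 100 ms polling interval are assumptions to adapt to your page.

```javascript
// Hypothetical helper (not part of PhantomJS): calls `condition` every
// `intervalMs` until it returns true or `timeoutMs` has elapsed.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (condition()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs);
}

// The wiring below only runs under PhantomJS, where `phantom` is defined.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://www.telegraaf.nl/', function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
            return;
        }
        waitFor(function () {
            // Assumed readiness check, evaluated inside the page.
            return page.evaluate(function () {
                return !!document.body && document.body.children.length > 0;
            });
        }, function () {
            page.render('screenshot.png');
            phantom.exit();
        }, function () {
            console.log('Timed out waiting for the page body');
            phantom.exit(1);
        }, 5000, 100);
    });
}
```

Compared with a fixed 5-second sleep, this renders as soon as the condition holds and still gives up after the limit, so a page that never becomes ready does not hang the script.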


You may need to run PhantomJS with --ignore-ssl-errors=true (and maybe --ssl-protocol=any if your PhantomJS version is older than 1.9.8).

PhantomJS 2.0.0 doesn't wait for page to load

PhantomJS doesn't define at which point in the page load process the page.open callback is called, so it isn't actually claiming anything that turns out to be wrong.

You can add a static wait with setTimeout(), which should help for dynamic sites. Another approach is to check for pending requests by counting how many requests were sent with page.onResourceRequested and how many finished with page.onResourceReceived, page.onResourceTimeout, or page.onResourceError.
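The request-counting idea could look roughly like the sketch below. It is an illustration under assumptions, not a tested recipe: createRequestTracker is a hypothetical helper, the 500 ms quiet period and the output file name are made up, and the URL is just an example. Requests are tracked by id rather than with a bare counter, because the same request may be reported as finished more than once (for example, an error followed by a final received event).

```javascript
// Hypothetical helper (not part of PhantomJS): tracks requests in flight by id.
function createRequestTracker() {
    var pending = {};
    var count = 0;
    var has = Object.prototype.hasOwnProperty;
    return {
        started: function (id) {
            if (!has.call(pending, id)) { pending[id] = true; count += 1; }
        },
        finished: function (id) {
            if (has.call(pending, id)) { delete pending[id]; count -= 1; }
        },
        pendingCount: function () { return count; }
    };
}

// The wiring below only runs under PhantomJS, where `phantom` is defined.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var tracker = createRequestTracker();
    var quietTimer = null;

    // Render once no request has been pending for 500 ms (assumed value).
    var scheduleRender = function () {
        clearTimeout(quietTimer);
        if (tracker.pendingCount() === 0) {
            quietTimer = setTimeout(function () {
                page.render('screenshot.png');
                phantom.exit();
            }, 500);
        }
    };

    page.onResourceRequested = function (requestData) {
        tracker.started(requestData.id);
        scheduleRender();
    };
    page.onResourceReceived = function (response) {
        // Fires for stage 'start' and 'end'; only 'end' means it finished.
        if (response.stage === 'end') {
            tracker.finished(response.id);
            scheduleRender();
        }
    };
    page.onResourceError = function (resourceError) {
        tracker.finished(resourceError.id);
        scheduleRender();
    };
    page.onResourceTimeout = function (request) {
        tracker.finished(request.id);
        scheduleRender();
    };

    page.open('http://www.telegraaf.nl/', function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
        }
    });
}
```

A page that keeps polling its server may never go quiet, so in practice you would probably combine this with an overall time limit.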

If it is actually a PhantomJS bug, then there is not much you can do besides trying some of the command-line switches.

PhantomJS onLoadFinished not working

The official PhantomJS rasterize.js script used for taking screenshots of pages uses a 200ms timeout:

    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
        } else {
            window.setTimeout(function () {
                page.render(output);
                phantom.exit();
            }, 200);
        }
    });

(copied from source linked above)

It's fine, and even recommended, to do the same instead of rendering directly inside the load callback.


