PhantomJS does not wait for the page to load or for all jQuery code to execute
According to a Stack Overflow post, this is a problem with PhantomJS, not Java. However, if you are just trying to save a webpage as a PDF, there are other options. One is wkhtmltopdf, which provides a flexible command-line tool with ample documentation, as well as a C library. Since you are already using a command-line tool to fetch your page, swapping this one into your code should be straightforward. Here is a short example:
// Requires: import java.io.File; import java.io.IOException;
String wkhtmltopdf_path = "/usr/local/bin/wkhtmltopdf";
String paperSize = "A4";
String url = "https://www.google.com/";
// Create the output file, with error handling
File outputFile = null;
try {
    // File in which the PDF will be saved
    outputFile = File.createTempFile("sample", ".pdf");
    System.out.println(outputFile.getAbsolutePath()); // Show output file path, remove this in production
} catch (IOException e1) {
    e1.printStackTrace();
}
// Start the process, with error handling
Process process = null;
if (outputFile != null) {
    try {
        // Passing the arguments separately keeps paths that contain spaces intact
        process = new ProcessBuilder(wkhtmltopdf_path, "-s", paperSize, url, outputFile.getAbsolutePath())
                .inheritIO() // Forward the tool's output so it cannot block on a full buffer
                .start();
    } catch (IOException e) {
        e.printStackTrace(); // Do your error handling here
    }
}
// Wait for the process to finish and read its exit status
int exitStatus = -1; // Stays non-zero if the process never started
if (process != null) {
    try {
        exitStatus = process.waitFor();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Restore the interrupt flag
        e.printStackTrace(); // Do your error handling here
    }
}
if (exitStatus != 0) {
    // Do error handling here
}
How can I wait for the page to be ready in PhantomJS?
It seems that the only way to do this is to use a callback from the DOM to PhantomJS.
var page = require('webpage').create();
page.onInitialized = function() {
    // Invoked when the page calls window.callPhantom()
    page.onCallback = function(data) {
        console.log('Main page is loaded and ready');
        // Do whatever here
    };
    // Runs inside the page: notify PhantomJS once the DOM is ready
    page.evaluate(function() {
        document.addEventListener('DOMContentLoaded', function() {
            window.callPhantom();
        }, false);
        console.log("Added listener to wait for page ready");
    });
};
page.open('https://www.google.com', function(status) {});
phantomjs doesn't render a page
It doesn't render a screenshot because the page initially has no <body> and therefore nothing to render. Everything, including the body, is loaded through JavaScript after PhantomJS's onLoadFinished event fires.
You need to wait a little for a full page load. A simple 5 second wait was sufficient for me:
page.open('http://www.telegraaf.nl/', function(status) {
    setTimeout(function(){
        page.render("screenshot.png");
        phantom.exit();
    }, 5000);
});
You can of course wait in a more sophisticated way to make it more robust and to avoid waiting too long; see the question "phantomjs not waiting for “full” page load".
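One way to wait more robustly is to poll a condition rather than sleep for a fixed time. The following is only a minimal sketch: waitFor and its parameters are hypothetical names rather than PhantomJS API, and the readiness test (the body having child elements) is an assumption that would need adapting to the page in question.

```javascript
// Sketch: poll a condition instead of sleeping for a fixed time.
// waitFor and its parameters are hypothetical names, not PhantomJS API.
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    function check() {
        if (condition()) {
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            onTimeout();
        } else {
            setTimeout(check, intervalMs);   // try again later
        }
    }
    check();   // the first check runs immediately
}

// This part runs only under PhantomJS
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.open('http://www.telegraaf.nl/', function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
            return;
        }
        waitFor(function () {
            // Assumption: the page counts as ready once <body> has child elements
            return page.evaluate(function () {
                return !!document.body && document.body.children.length > 0;
            });
        }, function () {
            page.render('screenshot.png');
            phantom.exit();
        }, function () {
            console.log('Timed out waiting for the page');
            phantom.exit(1);
        }, 10000, 250);
    });
}
```

The timeout keeps the script from hanging if the condition never becomes true, and the 250 ms interval is an arbitrary choice.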
You may need to run PhantomJS with --ignore-ssl-errors=true (and maybe --ssl-protocol=any if PhantomJS < 1.9.8).
PhantomJS 2.0.0 doesn't wait for page to load
PhantomJS doesn't define at which point in the page load process the page.open callback is called, so nothing is actually being claimed wrongly.
You can add a static wait with setTimeout(), which should help for dynamic sites. There are also approaches where you check for pending requests by counting how many requests were sent with page.onResourceRequested and how many finished with page.onResourceReceived/page.onResourceTimeout/page.onResourceError.
If it is actually a PhantomJS bug, then there is not much you can do besides trying some of the command-line switches.
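The request-counting approach could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: trackRequests, onIdle, and quietMs are hypothetical names, the 500 ms quiet period is arbitrary, and requests are tracked by id rather than by a bare counter so that a request reported as finished more than once is not counted twice.

```javascript
// Sketch: call onIdle once no request has been outstanding for quietMs.
// trackRequests, onIdle, and quietMs are hypothetical names, not PhantomJS API.
function trackRequests(page, onIdle, quietMs) {
    var outstanding = {};   // request id -> true while the request is in flight
    var timer = null;
    var done = false;

    function pending() {
        return Object.keys(outstanding).length;
    }

    function finished(id) {
        delete outstanding[id];
        if (pending() === 0 && !done) {
            clearTimeout(timer);
            timer = setTimeout(function () {
                done = true;   // report idleness only once
                onIdle();
            }, quietMs);
        }
    }

    page.onResourceRequested = function (requestData) {
        outstanding[requestData.id] = true;
        clearTimeout(timer);   // new activity cancels a scheduled idle callback
    };
    page.onResourceReceived = function (response) {
        // Responses arrive in stages; only 'end' means the request is finished
        if (response.stage === 'end') { finished(response.id); }
    };
    page.onResourceTimeout = function (request) { finished(request.id); };
    page.onResourceError = function (resourceError) { finished(resourceError.id); };

    return { pending: pending };
}

// This part runs only under PhantomJS
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    trackRequests(page, function () {
        page.render('screenshot.png');
        phantom.exit();
    }, 500);
    page.open('http://www.telegraaf.nl/', function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
        }
    });
}
```

A page that keeps polling its server may never go quiet, so in practice this would likely be combined with an overall timeout.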
PhantomJS onLoadFinished not working
The official PhantomJS rasterize.js script, used for taking screenshots of pages, uses a 200 ms timeout:
page.open(address, function (status) {
    if (status !== 'success') {
        console.log('Unable to load the address!');
        phantom.exit(1);
    } else {
        window.setTimeout(function () {
            page.render(output);
            phantom.exit();
        }, 200);
    }
});
(copied from source linked above)
It is fine, and recommended, to do the same rather than rendering directly inside the event.