How to Submit a Form Using PhantomJS

I figured it out. Basically, it's an async issue: you can't just submit the form and expect to render the following page immediately. You have to wait until the onLoad event for the next page has fired. My code is below:

var page = require('webpage').create(), testindex = 0, loadInProgress = false;

page.onConsoleMessage = function(msg) {
  console.log(msg);
};

page.onLoadStarted = function() {
  loadInProgress = true;
  console.log("load started");
};

page.onLoadFinished = function() {
  loadInProgress = false;
  console.log("load finished");
};

var steps = [
  function() {
    // Load Login Page
    page.open("https://website.example/theformpage/");
  },
  function() {
    // Enter Credentials
    page.evaluate(function() {
      var arr = document.getElementsByClassName("login-form");
      var i;

      for (i = 0; i < arr.length; i++) {
        if (arr[i].getAttribute('method') == "POST") {
          arr[i].elements["email"].value = "mylogin";
          arr[i].elements["password"].value = "mypassword";
          return;
        }
      }
    });
  },
  function() {
    // Login
    page.evaluate(function() {
      var arr = document.getElementsByClassName("login-form");
      var i;

      for (i = 0; i < arr.length; i++) {
        if (arr[i].getAttribute('method') == "POST") {
          arr[i].submit();
          return;
        }
      }
    });
  },
  function() {
    // Output content of page to stdout after form has been submitted
    page.evaluate(function() {
      console.log(document.querySelectorAll('html')[0].outerHTML);
    });
  }
];

// Run the next step only when no page load is in progress
var interval = setInterval(function() {
  if (!loadInProgress && typeof steps[testindex] == "function") {
    console.log("step " + (testindex + 1));
    steps[testindex]();
    testindex++;
  }
  if (typeof steps[testindex] != "function") {
    console.log("test complete!");
    phantom.exit();
  }
}, 50);
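The interval-driven runner above keeps the step list, the load flag and the exit logic in global variables. Below is a minimal sketch of the same sequencing factored into a function, so it can be reused or tried outside PhantomJS with a stand-in page object. `createStepRunner` and `tick` are hypothetical names, not part of the original answer or of the PhantomJS API, and the runner installs its own `onLoadStarted`/`onLoadFinished` handlers, replacing any logging handlers you had set.

```javascript
// Hypothetical helper: runs each step only while no page load is in progress.
// Plain ES5, so it can run inside PhantomJS as well as in Node.
function createStepRunner(page, steps) {
  var loadInProgress = false;
  var index = 0;

  // Same flag as in the original script, but private to the runner.
  page.onLoadStarted = function () { loadInProgress = true; };
  page.onLoadFinished = function () { loadInProgress = false; };

  return {
    // Call periodically (e.g. from setInterval); runs at most one step per call.
    // Returns true once every step has been run.
    tick: function () {
      if (!loadInProgress && index < steps.length) {
        steps[index](page);
        index++;
      }
      return index >= steps.length;
    }
  };
}

// Usage inside PhantomJS (not executed here):
// var runner = createStepRunner(require('webpage').create(), steps);
// var timer = setInterval(function () {
//   if (runner.tick()) { clearInterval(timer); phantom.exit(); }
// }, 50);
```

Keeping the flag inside a closure means two runners (or two pages) can never clobber each other's state, which the global `loadInProgress` in the original would.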

How to submit a form with PhantomJS?

It seems that you want to submit the form. You can do that in several ways:

  • click the submit button;
  • submit the form in the page context:

    page.evaluate(function() {
      document.forms[0].submit();
    });
  • or focus a text field of the form and send an Enter keypress with sendEvent().
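The third option can be wrapped in a small helper. A sketch, assuming PhantomJS's `page.evaluate()` (which forwards extra arguments to the page function) and `page.sendEvent()` with the `page.event.key.Enter` key code; `submitWithEnter` and the selector you pass to it are hypothetical.

```javascript
// Hypothetical helper: submit by focusing a field and pressing Enter.
// Returns false (and sends nothing) if no element matches the selector.
function submitWithEnter(page, selector) {
  // Runs in the page context; the selector is passed as an argument
  // because page.evaluate() callbacks cannot see outer variables.
  var focused = page.evaluate(function (sel) {
    var field = document.querySelector(sel);
    if (!field) return false;
    field.focus();
    return true;
  }, selector);

  // Only send the key press if a field actually received focus.
  if (focused) {
    page.sendEvent('keypress', page.event.key.Enter);
  }
  return focused;
}

// Usage in PhantomJS (not executed here; '#email' is a made-up selector):
// submitWithEnter(page, '#email');
```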

After that you will have to wait until the next page is loaded. This is best done by registering page.onLoadFinished (which then contains your remaining script) right before submitting the form.

page.open(url, function(){
  page.onLoadFinished = function(){
    page.render("nextPage.png");
    phantom.exit();
  };
  page.evaluate(function() {
    document.forms[0].test_data.value = "555";
    document.forms[0].submit();
  });
});

Or you can simply wait a fixed amount of time:

page.open(url, function(){
  page.evaluate(function() {
    document.forms[0].test_data.value = "555";
    document.forms[0].submit();
  });
  setTimeout(function(){
    page.render("nextPage.png");
    phantom.exit();
  }, 5000); // 5 seconds
});
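A fixed delay is either wasted time or too short. A common alternative, similar in spirit to the waitFor helper in PhantomJS's bundled examples, is to poll a condition until it holds or a timeout expires. A sketch in plain ES5 so it runs inside PhantomJS; the name `waitFor`, its argument order and the `#result` selector in the usage comment are hypothetical.

```javascript
// Hypothetical helper: poll `condition` every `intervalMs` until it returns
// true (then call onReady) or until `timeoutMs` has passed (then onTimeout).
function waitFor(condition, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (condition()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start > timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs || 100);
}

// Usage in PhantomJS after submitting (not executed here):
// waitFor(function () {
//   return page.evaluate(function () { return !!document.querySelector('#result'); });
// }, function () { page.render("nextPage.png"); phantom.exit(); },
//    function () { console.log("timed out"); phantom.exit(1); }, 10000);
```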

How to get results after submit form with PhantomJS?

page.onLoadFinished must not be set inside page.evaluate (code there runs in the page and has no access to the page object), but in the main PhantomJS script:

var page = require('webpage').create();

page.onLoadFinished = function(){

  var html = page.evaluate(function(){
    return document.getElementById("nombrez").innerHTML;
  });
  console.log(html);
  phantom.exit();

};

page.open('http://localhost/phantom/', function() {
  page.includeJs("https://code.jquery.com/jquery-3.1.1.slim.js", function() {
    page.evaluate(function() {

      $('#nombre').val('Fabian');
      document.forms[0].submit();
    });
  });
});

However, page.onLoadFinished fires every time a page finishes loading, and with this implementation PhantomJS will exit the first time the page is loaded, even before the form is submitted.

You need some check to distinguish between the first and the second load of the page. For example, if the returned html variable is empty, it means the form hasn't been submitted yet.
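Another way to implement that check is to count loads instead of inspecting the HTML. A sketch; `handleLoads`, `onFormPage` and `onResultPage` are hypothetical names, and it assumes exactly one navigation follows the initial page (a redirect chain after submitting would add extra loads).

```javascript
// Hypothetical helper: one onLoadFinished handler that separates the initial
// page load from the load caused by submitting the form, by counting loads.
// Must be installed before page.open() is called.
function handleLoads(page, onFormPage, onResultPage) {
  var loadCount = 0;
  page.onLoadFinished = function (status) {
    loadCount++;
    if (loadCount === 1) {
      onFormPage(status);    // first load: the form page, so fill it in and submit
    } else {
      onResultPage(status);  // any later load: the page produced by the submission
    }
  };
}

// Usage in PhantomJS (not executed here):
// handleLoads(page, function () {
//   page.evaluate(function () {
//     document.getElementById('nombre').value = 'Fabian';
//     document.forms[0].submit();
//   });
// }, function () {
//   console.log(page.evaluate(function () { return document.getElementById("nombrez").innerHTML; }));
//   phantom.exit();
// });
// page.open('http://localhost/phantom/');
```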

Scrape information with form submit using Phantom

There are several issues with your script that prevent a successful scrape.

To check a checkbox, you don't set its value again (it's already set in the HTML!); you set its checked attribute to true:

document.getElementById('crID%3a250').setAttribute("checked", true); // France

The button that submits the form is a hyperlink <a>, which doesn't have a submit method. It should be clicked instead (it even has an onClick handler in the page's code):

document.getElementById('ctl00_main_filters_anchorApplyBottom').click(); // submit the form

The search request is sent through Ajax and takes time to complete, so your script should wait for at least a second before trying to fetch the data. I'll show how to wait in the full working code below.

Next, you can fetch only the table data; there's no need to sift through all the HTML:

var result = await page.evaluate(function() {
  return document.querySelectorAll('.DataContainer table')[0].outerHTML;
});
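If you want the values rather than markup, the page function can return the cell text as an array of rows; `page.evaluate()` can pass that back because it is plain, JSON-serializable data. A sketch; the `.DataContainer table` selector comes from the answer, `extractTableRows` is a hypothetical name, and it returns null when the table is missing instead of throwing on `[0].outerHTML`.

```javascript
// Meant to be passed to page.evaluate(); it runs in the page, so it may only
// use browser globals such as `document`.
function extractTableRows() {
  var table = document.querySelector('.DataContainer table');
  if (!table) return null;  // table not rendered yet (or the selector changed)
  var rows = [];
  for (var i = 0; i < table.rows.length; i++) {
    var cells = table.rows[i].cells;
    var row = [];
    for (var j = 0; j < cells.length; j++) {
      row.push(cells[j].textContent.trim());  // cell text without surrounding whitespace
    }
    rows.push(row);
  }
  return rows;
}

// Usage inside the async function: var rows = await page.evaluate(extractTableRows);
```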

Here's a slightly trimmed-down version of your script with the issues corrected:

var phantom = require('phantom');

var url = 'http://data.un.org/Data.aspx?q=population&d=PopDiv&f=variableID%3A12';

// A promise that resolves after ms milliseconds
const timeout = ms => new Promise(resolve => setTimeout(resolve, ms));

(async function() {
  const instance = await phantom.create();
  const page = await instance.createPage();

  await page.on('onResourceRequested', function(requestData) {
    console.info('Requesting', requestData.url);
  });
  await page.on('onConsoleMessage', function(msg) {
    console.info(msg);
  });

  const status = await page.open(url);
  console.log('STATUS:', status);

  // submit
  await page.evaluate(function() {
    document.getElementById('crID%3a250').setAttribute("checked", true); // France
    document.getElementById('timeID%3a79').setAttribute("checked", true); // 2015
    document.getElementById('varID%3a2').setAttribute("checked", true); // Medium
    document.getElementById('ctl00_main_filters_anchorApplyBottom').click(); // click submit button
  });

  console.log('Waiting 1.5 seconds..');
  await timeout(1500);

  // Get only the table contents
  var result = await page.evaluate(function() {
    return document.querySelectorAll('.DataContainer table')[0].outerHTML;
  });
  console.log('RESULT:', result);

  await instance.exit();
})();
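The fixed 1.5-second sleep works when the server is quick, but a slower response would leave the second `page.evaluate()` with no table to read. A sketch of polling instead, for the `phantom` npm module where `page.evaluate()` returns a Promise; `pollUntil` and `sleep` are hypothetical helpers and the timeout values are arbitrary.

```javascript
// Hypothetical helpers: poll an async check until it returns truthy or a
// deadline passes. Resolves to true on success, false on timeout.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function pollUntil(check, timeoutMs = 5000, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await sleep(intervalMs);
  }
  return false;  // the caller decides what a timeout means
}

// Usage inside the async function above, replacing `await timeout(1500)`
// (not executed here):
// const ready = await pollUntil(() => page.evaluate(function () {
//   return document.querySelector('.DataContainer table') !== null;
// }), 10000);
// if (!ready) console.log('No results table after 10 s');
```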

Last but not least: you could simply try replaying the Ajax request made by the form. The URL of the search request works quite well on its own when opened in another tab:

(Screenshot: the search result is returned as HTML.)

You don't even need a headless browser to get it; plain cURL/requests plus some HTML processing will do. This is common on many sites, so it's worth checking the Network tab in your browser's devtools before scraping.

Update

And if there are so many results that they are spread over several pages, there is one more parameter to use in the request, Page:

data.un.org/Handlers/DataHandler.ashx?Service=page&Page=3&DataFilter=variableID:12&DataMartId=PopDiv&UserQuery=population&c=2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&RequestId=461
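Replaying the request from code only requires swapping the Page value. A sketch in Node (18 or later, for the built-in `fetch`). The `http://` scheme is an assumption since the URL above has none, `pageUrl` and `fetchPages` are hypothetical names, and the endpoint or its RequestId may no longer be valid. URL serialization percent-encodes characters such as `:` and `,` in the query string, which servers normally decode transparently.

```javascript
// Base request copied from above; only the Page parameter will be changed.
const BASE = 'http://data.un.org/Handlers/DataHandler.ashx?Service=page&Page=3' +
  '&DataFilter=variableID:12&DataMartId=PopDiv&UserQuery=population&c=2,4,6,7' +
  '&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&RequestId=461';

// Build the URL for one results page, leaving every other parameter untouched.
function pageUrl(pageNumber) {
  const url = new URL(BASE);
  url.searchParams.set('Page', String(pageNumber));
  return url.toString();
}

// Fetch pages 1..count one after another; each response body is HTML.
async function fetchPages(count) {
  const pages = [];
  for (let n = 1; n <= count; n++) {
    const res = await fetch(pageUrl(n));
    pages.push(await res.text());
  }
  return pages;
}
```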

PhantomJS and clicking a form button

Ok. I think I figured it out. I directed PhantomJS and my script to a website where I could monitor the data on the back-end. To my surprise, the button was being clicked. I just couldn't see the results.

Courtesy of this post from Vinjay Boyapati, the problem appears to have more to do with page handlers and sequencing. It seems that the best way to handle page transitions in PhantomJS is to initiate the page cycle (click on a submit button, a link, etc.) and exit that JS evaluate function. After checking PhantomJS to make sure the page had completely loaded and was stable, call another page.evaluate and look for whatever you expected to find when the browser fetched back the results of your submission. Here's the code that I copied/modified from Vinjay's post:

Edit: One thing to draw particular attention to: for each page.evaluate() call where jQuery is needed, I'm adding the page.injectJs("jquery1-11-1min.js"); line. Otherwise I would get "$ is undefined" as a page error.

var page = require('webpage').create();
var loadInProgress = false;
var testindex = 0;

// Route "console.log()" calls from within the Page context to the main Phantom context (i.e. current "this")
page.onConsoleMessage = function(msg) {
  console.log(msg);
};

page.onAlert = function(msg) {
  console.log('alert!!> ' + msg);
};

page.onLoadStarted = function() {
  loadInProgress = true;
  console.log("load started");
};

page.onLoadFinished = function(status) {
  loadInProgress = false;
  if (status !== 'success') {
    console.log('Unable to access network');
    phantom.exit();
  } else {
    console.log("load finished");
  }
};

var steps = [
  function() {
    page.open('http://www.MadeUpURL.com');
  },
  function() {
    page.injectJs("jquery1-11-1min.js");
    page.evaluate(function() {
      document.getElementById('Address').value = '302 E Buchtel Avenue'; // University of Akron if you're wondering
      document.getElementById('City').value = 'Akron';
      document.getElementById('State').selectedIndex = 36;
      document.getElementById('ZipCode').value = '44325';
      console.log('JQ: ' + $().jquery);
      $('#btnSearch').click();
      console.log('Clicked');
    });
  },
  function() {
    console.log('Answers:');
    page.injectJs("jquery1-11-1min.js");
    page.render('AnswerPage.png');
    page.evaluate(function() {
      console.log('The Answer: ' + document.getElementById('TheAnswer').innerHTML);
      $('#buttonOnAnswerPage').click(); // This isn't really necessary unless you need to navigate deeper
      console.log('Sub button clicked');
    });
  },
  function() {
    console.log('More Answers:'); // This function is for navigating deeper than the first-level form submission
    page.render('MoreAnswersPage.png');
    page.evaluate(function() {
      console.log('More Stuff: ' + document.body.innerHTML);
    });
  },
  function() {
    console.log('Exiting');
  }
];

var interval = setInterval(function() {
  if (!loadInProgress && typeof steps[testindex] == "function") {
    console.log("step " + (testindex + 1));
    steps[testindex]();
    testindex++;
  }
  if (typeof steps[testindex] != "function") {
    console.log("test complete!");
    phantom.exit();
  }
}, 50);
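Every navigation gives the page a fresh JavaScript context, so anything added with page.injectJs() is gone after the next load; that is why the script re-injects jQuery in each step. A sketch of a wrapper that injects only when the current page lacks jQuery. `evaluateWithJQuery` is a hypothetical name, and it relies on page.injectJs() returning false when the file cannot be loaded.

```javascript
// Hypothetical helper: make sure jQuery exists in the current page, then
// evaluate `fn` there. `jqueryPath` is a local file, as with injectJs() above.
function evaluateWithJQuery(page, jqueryPath, fn) {
  var hasJQuery = page.evaluate(function () {
    return typeof window.jQuery === 'function';
  });
  // page.injectJs() returns false when the file cannot be injected.
  if (!hasJQuery && !page.injectJs(jqueryPath)) {
    throw new Error('Could not inject ' + jqueryPath);
  }
  return page.evaluate(fn);
}

// Usage in a step (not executed here):
// evaluateWithJQuery(page, "jquery1-11-1min.js", function () { $('#btnSearch').click(); });
```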

Phantomjs and HTML 5 fire click event and submit form fails

I've made some modifications to your script that allowed it to submit the form successfully (however, the quote could not be generated at the time, probably because of all those test values?).

I'm sure things like error control and viewport size are present in your script, but for the sake of educating other readers I'll keep them in the answer.

Here's the working script with notes:

var page = require('webpage').create();
var screenshotNum = 1;

page.viewportSize = { width: 1366, height: 768 };
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36';

page.onConsoleMessage = function(msg, lineNum, sourceId) {
  console.log('CONSOLE: ' + msg + ' (from line #' + lineNum + ' in "' + sourceId + '")');
};

// We always need to be aware of any and all errors
// happening on the target page. Most of the time, that's where the answer lies
page.onError = function(msg, trace) {

  var msgStack = ['ERROR: ' + msg];

  if (trace && trace.length) {
    msgStack.push('TRACE:');
    trace.forEach(function(t) {
      msgStack.push(' -> ' + t.file + ': ' + t.line + (t.function ? ' (in function "' + t.function + '")' : ''));
    });
  }

  console.error(msgStack.join('\n'));
};

// This fires every time a new page is loaded
// If a form has been sent, this will show the URL of the success/fail page
page.onLoadFinished = function(){

  var url = page.evaluate(function(){
    return document.location.href;
  });

  console.log(url);

  // And every time a page is loaded we'll take a screenshot
  // to make sure everything goes according to our expectations
  page.render("travel-" + screenshotNum++ + ".jpg");

  // If we've moved to another page, it means the form was sent
  // So let's exit
  if (url != 'https://travel.tescobank.com/') {
    phantom.exit();
  }

};

page.open("https://travel.tescobank.com/", function(status) {

  // Let's wait a bit before filling fields in, so that
  // any JavaScript on the page will have time to kick in
  setTimeout(fillFields, 2000);

});

function fillFields()
{
  // click the button for single trip
  var tripType = page.evaluate(function() {
    var trip = $("#single-trip-radio").trigger("click");
    return (trip != null);
  });

  // enter value in dropdown
  var countrySearch = page.evaluate(function() {
    var country = $("#countrySearch").val("France");
    return (country != null);
  });

  var datePickerOne = page.evaluate(function() {
    var datePicker1 = $("#stFromdate").val("11-11-2017");
    // Return the value, not the jQuery object (which can't be passed back)
    return $("#stFromdate").val();
  });

  var datePickerTwo = page.evaluate(function() {
    var datePicker2 = $("#stTodate").val("18-11-2017");
    return $("#stTodate").val();
  });

  var numberOfTravellers = page.evaluate(function() {
    var number = $("#couple").trigger("click");
    return $("#couple").val();
  });

  var firstName = page.evaluate(function() {
    var fname = $("#fName").val("Robert");
    return $("#fName").val();
  });

  var secondName = page.evaluate(function() {
    var sname = $("#sName").val("Johnson");
    return $("#sName").val();
  });

  var dateOfBirth = page.evaluate(function() {
    var dob = $("#phDOB").val("11-10-1977");
    return $("#phDOB").val();
  });

  var dateOfBirth2 = page.evaluate(function() {
    var dob2 = $("#ydob1").val("11-10-1977");
    return $("#ydob1").val();
  });

  var postcode = page.evaluate(function() {
    var pc = $("#postcode").val("SS1 2AA");
    return $("#postcode").val();
  });

  var email = page.evaluate(function() {
    var em = $("#emailaddr").val("test@test.com");
    return $("#emailaddr").val();
  });

  // This click on the button does fire the form validation
  // It will submit the form if no errors are found
  var submitForm = page.evaluate(function() {
    var theForm = $("#aboutYouSubmit").trigger("click");
    return (theForm != null);
  });

  // If the form hasn't been sent due to errors,
  // this will count how many of them there are
  // and take a screenshot of the page
  setTimeout(function(){
    var errorsTotal = page.evaluate(function(){
      return $(".error:visible").length;
    });
    console.log("Total errors: " + errorsTotal);
    page.render("travel-filled.jpg");
  }, 1500);
}

Why wouldn't the page submit before? Your screenshot shows an error saying the first name is not filled in. It could be a typo in the code that fills that field, or maybe jQuery wasn't loaded yet when the fields were being filled in.
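To catch exactly that kind of problem, the separate evaluate calls can be collapsed into one that takes a selector-to-value map and reports which selectors matched nothing, or whether jQuery was missing altogether. A sketch, assuming jQuery is already on the page as in the script above; `fillFieldsFromMap` is a hypothetical name.

```javascript
// Hypothetical helper: fill several fields in a single page.evaluate() call.
// The selector -> value map is passed as an argument because code running in
// the page cannot see variables from the PhantomJS script.
function fillFieldsFromMap(page, values) {
  return page.evaluate(function (map) {
    // Report a missing jQuery instead of throwing "$ is undefined".
    if (typeof jQuery !== 'function') return ['(jQuery not loaded)'];
    var missing = [];
    for (var selector in map) {
      if (!Object.prototype.hasOwnProperty.call(map, selector)) continue;
      var field = jQuery(selector);
      if (field.length === 0) {
        missing.push(selector);  // nothing matched: typo, or not rendered yet
      } else {
        field.val(map[selector]);
      }
    }
    return missing;  // a plain array of strings serializes back to PhantomJS
  }, values);
}

// Usage inside fillFields() (not executed here):
// var missing = fillFieldsFromMap(page, { "#fName": "Robert", "#sName": "Johnson" });
// if (missing.length) console.log("Not filled: " + missing.join(", "));
```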

Phantomjs - How to populate a form, submit and get the results?

It isn't sufficient to render a page immediately after "clicking". You have to give the web engine time to make whatever calls are required and execute the resulting JavaScript.

Consider the following after your call to evaluate:

window.setTimeout(
  function () {
    page.render( 'google.png' );
    phantom.exit(0);
  },
  5000 // wait 5,000ms (5s)
);

By the way - the click may or may not work depending on what kind of element it is. If that doesn't work I suggest you search the Internet for how to click on a DIV or whatever type of element it happens to be (there is a technique which involves creating a mouse event).
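A different route from the synthetic mouse-event technique: PhantomJS's page.sendEvent() generates a native click at page coordinates, which tends to trigger whatever handlers a real click would. A sketch; `clickAtElement` is a hypothetical helper, and since getBoundingClientRect() coordinates are relative to the viewport, an element scrolled out of view may need page.scrollPosition adjusted first (not handled here).

```javascript
// Hypothetical helper: click the centre of the element matching `selector`.
// Returns false if nothing matches.
function clickAtElement(page, selector) {
  // getBoundingClientRect() is read in the page; its fields are copied into a
  // plain object so they survive the trip back from page.evaluate().
  var rect = page.evaluate(function (sel) {
    var el = document.querySelector(sel);
    if (!el) return null;
    var r = el.getBoundingClientRect();
    return { left: r.left, top: r.top, width: r.width, height: r.height };
  }, selector);
  if (!rect) return false;
  page.sendEvent('click', rect.left + rect.width / 2, rect.top + rect.height / 2);
  return true;
}

// Usage in PhantomJS (not executed here; '#go' is a made-up selector):
// clickAtElement(page, '#go');
```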

Use Submit Button with PhantomJS

This is because the form doesn't have a class of "loginf"; that's the value of its name attribute. You could try using an attribute selector:

document.querySelector("[name=loginf]").submit();
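document.querySelector() returns null when nothing matches, so a wrong selector ends in a TypeError on `.submit()` rather than a submission. A sketch that checks for the form first and tells the PhantomJS script whether anything was submitted; `submitNamedForm` is a hypothetical name.

```javascript
// Hypothetical helper: submit the form with the given name attribute.
// Returns true if the form was found and submitted, false otherwise.
function submitNamedForm(page, formName) {
  return page.evaluate(function (name) {
    var form = document.querySelector('form[name="' + name + '"]');
    if (!form) return false;  // no such form: report instead of throwing
    form.submit();
    return true;
  }, formName);
}

// Usage in PhantomJS (not executed here):
// if (!submitNamedForm(page, 'loginf')) console.log('Login form not found');
```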

