Using rvest or httr to Log In to Non-Standard Forms on a Webpage

Your rvest code isn't storing the modified form, so in your example you're just submitting the original pgform without the values filled in. Try:

library(rvest)

url <- "http://www.perfectgame.org/"   ## page to spider
pgsession <- html_session(url)         ## create session
pgform <- html_form(pgsession)[[1]]    ## pull form from session

# Note the new variable assignment
filled_form <- set_values(pgform,
  `ctl00$Header2$HeaderTop1$tbUsername` = "myemail@gmail.com",
  `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")

submit_form(pgsession, filled_form)

I now see a nice 200 status code response instead of an error. Note that because the desired submit button appears to be the first submit button, we don't need to pass it as an argument; otherwise we would pass its name as a string (straight quotes, not backticks).
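The point about storing the modified form can be checked offline. The sketch below assumes rvest 1.0.0 or later, where set_values() was renamed to html_form_set(); the inline HTML is a made-up stand-in for the real login page, reusing only the two field names from the answer above.

```r
library(rvest)

# Hypothetical stand-in for the login page, parsed from a string (no network needed)
page <- read_html('
  <form action="/login" method="post">
    <input type="text" name="ctl00$Header2$HeaderTop1$tbUsername">
    <input type="password" name="ctl00$Header2$HeaderTop1$tbPassword">
    <input type="submit" name="login" value="Log in">
  </form>')

pgform <- html_form(page)[[1]]

# html_form_set() returns a modified copy; it does not change pgform in place
filled_form <- html_form_set(pgform,
  `ctl00$Header2$HeaderTop1$tbUsername` = "myemail@gmail.com",
  `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")

# Compare the username field in the original and in the filled copy
print(pgform$fields[["ctl00$Header2$HeaderTop1$tbUsername"]]$value)
print(filled_form$fields[["ctl00$Header2$HeaderTop1$tbUsername"]]$value)
```

Only filled_form carries the credentials, which is why submitting pgform sends an empty login.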

Web-Scraping with Login and Redirect using R and rvest/httr

library(rvest)

url <- "https://kickbase.sky.de/"
page <- html_session(url)

# request_POST() is an internal (unexported) rvest function, hence the `:::`
page <- rvest:::request_POST(page,
  url = "https://kickbase.sky.de/api/v1/user/login",
  body = list(
    "email" = "testscrape@gmail.com",
    "password" = "tester",
    "redirect_url" = "http://kickbase.sky.de/spielerprofil/nadiem-amiri/1639#"),
  encode = "json")

# Reuse the logged-in session to call the API endpoint
player_page <- jump_to(page, "https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3")

# The response body is a raw vector; convert it to text before parsing the JSON
data <- jsonlite::fromJSON(rawToChar(player_page$response$content))

print(data)

Note that the website provides an API, and that is where the data comes from:
https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3

The variable data then holds all the information needed.
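The decoding step can be tried without touching the site. In this minimal sketch the JSON payload is invented for illustration (the real API's fields are not shown in the answer); it only demonstrates turning a raw response body, like player_page$response$content, into an R object.

```r
library(jsonlite)

# Hypothetical payload standing in for the raw bytes of an API response
raw_body <- charToRaw('{"news":[{"id":1,"title":"Example headline"}]}')

# Raw bytes -> character string -> parsed R structure
data <- fromJSON(rawToChar(raw_body))

str(data)
```

With jsonlite's default simplification, an array of objects such as news comes back as a data frame, which is usually convenient for further processing.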

How to proceed when redirected to page after successful sign in with POST method

SOLVED!

The solution turned out to be easy and intuitive: I just needed to submit the form (method="post", with input type="hidden" fields) on the redirecting page, i.e. the one encountered in the signed.in session.
I solved it with rvest, but I think httr would be equally easy. Here is the code I used:

library(rvest)

# signin (the sign-in URL), user.email and user.password are defined elsewhere
signin.session <- html_session(signin)
signin.form <- html_form(signin.session)[[1]]
filled.signin <- set_values(signin.form,
  `user[email]` = user.email,
  `user[password]` = user.password)

# Submitting the credentials lands on an intermediate page with a hidden POST form
signed.in <- submit_form(signin.session, filled.signin)
redirect.form <- html_form(signed.in)[[1]]
redirected <- submit_form(signed.in, redirect.form)

The last object, redirected, is a session-class object: essentially the page that can be browsed normally after signing in to the website.
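To see what such a redirect form carries before submitting it, its hidden fields can be listed. This is only a sketch: the HTML below is an invented stand-in for the intermediate page, and the field names token and state are hypothetical.

```r
library(rvest)

# Hypothetical intermediate page that a sign-in might return
redirect_page <- read_html('
  <form action="https://example.com/callback" method="post">
    <input type="hidden" name="token" value="abc123">
    <input type="hidden" name="state" value="xyz">
    <input type="submit" name="continue" value="Continue">
  </form>')

redirect.form <- html_form(redirect_page)[[1]]

# Keep only the hidden inputs and pull out their values
hidden <- Filter(function(f) identical(f$type, "hidden"), redirect.form$fields)
hidden_values <- vapply(hidden, function(f) f$value, character(1))

print(hidden_values)
```

These values are already filled in by the server, which is why the form can be submitted as-is without calling set_values() first.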

If someone has a shorter, more effective, or more elegant solution, please don't hesitate to share it.

I'm an absolute beginner at web scraping, and I am keen to learn more about these operations!

THX

403 Error When Using rvest to Log Into a Website for Scraping

Using R.S.'s suggestion, I used RSelenium to log in successfully.

A quick note for fellow Mac users on using either Chrome or PhantomJS: I am running El Capitan and had some issues getting the Mac to recognize the paths to both of the bin files. I moved the bin files to /usr/local/bin instead, and they ran without an issue.

Below is the code to do so:

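library(RSelenium)

# Note: startServer() is defunct in recent RSelenium releases; rsDriver() is the suggested replacement
RSelenium::startServer()
remDr <- remoteDriver(browserName = "chrome")
remDr$open()

# Open the login page and fill in the credentials; key = "enter" submits the form
appURL <- 'https://www.optionslam.com/accounts/login/'
remDr$navigate(appURL)
remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key = 'enter'))

# Once logged in, navigate to the page to scrape
appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
remDr$navigate(appURL)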

This can also be done with PhantomJS:

library(RSelenium)

pJS <- phantom() # start phantomjs (phantom() is also defunct in recent RSelenium releases)

appURL <- 'https://www.optionslam.com/accounts/login/'
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open()
remDr$navigate(appURL)
remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key='enter'))

appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
remDr$navigate(appURL)

