Using rvest or httr to log in to non-standard forms on a webpage
Your rvest code isn't storing the modified form, so in your example you're just submitting the original pgform without the values filled in. Try:
library(rvest)
url <- "http://www.perfectgame.org/"  ## page to spider
pgsession <- html_session(url)        ## create session
pgform <- html_form(pgsession)[[1]]   ## pull form from session
# Note the new variable assignment
filled_form <- set_values(pgform,
    `ctl00$Header2$HeaderTop1$tbUsername` = "myemail@gmail.com",
    `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")
submit_form(pgsession, filled_form)
I now get a 200 status code in the response instead of an error. Because the desired submit button appears to be the first submit button on the form, we don't need to pass it as an argument; otherwise we would pass its name as a string (straight quotes, not backticks).
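The point about assignment can be checked offline. The following is a minimal sketch, assuming rvest 1.0.0 or later, where set_values() goes by the name html_form_set(). The inline HTML form and its field names are made up for illustration and are not the perfectgame.org markup.

```r
library(rvest)

# Hypothetical login form, parsed from a string so no network access is needed
page <- read_html('<form action="/login" method="post"><input type="text" name="username"><input type="password" name="password"><input type="submit" name="go" value="Log in"></form>')
form <- html_form(page)[[1]]

# html_form_set() returns a modified copy; `form` itself is left untouched
filled <- html_form_set(form, username = "me@example.com", password = "secret")

filled$fields$username$value  # the value we just set
form$fields$username$value    # still whatever the HTML supplied
```

Submitting `form` here would send the fields as they were in the HTML, which is the situation described in the answer above; only `filled` carries the credentials.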
Web-Scraping with Login and Redirect using R and rvest/httr
library(rvest)
url <- "https://kickbase.sky.de/"
page <- html_session(url)
# request_POST is accessed with ::: because rvest does not export it
page <- rvest:::request_POST(page, url = "https://kickbase.sky.de/api/v1/user/login",
    body = list("email" = "testscrape@gmail.com",
        "password" = "tester",
        "redirect_url" = "http://kickbase.sky.de/spielerprofil/nadiem-amiri/1639#"),
    encode = 'json'
)
player_page <- jump_to(page, "https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3")
# The response body is a raw vector; read it as text before parsing the JSON
data <- jsonlite::fromJSON(readBin(player_page$response$content, what = "character"))
print(data)
Please note that the website provides an API, and that is where the data comes from: https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3
The variable data has all the information needed.
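The last step, turning the raw response body into an R object, can be tried without the site. This is a minimal sketch, assuming jsonlite is installed. The payload below is invented and only stands in for player_page$response$content, so its field names are not the real API's.

```r
library(jsonlite)

# Made-up JSON bytes standing in for player_page$response$content
raw_body <- charToRaw('{"player": 1639, "news": [{"title": "first"}, {"title": "second"}]}')

# Convert the raw vector to a single string, then parse it
json_text <- rawToChar(raw_body)
data <- fromJSON(json_text)

# An array of objects is simplified to a data frame by default
str(data)
```

Here rawToChar() is used as a more explicit way to get text out of a raw vector; whether it is preferable to readBin() for a given response is a matter of taste.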
How to proceed when redirected to page after successful sign in with POST method
SOLVED!
It was quite an easy and intuitive solution: I just needed to submit the form (method="post", with input type="hidden") on the redirecting page, i.e. the one encountered in the signed.in session.
I solved it with rvest, but I think httr would be equally easy. Here is the code I used:
library(rvest)
signin.session <- html_session(signin)
signin.form <- html_form(signin.session)[[1]]
filled.signin <- set_values(signin.form,
    `user[email]` = user.email,
    `user[password]` = user.password)
signed.in <- submit_form(signin.session, filled.signin)
redirect.form <- html_form(signed.in)[[1]]
redirected <- submit_form(signed.in, redirect.form)
The last object, redirected, is a session-class object: basically the page that can be browsed normally after signing in to the website.
In case someone has a shorter, more effective, more elegant/sexy/charming solution, please don't hesitate to share it.
I'm an absolute beginner at web scraping, and I am keen to learn more about these operations!
THX
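For readers wondering what such a redirect form looks like, here is a rough offline sketch. The HTML is invented (the field names `token` and `commit` and the action URL are assumptions, not taken from the site in question); it only shows how to inspect a form's method and hidden fields with html_form() before resubmitting it.

```r
library(rvest)

# Invented stand-in for the intermediate page returned after sign-in
redirect_page <- read_html('<form action="https://example.com/continue" method="post"><input type="hidden" name="token" value="abc123"><input type="submit" name="commit" value="Continue"></form>')

redirect_form <- html_form(redirect_page)[[1]]

# HTTP method declared by the form (compared case-insensitively)
form_method <- toupper(redirect_form$method)

# Type of each field, to spot hidden inputs carrying the hand-off values
field_types <- vapply(redirect_form$fields, function(f) f$type, character(1))

form_method
field_types
redirect_form$fields$token$value
```

Because the hidden inputs already hold their values, such a form can be submitted as-is, which matches the two-step submit in the answer above.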
403 Error When Using Rvest to Log Into Website For Scraping
Using R.S.'s suggestion, I used RSelenium to log in successfully.
A quick note for fellow Mac users on using either Chrome or PhantomJS: I am running El Capitan and had some issues getting the Mac to recognize the paths to both of the bin files. I moved the bin files to /usr/local/bin instead, and they ran without an issue.
Below is the code to do so:
library(RSelenium)
RSelenium::startServer()
remDr <- remoteDriver(browserName = "chrome")
remDr$open()
appURL <- 'https://www.optionslam.com/accounts/login/'
remDr$navigate(appURL)
remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key='enter'))
appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
remDr$navigate(appURL)
This can also be done with PhantomJS:
library(RSelenium)
pJS <- phantom() # start phantomjs
appURL <- 'https://www.optionslam.com/accounts/login/'
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open()
remDr$navigate(appURL)
remDr$findElement("id", "id_username")$sendKeysToElement(list("user"))
remDr$findElement("id", "id_password")$sendKeysToElement(list("password", key='enter'))
appURL <- 'https://www.optionslam.com/earnings/stocks/MSFT?page=-1'
remDr$navigate(appURL)