Using rvest to scrape a website w/ a login page
Never mind, I got it to work by using url <- jump_to(session, "https://premium.usnews.com/best-graduate-schools/top-medical-schools/research-rankings")
Web-Scraping with Login and Redirect using R and rvest/httr
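library(rvest)
# Open a session so that cookies persist across requests
url <- "https://kickbase.sky.de/"
page <- html_session(url)
# Log in by POSTing the credentials as JSON to the login endpoint
# (request_POST is an unexported rvest function, so it may change between versions)
page <- rvest:::request_POST(page, url = "https://kickbase.sky.de/api/v1/user/login",
  body = list("email" = "testscrape@gmail.com",
    "password" = "tester",
    "redirect_url" = "http://kickbase.sky.de/spielerprofil/nadiem-amiri/1639#"),
  encode = 'json'
)
# Request the API endpoint from within the logged-in session
player_page <- jump_to(page, "https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3")
# The response body is a raw vector; convert it to text before parsing the JSON
data <- jsonlite::fromJSON(rawToChar(player_page$response$content))
print(data)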
Please note that the website provides an API, and that is where the data comes from: https://kickbase.sky.de/api/v1/news?skip=0&player=1639&limit=3
The variable data has all the information needed.
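The response content in an httr/rvest response is a raw vector of bytes, so it has to be turned into text before it can be parsed as JSON. Here is a minimal offline sketch of that step, assuming the jsonlite package is installed. The payload and the names raw_body, json_text and parsed are made up for illustration and are not the real API output.

```r
library(jsonlite)

# Stand-in for player_page$response$content: a raw vector holding JSON text
# (this payload is invented for illustration, not the real API response)
raw_body <- charToRaw('{"items": [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]}')

# Convert the bytes to a character string, then parse the JSON
json_text <- rawToChar(raw_body)
parsed <- fromJSON(json_text)

# fromJSON simplifies an array of records into a data frame
print(parsed$items$title)
```

Going through rawToChar() makes the bytes-to-text conversion explicit before the string is handed to fromJSON().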
Using rvest to scrape specific values from a web page
Here is a solution that retrieves the table of prices and then performs some data cleaning:
Still requires some additional clean-up but the majority is done.
library(rvest)
library(dplyr)
library(stringr)
url1 <- read_html("https://www.booking.com/hotel/mu/tamassa.html?aid=356980;label=gog235jc-1DCAsonQFCE2hlcml0YWdlLWF3YWxpLWdvbGZIM1gDaJ0BiAEBmAExuAEXyAEM2AED6AEB-AECiAIBqAIDuAKiwqmEBsACAdICJGFkMTQ3OGU4LTUwZDMtNGQ5ZS1hYzAxLTc0OTIyYTRiZDIxM9gCBOACAQ;sid=729aafddc363c28a2c2c7379d7685d87;all_sr_blocks=36363601_246990918_2_85_0;checkin=2021-09-04;checkout=2021-09-05;dest_id=-1354779;dest_type=city;dist=0;from_beach_key_ufi_sr=1;group_adults=2;group_children=0;hapos=1;highlighted_blocks=36363601_246990918_2_85_0;hp_group_set=0;hpos=1;no_rooms=1;room1=A%2CA;sb_price_type=total;sr_order=popularity;sr_pri_blocks=36363601_246990918_2_85_0__29200;srepoch=1619681695;srpvid=51c8354f03be0097;type=total;ucfs=1&")
output <- url1 %>%
html_nodes(xpath = './/table[@id="hprt-table"]') %>%
html_table() %>% .[[1]]
# Fix column name
colnames(output)[5] <- "Quantity"
# Clean up columns
# Remove repeating information in 2 columns (keep only the first line of each cell)
output2 <- output %>% mutate_at(c("Accommodation Type", "Today's price"), ~str_extract(., ".*\n"))
# Remove repeating newlines
answer <- output2 %>% mutate_all(str_squish)
answer
# A tibble: 8 x 5
`Accommodation Ty… Sleeps `Today's price` `Your choices` Quantity
<chr> <chr> <chr> <chr> <chr>
1 Triple Room Max persons: 3 US$398 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$398) 2 (US$795) 3 (US$1,193) 4 (US$…
2 Triple Room Max persons: 1 … US$313 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$313) 2 (US$626) 3 (US$939) 4 (US$1,…
3 Standard Queen Ro… Max persons: 2 US$325 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$325) 2 (US$650) 3 (US$976) 4 (US$1,…
4 Standard Queen Ro… Max persons: 1 … US$241 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$241) 2 (US$481) 3 (US$722) 4 (US$96…
5 Superior Queen Ro… Max persons: 2 US$354 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$354) 2 (US$708) 3 (US$1,063) 4 (US$…
6 Superior Queen Ro… Max persons: 1 … US$270 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$270) 2 (US$539) 3 (US$809) 4 (US$1,…
7 Deluxe Family Room Max persons: 2 US$532 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$532) 2 (US$1,064) 3 (US$1,596) 4 (U…
8 Deluxe Family Room Max persons: 1 … US$447 All-Inclusive FREE cancellation before 23:59 on 27 August 2021 More details on … Select rooms 0 1 (US$447) 2 (US$895) 3 (US$1,342) 4 (US$…
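To see what the two cleaning steps do in isolation, here is a small sketch on a made-up string, assuming the stringr package is installed. The text in cell and choices only imitates a scraped table cell and is not taken from the real page.

```r
library(stringr)

# Made-up cell text imitating a scraped table cell (not from the real page)
cell <- "Triple Room\n\n   Triple Room   \nMore details"

# "." does not match a newline, so this keeps the text up to and
# including the first newline only
first_line <- str_extract(cell, ".*\n")

# str_squish trims both ends and collapses internal runs of whitespace
cleaned <- str_squish(first_line)
print(cleaned)  # "Triple Room"

# The same function flattens multi-line cells into a single line
choices <- str_squish("Select rooms\n\n   0")
print(choices)  # "Select rooms 0"
```

Extracting the first line drops the repeated room name, and str_squish then removes the leftover newline.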
Using rvest or httr to log in to non-standard forms on a webpage
Your rvest code isn't storing the modified form, so in your example you're just submitting the original pgform without the values being filled out. Try:
library(rvest)
url <-"http://www.perfectgame.org/" ## page to spider
pgsession <-html_session(url) ## create session
pgform <-html_form(pgsession)[[1]] ## pull form from session
# Note the new variable assignment
filled_form <- set_values(pgform,
`ctl00$Header2$HeaderTop1$tbUsername` = "myemail@gmail.com",
`ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")
submit_form(pgsession,filled_form)
And I now see a nice 200 status code response instead of an error. Note that because the desired submit button appears to be the first submit button, we don't need to pass it as an argument; otherwise we would just pass its name as a string (straight quotes, not backquotes).
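The key point, that filling in a form returns a new object rather than changing the original, can be checked offline. This is a minimal sketch assuming rvest 1.0.0 or later, where html_form_set() supersedes set_values(). The HTML form and its field names (user, pass, go) are hypothetical and only stand in for the real page.

```r
library(rvest)

# Hypothetical login form; the field names are invented for this sketch
page <- read_html('<form action="/login" method="post">
  <input type="text" name="user" value="">
  <input type="password" name="pass" value="">
  <input type="submit" name="go" value="Log in">
</form>')
form <- html_form(page)[[1]]

# html_form_set() returns a modified copy; `form` itself is left unchanged
filled <- html_form_set(form, user = "myemail@gmail.com", pass = "mypassword")

print(form$fields$user$value)    # still the original value
print(filled$fields$user$value)  # the value that was just set
```

Submitting form instead of filled would therefore send the empty fields, which is the mistake described above.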