How to automate multiple requests to a web search form using R
Adding to the suggestion by daroczig and Rguy, here is a short piece of code to automate the entire process of extracting the data into a data frame.
# construct sample data frame with lpn, vpn and years
lpn <- rep(c('5MXH018', '4TOL562', '5CWR968'), 2)
vpn <- rep(c('30135', '74735', '11802'), 2)
year <- c(rep(2009, 3), rep(2010, 3))
mydf <- data.frame(lpn, vpn, year, stringsAsFactors = FALSE)
library(XML)
library(plyr)
# construct function to extract data for one record (one row of mydf)
get_data <- function(df){
  # root url
  root <- 'http://www.dmv.ca.gov/wasapp/FeeCalculatorWeb/vlfFees.do?method=calculateVlf&submit=Determine%20VLF'
  # construct url by adding lpn, year and vpn
  u <- paste(root, '&vehicleLicense=', df$lpn, '&vehicleTaxYear=',
             df$year, '&vehicleVin=', df$vpn, sep = "")
  # encode url correctly
  url <- URLencode(u)
  # extract data from the right table (the 5th table on the results page)
  readHTMLTable(url)[[5]]
}
# apply function to every row of mydf and return data frame of results
mydata <- adply(mydf, 1, get_data)
# remove junk (a colon followed by two non-breaking spaces) from column names
names(mydata) <- gsub(':\302\240\302\240', '', names(mydata))
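The same row-by-row pattern can also be written in base R without plyr. A minimal sketch, assuming the query parameters used above; `build_vlf_urls()` and `fetch_all()` are hypothetical helper names, and `fetch` stands for any function that turns one URL into a data frame (for example `function(u) XML::readHTMLTable(u)[[5]]`). The pause between requests is an assumption meant to keep the load on the server low.

```r
# Hypothetical helper: one query URL per row, using the same parameters
# (vehicleLicense, vehicleTaxYear, vehicleVin) as the answer above.
build_vlf_urls <- function(df, root) {
  paste0(root,
         "&vehicleLicense=", df$lpn,
         "&vehicleTaxYear=", df$year,
         "&vehicleVin=", df$vpn)
}

# Hypothetical helper: call `fetch` on each URL, wait `pause` seconds
# between requests, tag each result with its URL and stack the results.
# Requests that return NULL or an empty table are dropped.
fetch_all <- function(urls, fetch, pause = 1) {
  results <- lapply(seq_along(urls), function(i) {
    if (i > 1) Sys.sleep(pause)
    out <- fetch(urls[i])
    if (is.null(out) || nrow(out) == 0) return(NULL)
    data.frame(url = urls[i], out, stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

root <- paste0("http://www.dmv.ca.gov/wasapp/FeeCalculatorWeb/vlfFees.do",
               "?method=calculateVlf&submit=Determine%20VLF")
mydf <- data.frame(lpn = c("5MXH018", "4TOL562"),
                   vpn = c("30135", "74735"),
                   year = c(2009, 2010),
                   stringsAsFactors = FALSE)
urls <- build_vlf_urls(mydf, root)

# With a real fetcher this would be, for example (not run here):
# mydata <- fetch_all(urls, function(u) XML::readHTMLTable(u)[[5]])
```

Injecting `fetch` keeps the looping, throttling and stacking logic separate from the scraping itself, so either part can be swapped out.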
Automate multiple requests to a web search form using R
You can use httr to make the call and specify encode = "json" to automatically set the header Content-Type: application/json (as in curl) and encode the body parameter as a JSON object:
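library(httr)
url <- 'https://swisstaxcalculator.estv.admin.ch/delegate/ost-integration/v1/lg-proxy/operation/c3b67379_ESTV/API_calculateSimpleTaxes'
# encode = "json" serializes the body list to JSON and sets the
# Content-Type: application/json header; verbose() prints the exchange
r <- POST(url, body = list(
  SimKey = NULL,
  TaxYear = 2019,
  TaxLocationID = 100000000,
  Relationship = 1,
  Confession1 = 5,
  Children = array(),
  Confession2 = 0,
  TaxableIncomeCanton = 30000,
  TaxableIncomeFed = 30000,
  TaxableFortune = 0
), encode = "json", verbose())
# parse the JSON response into a nested R list and inspect its structure
data <- content(r, "parsed")
str(data)
The raw JSON response returned by the server is: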
{"response":{"IncomeSimpleTaxCanton":1747,"FortuneTaxCanton":0,"IncomeSimpleTaxCity":1747,"IncomeTaxChurch":0,"IncomeTaxCity":1380,"IncomeSimpleTaxFed":119,"PersonalTax":0,"FortuneTaxCity":0,"FortuneSimpleTaxCanton":0,"IncomeTaxFed":119,"FortuneSimpleTaxCity":0,"IncomeTaxCanton":2699,"Location":{"TaxLocationID":100000000,"ZipCode":"1000","BfsID":5586,"CantonID":23,"BfsName":"Lausanne","City":"Lausanne","Canton":"VD"},"FortuneTaxChurch":0}}
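If you want one row per request, for instance when looping over several incomes, the JSON shown above can be flattened into a data frame. A minimal sketch using jsonlite, which is an assumption (the answer itself uses `content(r, "parsed")`); `response_to_row()` is a hypothetical helper, and in practice you would pass it `content(r, "text")`.

```r
library(jsonlite)

# Response body copied from the output above.
raw <- '{"response":{"IncomeSimpleTaxCanton":1747,"FortuneTaxCanton":0,"IncomeSimpleTaxCity":1747,"IncomeTaxChurch":0,"IncomeTaxCity":1380,"IncomeSimpleTaxFed":119,"PersonalTax":0,"FortuneTaxCity":0,"FortuneSimpleTaxCanton":0,"IncomeTaxFed":119,"FortuneSimpleTaxCity":0,"IncomeTaxCanton":2699,"Location":{"TaxLocationID":100000000,"ZipCode":"1000","BfsID":5586,"CantonID":23,"BfsName":"Lausanne","City":"Lausanne","Canton":"VD"},"FortuneTaxChurch":0}}'

# Hypothetical helper: turn one API response into a one-row data frame.
# Scalar tax fields become columns; City and Canton are pulled out of
# the nested Location object.
response_to_row <- function(json) {
  res <- fromJSON(json)$response
  loc <- res$Location
  res$Location <- NULL
  row <- as.data.frame(res)
  row$City <- loc$City
  row$Canton <- loc$Canton
  row
}

row <- response_to_row(raw)
row[, c("IncomeTaxCanton", "IncomeTaxCity", "IncomeTaxFed", "City")]
```

Several responses can then be stacked with `do.call(rbind, lapply(bodies, response_to_row))`, where `bodies` is a hypothetical list of `content(r, "text")` strings, one per request.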
How to automate multiple requests to a web search form using R (Java function calls / trigger)
The information needed to recreate the calculator is given on the webpage. For example, to calculate the 10-year CVD risk for a male:
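# 10-year CVD risk for a man, using the regression coefficients from the webpage.
# age in years, SBP = systolic blood pressure (mmHg), HDL and TC = HDL and total
# cholesterol (mg/dL); treated, smoke and dia are 0/1 flags for blood-pressure
# treatment, current smoker and diabetes.
cvdRiskmale <- function(age, SBP, treated, smoke, dia, HDL, TC){
  # SBP gets a different coefficient depending on whether it is treated
  eSum <- (log(age)*3.06117 + treated*1.99881*log(SBP) + (1-treated)*1.93303*log(SBP))
  eSum <- eSum + (smoke*0.65451 + dia*0.57367 - 0.93263*log(HDL) + 1.12370*log(TC))
  # 1 - S0^exp(sum - mean), with baseline survival 0.88936 and mean 23.9802
  1 - 0.88936^exp(eSum - 23.9802)
}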
> cvdRiskmale(35, 125, 0,0, 0, 45, 180)
[1] 0.02638287
> cvdRiskmale(50, 115, 0,1, 1, 45, 180)
[1] 0.2067156
Compare these values with the online calculator using the same options.
A similar function can be defined for females given the regression coefficients listed on the website.
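Because every operation in `cvdRiskmale()` is vectorized, the calculator can be run on many people at once without any web requests. A minimal sketch; the `patients` data frame simply reuses the two console examples above, and the function is repeated so the block runs on its own.

```r
cvdRiskmale <- function(age, SBP, treated, smoke, dia, HDL, TC){
  eSum <- (log(age)*3.06117 + treated*1.99881*log(SBP) + (1-treated)*1.93303*log(SBP))
  eSum <- eSum + (smoke*0.65451 + dia*0.57367 - 0.93263*log(HDL) + 1.12370*log(TC))
  1 - 0.88936^exp(eSum - 23.9802)
}

# One row per person, same inputs as the two console examples above
patients <- data.frame(age = c(35, 50), SBP = c(125, 115), treated = c(0, 0),
                       smoke = c(0, 1), dia = c(0, 1),
                       HDL = c(45, 45), TC = c(180, 180))

# with() lets the column names be used as arguments; the result is one
# risk value per row
patients$risk <- with(patients, cvdRiskmale(age, SBP, treated, smoke, dia, HDL, TC))
patients
```

The `risk` column should reproduce the two values printed above (0.02638287 and 0.2067156).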
Scrape tables by passing multiple search requests using R
The call needs to go to https:, not http:. I also removed the plyr dependency and used only base R:
library(rvest)
fn <- c('HARVEY', 'HARVEY')
ln <- c('BIDWELL', 'ADELSON')
mydf <- data.frame(fn, ln)
get_data <- function(df){
  root <- 'https://npiregistry.cms.hhs.gov/'
  u <- paste(root, 'registry/search-results-table?', 'first_name=', df[1], '&last_name=',
             df[2], sep = "")
  # encode url correctly
  url <- URLencode(u)
  # extract data from the first table on the page
  page <- read_html(url)
  newresult <- html_nodes(page, "table")[1] %>% html_table()
  # convert result into a data frame
  as.data.frame(newresult)
}
# apply() passes each row as a named character vector, so df[1] is the first name
mydata <- apply(mydf, 1, get_data)
# mydata is a list of data frames; do.call() stacks them into a single data frame
finalanswer <- do.call(rbind, mydata)
# finalanswer still needs some clean-up
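A slightly more defensive variant of `get_data()` encodes each name separately and records which search produced each row, so the stacked result stays traceable. A minimal sketch; `npi_search_url()` and `first_table()` are hypothetical helpers, and the URL path is simply the one used above, which may have changed since.

```r
library(rvest)

# Hypothetical helper: build the search URL for one name pair.
# URLencode(reserved = TRUE) escapes spaces, "&", "'" and similar
# characters inside each query value.
npi_search_url <- function(first, last) {
  paste0("https://npiregistry.cms.hhs.gov/registry/search-results-table?",
         "first_name=", URLencode(first, reserved = TRUE),
         "&last_name=", URLencode(last, reserved = TRUE))
}

# Hypothetical helper: parse the first <table> of a page and prepend the
# search terms as columns; returns NULL if there is no (non-empty) table.
first_table <- function(page, first, last) {
  tables <- html_nodes(page, "table")
  if (length(tables) == 0) return(NULL)
  tbl <- as.data.frame(html_table(tables[[1]]))
  if (nrow(tbl) == 0) return(NULL)
  data.frame(first_name = first, last_name = last, tbl,
             stringsAsFactors = FALSE)
}

# Usage with live requests (not run here):
# pages <- Map(function(f, l) first_table(read_html(npi_search_url(f, l)), f, l),
#              mydf$fn, mydf$ln)
# finalanswer <- do.call(rbind, pages)
```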
How to perform web scraping dynamically using R
Googling 'web scraping with R' brought me this tutorial and this tutorial. Both seem simple enough that you should be able to accomplish what you need. Also, heed hrbrmstr's warning, and see if you can acquire the data you need without abusing metacrawler's website.
How can I automate searching strings on a website search tool and record the data in R?
A solution without RSelenium. By the way, the last address that you provided does not exist according to the website.
require(tidyverse)
require(httr2)
df <- tibble(
address = c(
"570 BLOOR ST W TORONTO ON M6G1K1",
"10 STAYNER AVE NORTH YORK ON M6B1N4",
"1200 WOODBINE AVE EAST YORK ON M4C4E3",
"2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"
)
)
get_ward <- function(query) {
  response <- paste0("https://map.toronto.ca/geoservices/rest/search/rankedsearch?searchArea=1&matchType=1&projectionType=1&retRowLimit=10&areaTypeCode1=CITW&areaTypeCode2=WD03&searchString=",
                     query) %>%
    str_replace_all(" ", "%20") %>%
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    .$result %>%
    .$bestResult %>%
    .$detail %>%
    # keep the text after the colon, e.g. the ward name and number
    str_extract("(?<=[:]).*") %>%
    str_squish()
  # return NULL when the address is not found
  if (length(response) == 0) NULL else response
}
df %>%
mutate(ward = map(address, get_ward) %>%
as.character())
# A tibble: 4 x 2
address ward
<chr> <chr>
1 570 BLOOR ST W TORONTO ON M6G1K1 University-Rosedale (11)
2 10 STAYNER AVE NORTH YORK ON M6B1N4 Eglinton-Lawrence (8)
3 1200 WOODBINE AVE EAST YORK ON M4C4E3 Beaches-East York (19)
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3 NULL
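Note that `map()` followed by `as.character()` turns a missing result into the string "NULL", as in row 4 above. If you prefer a real missing value, one option is to wrap the lookup so that errors and empty results become `NA`, and use `purrr::map_chr()`. A minimal sketch; `safe_lookup()` and the stand-in `fake_ward()` are hypothetical, the latter replacing `get_ward()` so the example runs offline.

```r
library(purrr)

# Hypothetical wrapper: call lookup(query); if it errors or returns
# nothing, give NA_character_, otherwise its first element as character.
safe_lookup <- function(lookup) {
  function(query) {
    out <- tryCatch(lookup(query), error = function(e) NULL)
    if (length(out) == 0) NA_character_ else as.character(out[[1]])
  }
}

# Stand-in for get_ward(): a known address maps to a ward, others give NULL.
fake_ward <- function(query) {
  wards <- c("570 BLOOR ST W TORONTO ON M6G1K1" = "University-Rosedale (11)")
  if (query %in% names(wards)) wards[[query]] else NULL
}

addresses <- c("570 BLOOR ST W TORONTO ON M6G1K1",
               "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3")
ward <- map_chr(addresses, safe_lookup(fake_ward))
ward
```

With the real function this would read `df %>% mutate(ward = map_chr(address, safe_lookup(get_ward)))`; a failed request then also yields `NA` instead of stopping the whole pipeline.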
What if I want to web scrape with R for a page with parameters?
You can use RHTMLForms. You may need to install it first:
# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")
or, under Windows, you may need:
# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")
require(RHTMLForms)
require(RCurl)
require(XML)
forms = getHTMLFormDescription("http://stoptb.org/countries/tbteam/experts.asp")
fun = createFunction(forms$sExperts)
# find experts with expertise in "Infection control: Engineering Consultant"
results <- fun(Expertise = "Infection control: Engineering Consultant")
tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
readHTMLTable(tableData[[1]])
# V1 V2 V3
#1 <NA> <NA>
#2 Name of Expert Country of Residence Email
#3 Girmay, Desalegn Ethiopia deskebede@yahoo.com
#4 IVANCHENKO, VARVARA Estonia v.ivanchenko81@mail.ru
#5 JAUCOT, Alex Belgium alex.jaucot@gmail.com
#6 Mulder, Hans Johannes Henricus Namibia hmulder@iway.na
#7 Walls, Neil Australia neil@nwalls.com
#8 Zuccotti, Thea Italy thea_zuc@yahoo.com
# V4
#1 <NA>
#2 Number of Missions
#3 0
#4 3
#5 0
#6 0
#7 0
#8 1
Alternatively, pass a reader function so that the generated function returns the table directly:
returnTable <- function(results){
tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
readHTMLTable(tableData[[1]])
}
fun = createFunction(forms$sExperts, reader = returnTable)
fun(CBased = "Bhutan") # find experts based in Bhutan
# V1 V2 V3
#1 <NA> <NA>
#2 Name of Expert Country of Residence Email
#3 Wangchuk, Lungten Bhutan drlungten@health.gov.bt
# V4
#1 <NA>
#2 Number of Missions
#3 2
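RHTMLForms is installed from the Omegahat repository rather than CRAN (as the install lines above show), so it can be hard to set up. As an alternative (a swapped-in technique, not what this answer uses), rvest 1.0 and later can parse a page's forms and fill in fields with `html_form()` and `html_form_set()`. A minimal sketch on an inline HTML form; the form and its field names are made up for illustration.

```r
library(rvest)

# Made-up search form standing in for a real page (illustration only)
html <- '<html><body>
<form action="/search" method="get">
  <input type="text" name="CBased" value="">
  <input type="submit" name="go" value="Search">
</form>
</body></html>'

page <- read_html(html)
form <- html_form(page)[[1]]                    # list of forms; take the first
form <- html_form_set(form, CBased = "Bhutan")  # fill in the text field
form$fields$CBased$value

# Against a live site (not run here):
# s <- session(url)
# form <- html_form_set(html_form(s)[[1]], CBased = "Bhutan")
# result <- session_submit(s, form)
# html_table(read_html(result))
```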
Web Scraping in R when you have inputs
Using RSelenium (see here for more information):
library(RSelenium)
rD <- rsDriver(browser = c("firefox")) #specify browser type you want Selenium to open
remDr <- rD$client
remDr$navigate("https://pro.rarom.ro/istoric_vehicul/dosar_vehicul.aspx") # navigates to webpage
# select first input field
option <- remDr$findElement(using='id', value="inputEmail")
option$highlightElement()
option$clickElement()
option$sendKeysToElement(list("email@email.com"))
# select second input field
option <- remDr$findElement(using='id', value="inputEmail2")
option$highlightElement()
option$clickElement()
option$sendKeysToElement(list("email@email.com"))
# select third input field
option <- remDr$findElement(using='id', value="inputVIN")
option$highlightElement()
option$clickElement()
option$sendKeysToElement(list("123"))
# click the submit button
webElem <- remDr$findElement(using = "id", "trimite")
webElem$highlightElement()
webElem$clickElement()
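The three fill-in steps above repeat the same pattern, so they can be folded into a helper and looped over several VINs. A minimal sketch; `fill_field()` and `submit_vin()` are hypothetical names, the fixed `wait` before reading the page is an assumption (a real page may need an explicit wait for a result element), and the element ids are the ones used above.

```r
# Hypothetical helper: find an element by id, click it, clear any
# existing text and type `text` into it.
fill_field <- function(remDr, id, text) {
  el <- remDr$findElement(using = "id", value = id)
  el$clickElement()
  el$clearElement()
  el$sendKeysToElement(list(text))
  invisible(el)
}

# Hypothetical helper: fill the form for one VIN, submit it, wait and
# return the resulting page HTML as a string.
submit_vin <- function(remDr, email, vin, wait = 2) {
  fill_field(remDr, "inputEmail", email)
  fill_field(remDr, "inputEmail2", email)
  fill_field(remDr, "inputVIN", vin)
  remDr$findElement(using = "id", value = "trimite")$clickElement()
  Sys.sleep(wait)
  remDr$getPageSource()[[1]]
}

# Usage with the session opened above (not run here):
# vins <- c("123", "456")
# pages <- lapply(vins, function(v) {
#   remDr$navigate("https://pro.rarom.ro/istoric_vehicul/dosar_vehicul.aspx")
#   submit_vin(remDr, "email@email.com", v)
# })
# remDr$close(); rD$server$stop()
```

Navigating back to the form before each VIN avoids depending on whatever page the previous submission left the browser on.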