How to Automate Multiple Requests to a Web Search Form Using R

How to automate multiple requests to a web search form using R

Adding to the suggestions by daroczig and Rguy, here is a short piece of code to automate the entire process of extracting the data into a data frame.

# construct sample data frame with lpn, vpn and year
lpn <- rep(c('5MXH018', '4TOL562', '5CWR968'), 2)
vpn <- rep(c('30135', '74735', '11802'), 2)
year <- c(rep(2009, 3), rep(2010, 3))
mydf <- data.frame(lpn, vpn, year)

library(XML)

# construct function to extract data for one record
get_data <- function(df){
  # root url
  root <- 'http://www.dmv.ca.gov/wasapp/FeeCalculatorWeb/vlfFees.do?method=calculateVlf&submit=Determine%20VLF'

  # construct url by adding lpn, year and vpn
  u <- paste(root, '&vehicleLicense=', df$lpn, '&vehicleTaxYear=',
             df$year, '&vehicleVin=', df$vpn, sep = "")

  # encode url correctly
  url <- URLencode(u)

  # extract data from the fifth table on the results page
  readHTMLTable(url)[[5]]
}

# apply function to every row of mydf and return data frame of results
library(plyr)
mydata <- adply(mydf, 1, get_data)

# strip the non-breaking-space junk (bytes \302\240) from column names
names(mydata) <- gsub(':\302\240\302\240', '', names(mydata))
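
Note that the XML package cannot fetch https URLs, and dmv.ca.gov now redirects to https, so on a current setup the extraction step may need rvest instead. A sketch of the same function under that assumption (the fifth-table index is carried over from the answer above and may also have changed):

library(rvest)

get_data <- function(df){
  root <- 'https://www.dmv.ca.gov/wasapp/FeeCalculatorWeb/vlfFees.do?method=calculateVlf&submit=Determine%20VLF'
  u <- paste0(root, '&vehicleLicense=', df$lpn, '&vehicleTaxYear=',
              df$year, '&vehicleVin=', df$vpn)
  # html_table() returns one data frame per <table> on the page
  html_table(read_html(URLencode(u)))[[5]]
}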

Automate multiple requests to a web search form using R

You can use httr to make the call and specify encode = "json", which automatically sets the Content-Type: application/json header (as in curl) and encodes the body parameter as a JSON object:

library(httr)

url <- 'https://swisstaxcalculator.estv.admin.ch/delegate/ost-integration/v1/lg-proxy/operation/c3b67379_ESTV/API_calculateSimpleTaxes'

r <- POST(url, body = list(
  SimKey = NULL,
  TaxYear = 2019,
  TaxLocationID = 100000000,
  Relationship = 1,
  Confession1 = 5,
  Children = array(),
  Confession2 = 0,
  TaxableIncomeCanton = 30000,
  TaxableIncomeFed = 30000,
  TaxableFortune = 0
), encode = "json", verbose())

data <- content(r, "parsed")
str(data)

which gives:

{"response":{"IncomeSimpleTaxCanton":1747,"FortuneTaxCanton":0,"IncomeSimpleTaxCity":1747,"IncomeTaxChurch":0,"IncomeTaxCity":1380,"IncomeSimpleTaxFed":119,"PersonalTax":0,"FortuneTaxCity":0,"FortuneSimpleTaxCanton":0,"IncomeTaxFed":119,"FortuneSimpleTaxCity":0,"IncomeTaxCanton":2699,"Location":{"TaxLocationID":100000000,"ZipCode":"1000","BfsID":5586,"CantonID":23,"BfsName":"Lausanne","City":"Lausanne","Canton":"VD"},"FortuneTaxChurch":0}}
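
The same pattern scales to multiple requests: the fields of interest sit under response in the parsed result, so you can loop over a vector of inputs and collect one row per call. A minimal sketch reusing the url above, assuming the endpoint accepts the same body with only the income fields varying:

incomes <- c(30000, 50000, 80000)

rows <- lapply(incomes, function(inc) {
  r <- POST(url, body = list(
    TaxYear = 2019, TaxLocationID = 100000000, Relationship = 1,
    Confession1 = 5, Children = array(), Confession2 = 0,
    TaxableIncomeCanton = inc, TaxableIncomeFed = inc, TaxableFortune = 0
  ), encode = "json")
  res <- content(r, "parsed")$response
  # IncomeTaxCanton and IncomeTaxFed are fields of the response shown above
  data.frame(income = inc,
             IncomeTaxCanton = res$IncomeTaxCanton,
             IncomeTaxFed = res$IncomeTaxFed)
})
mytaxes <- do.call(rbind, rows)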

How to automate multiple requests to a web search form using R (Java function calls / trigger)

The information to recreate the calculator is given on the webpage. For example, to calculate the 10-year CVD risk for a male:

cvdRiskmale <- function(age, SBP, treated, smoke, dia, HDL, TC){
  eSum <- log(age)*3.06117 + treated*1.99881*log(SBP) + (1-treated)*1.93303*log(SBP)
  eSum <- eSum + smoke*0.65451 + dia*0.57367 - 0.93263*log(HDL) + 1.12370*log(TC)
  1 - 0.88936^exp(eSum - 23.9802)
}

> cvdRiskmale(35, 125, 0,0, 0, 45, 180)
[1] 0.02638287

> cvdRiskmale(50, 115, 0,1, 1, 45, 180)
[1] 0.2067156

Compare with the online calculator using the same options.

A similar function can be defined for females given the regression coefficients listed on the website.
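
For reference, a sketch of the female version. The coefficients below are the women's values from the published Framingham general-CVD equation (D'Agostino et al., 2008), not taken from the answer above; verify them against the webpage before relying on the output:

# coefficients are an assumption here; check them against the website
cvdRiskfemale <- function(age, SBP, treated, smoke, dia, HDL, TC){
  eSum <- log(age)*2.32888 + treated*2.82263*log(SBP) + (1-treated)*2.76157*log(SBP)
  eSum <- eSum + smoke*0.52873 + dia*0.69154 - 0.70833*log(HDL) + 1.20904*log(TC)
  1 - 0.95012^exp(eSum - 26.1931)
}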

Scrape tables by passing multiple search requests using R

The call needs to be to https: and not http:. I also removed the plyr dependency and used just base R:

library(rvest)

fn <- c('HARVEY', 'HARVEY')
ln <- c('BIDWELL', 'ADELSON')
mydf <- data.frame(fn, ln)

# x is a character vector: first name, last name
get_data <- function(x){
  root <- 'https://npiregistry.cms.hhs.gov/'
  u <- paste(root, 'registry/search-results-table?', 'first_name=', x[1],
             '&last_name=', x[2], sep = "")
  # encode url correctly
  url <- URLencode(u)
  # extract data from the first table on the results page
  data <- read_html(url)
  newresult <- html_nodes(data, "table")[1] %>% html_table()
  # convert result into a data frame
  as.data.frame(newresult)
}

# apply the function to every row of mydf
mydata <- apply(mydf, 1, get_data)
# mydata is a list of data frames; do.call combines them into one
finalanswer <- do.call(rbind, mydata)
# finalanswer still needs some cleanup

How to perform web scraping dynamically using R

A Google search for 'web scraping with R' turns up several tutorials, and they are simple enough that you should be able to accomplish what you need. Also, heed hrbrmstr's warning, and see if you can acquire the data you need without abusing metacrawler's website.
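
The core pattern those tutorials teach is short; a minimal sketch with rvest (the URL is a placeholder, not taken from the question):

library(rvest)

# placeholder URL; substitute a page you are permitted to scrape
page <- read_html("https://example.com/page-with-a-table")

# html_table() returns one data frame per <table>; keep the first
tbl <- html_table(page)[[1]]
head(tbl)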

How can I automate searching strings on a website search tool and record the data in R?

A solution without RSelenium. By the way, the last address that you provided does not exist according to the website.

require(tidyverse)
require(httr2)

df <- tibble(
  address = c(
    "570 BLOOR ST W TORONTO ON M6G1K1",
    "10 STAYNER AVE NORTH YORK ON M6B1N4",
    "1200 WOODBINE AVE EAST YORK ON M4C4E3",
    "2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3"
  )
)

get_ward <- function(query) {
  response <- paste0("https://map.toronto.ca/geoservices/rest/search/rankedsearch?searchArea=1&matchType=1&projectionType=1&retRowLimit=10&areaTypeCode1=CITW&areaTypeCode2=WD03&searchString=",
                     query) %>%
    str_replace_all(" ", "%20") %>%
    request() %>%
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE) %>%
    .$result %>%
    .$bestResult %>%
    .$detail %>%
    str_extract("(?<=[:]).*") %>%
    str_squish()

  # return NULL when the service finds no match for the address
  if (length(response) == 0) NULL else response
}

df %>%
  mutate(ward = map(address, get_ward) %>%
           as.character())

# A tibble: 4 x 2
  address                                                   ward
  <chr>                                                     <chr>
1 570 BLOOR ST W TORONTO ON M6G1K1                          University-Rosedale (11)
2 10 STAYNER AVE NORTH YORK ON M6B1N4                       Eglinton-Lawrence (8)
3 1200 WOODBINE AVE EAST YORK ON M4C4E3                     Beaches-East York (19)
4 2480-2490 GERRARD STREET EAST UNIT 20A TORONTO ON M1N 4C3 NULL
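
When firing many such requests in a row, it is polite (and less likely to get you blocked) to rate-limit them. httr2 supports this via req_throttle(); a sketch of the same call, assuming 30 requests per minute is acceptable to the service:

get_ward_throttled <- function(query) {
  paste0("https://map.toronto.ca/geoservices/rest/search/rankedsearch?searchArea=1&matchType=1&projectionType=1&retRowLimit=10&areaTypeCode1=CITW&areaTypeCode2=WD03&searchString=",
         query) %>%
    str_replace_all(" ", "%20") %>%
    request() %>%
    req_throttle(rate = 30 / 60) %>% # at most 30 requests per minute
    req_perform() %>%
    resp_body_json(simplifyVector = TRUE)
}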

What if I want to web scrape with R for a page with parameters?

You can use RHTMLForms

You may need to install it first:

# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")

or under windows you may need

# install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")

require(RHTMLForms)
require(RCurl)
require(XML)
forms = getHTMLFormDescription("http://stoptb.org/countries/tbteam/experts.asp")
fun = createFunction(forms$sExperts)
# find experts with expertise in "Infection control: Engineering Consultant"
results <- fun(Expertise = "Infection control: Engineering Consultant")

tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
readHTMLTable(tableData[[1]])

# V1 V2 V3
#1 <NA> <NA>
#2 Name of Expert Country of Residence Email
#3 Girmay, Desalegn Ethiopia deskebede@yahoo.com
#4 IVANCHENKO, VARVARA Estonia v.ivanchenko81@mail.ru
#5 JAUCOT, Alex Belgium alex.jaucot@gmail.com
#6 Mulder, Hans Johannes Henricus Namibia hmulder@iway.na
#7 Walls, Neil Australia neil@nwalls.com
#8 Zuccotti, Thea Italy thea_zuc@yahoo.com
# V4
#1 <NA>
#2 Number of Missions
#3 0
#4 3
#5 0
#6 0
#7 0
#8 1

or create a reader to return a table

returnTable <- function(results){
  tableData <- getNodeSet(htmlParse(results), "//*/table[@class = 'data']")
  readHTMLTable(tableData[[1]])
}
fun = createFunction(forms$sExperts, reader = returnTable)
fun(CBased = "Bhutan") # find experts based in Bhutan
# V1 V2 V3
#1 <NA> <NA>
#2 Name of Expert Country of Residence Email
#3 Wangchuk, Lungten Bhutan drlungten@health.gov.bt
# V4
#1 <NA>
#2 Number of Missions
#3 2
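
As in the other answers, the generated function can then be called in a loop to automate several searches; a sketch (the country list is illustrative, and rbind assumes the returned tables share the same columns):

countries <- c("Bhutan", "Namibia", "Estonia")
tables <- lapply(countries, function(cb) fun(CBased = cb))
# one data frame per country; bind them into a single table
do.call(rbind, tables)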

Web Scraping in R when you have inputs

Using RSelenium (see the package documentation for more info):

library(RSelenium)
rD <- rsDriver(browser = "firefox") # specify the browser Selenium should open
remDr <- rD$client

remDr$navigate("https://pro.rarom.ro/istoric_vehicul/dosar_vehicul.aspx") # navigate to the webpage

# fill in the first input field (email)
option <- remDr$findElement(using = 'id', value = "inputEmail")
option$highlightElement()
option$clickElement()
option$sendKeysToElement(list("email@email.com"))

# fill in the second input field (email confirmation)
option <- remDr$findElement(using = 'id', value = "inputEmail2")
option$highlightElement()
option$clickElement()
option$sendKeysToElement(list("email@email.com"))

# fill in the third input field (VIN)
option <- remDr$findElement(using = 'id', value = "inputVIN")
option$highlightElement()
option$clickElement()
option$sendKeysToElement(list("123"))

# press the submit button
webElem <- remDr$findElement(using = "id", "trimite")
webElem$highlightElement()
webElem$clickElement()
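
Once the form is submitted, the rendered results can be read back from the live page; a sketch combining RSelenium with rvest (it assumes the results arrive in an HTML table, which is not confirmed by the answer above):

library(rvest)

Sys.sleep(2) # give the site a moment to render the response

# getPageSource() returns a list whose first element is the HTML source
html <- remDr$getPageSource()[[1]]
tables <- html_table(read_html(html))
# inspect `tables` to find the one holding the vehicle history

# shut down the browser and the Selenium server when finished
remDr$close()
rD$server$stop()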


