Building a RESTful API using R
I have two options for you:
plumber
plumber allows you to create a REST API by decorating your existing R source
code with special comments.
A small example file:
# myfile.R
#* @get /mean
normalMean <- function(samples=10){
data <- rnorm(samples)
mean(data)
}
#* @post /sum
addTwo <- function(a, b){
as.numeric(a) + as.numeric(b)
}
From the R command line:
> library(plumber)
> r <- plumb("myfile.R") # Where 'myfile.R' is the location of the file shown above
> r$run(port=8000)
You would then get results like these:
$ curl "http://localhost:8000/mean"
[-0.254]
$ curl "http://localhost:8000/mean?samples=10000"
[-0.0038]
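Because the decorated functions in myfile.R are ordinary R functions (to R itself the #* lines are just comments), one way to check their logic before starting the server is to call them directly. A minimal sketch that repeats the two functions so it runs on its own; the string arguments to addTwo() mimic the fact that query-string and form parameters typically reach the function as character strings, which is why it converts them with as.numeric():

```r
# Same functions as in myfile.R; the plumber annotations are omitted because
# they are only comments as far as R is concerned.
normalMean <- function(samples = 10) {
  data <- rnorm(samples)
  mean(data)
}

addTwo <- function(a, b) {
  as.numeric(a) + as.numeric(b)
}

# Simulate values arriving from a web request as character strings.
addTwo("4", "3")  # 7

# Fix the seed so the random result is reproducible while testing.
set.seed(42)
normalMean(samples = 100)
```

In a real project you would source("myfile.R") instead of repeating the definitions.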
Jug
Jug is a small web development framework for R which relies heavily upon
the httpuv package. Its main focus is to make building APIs for your
code as easy as possible. It is not supposed to be either an
especially performant or an uber stable web framework. Other tools
(and languages) might be more suited for that. Its main focus is to
easily allow you to create APIs for your R code. However, the
flexibility of Jug means that, in theory, you could build an extensive
web framework with it.
It is very easy to learn and has a nice vignette.
A Hello World example:
library(jug)
jug() %>%
get("/", function(req, res, err){
"Hello World!"
}) %>%
simple_error_handler_json() %>%
serve_it()
Calling a REST API in R
It looks like you're configuring config improperly: I don't see a config= in your code. The body is also not encoded correctly.
Also, in the API documentation I don't see anything about grant_type. It looks like an array of tasks should go there, e.g. something like:
{882394209: {'site': 'ranksonic.com', 'crawl_max_pages': 10}}
Response:
{'results_count': 1, 'results_time': '0.0629 sec.', 'results': {'2308949': {'post_id': 2308949, 'post_site': 'ranksonic.com',
'task_id': 882394209, 'status': 'ok'}}, 'status': 'ok'}
OK, so first off we need set_config() or config=:
library(httr)
username <- 'Hack-R@stackoverflow.com' # fake email
password <- 'vxnyM9s7FAKESeIO' # fake password
set_config(authenticate(username,password), override = TRUE)
GET("https://api.dataforseo.com/v2/cmn_se")
Response [https://api.dataforseo.com/v2/cmn_se]
Date: 2018-07-08 16:20
Status: 200
Content-Type: application/json
Size: 551 kB
{
"status": "ok",
"results_time": "0.0564 sec.",
"results_count": 2187,
"results": [
{
"se_id": 37,
"se_name": "google.com.af",
"se_country_iso_code": "AF",
"se_country_name": "Afghanistan",
...
GET("https://api.dataforseo.com/v2/cmn_se/$country_iso_code")
Response [https://api.dataforseo.com/v2/cmn_se/$country_iso_code]
Date: 2018-07-08 15:48
Status: 200
Content-Type: application/json
Size: 100 B
{
"status": "ok",
"results_time": "0.0375 sec.",
"results_count": 0,
"results": []
}
GET("https://api.dataforseo.com/v2/cmn_se/$op_tasks_post")
Response [https://api.dataforseo.com/v2/cmn_se/$op_tasks_post]
Date: 2018-07-08 16:10
Status: 200
Content-Type: application/json
Size: 100 B
{
"status": "ok",
"results_time": "0.0475 sec.",
"results_count": 0,
"results": []
}
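Each response body is JSON, so it can be turned into R objects with jsonlite. A minimal sketch that parses the empty-result body shown above from a literal string; with a live request you would get the same text from httr::content(resp, as = "text", encoding = "UTF-8"), where resp is a hypothetical name for the object returned by GET():

```r
library(jsonlite)

# Body of the cmn_se/$country_iso_code response shown above.
body_text <- '{
  "status": "ok",
  "results_time": "0.0375 sec.",
  "results_count": 0,
  "results": []
}'

parsed <- fromJSON(body_text)

# Check the API-level status before trusting the results.
if (!identical(parsed$status, "ok")) stop("API returned status: ", parsed$status)

parsed$results_count     # 0
length(parsed$results)   # 0: an empty JSON array becomes an empty list
```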
That was one thing. Also, to POST data they need you to specify it as JSON, e.g. encode = "json". From their docs:
All POST data should be sent in the JSON format (UTF-8 encoding). The
keywords are sent by POST method passing tasks array. The data should
be specified in the data field of this POST array. We recommend to
send up to 100 tasks at a time.
Further:
The task setting is done using POST method when array of tasks is sent to
the data field. Each of the array elements has the following
structure:
It then goes on to list two required fields and many optional ones.
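Putting those pieces together, the POST body is a JSON object whose data field holds the tasks, keyed by task ID. A minimal sketch with jsonlite that builds the body for the example task above and shows the JSON it produces; the endpoint path in the commented-out call is a placeholder, and which fields are required should be checked against their docs. When you pass the list itself to httr::POST() with encode = "json", httr does this serialization for you:

```r
library(jsonlite)

# One task, keyed by its task ID, mirroring the example above.
tasks <- list(
  "882394209" = list(site = "ranksonic.com", crawl_max_pages = 10)
)

# Per the docs, the tasks array goes in the "data" field of the POST body.
payload <- list(data = tasks)

# auto_unbox = TRUE writes length-1 vectors as scalars ("site":"ranksonic.com")
# rather than one-element arrays (["ranksonic.com"]).
json_body <- toJSON(payload, auto_unbox = TRUE)
json_body
# {"data":{"882394209":{"site":"ranksonic.com","crawl_max_pages":10}}}

# Sending it (not run here; <endpoint> is a placeholder for the documented path):
# httr::POST("https://api.dataforseo.com/v2/<endpoint>", body = payload, encode = "json")
```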
Note also that it is better practice to call reset_config() afterwards. If you're going to run this a lot, share it, or use more than one computer, I would also suggest putting your credentials in environment variables instead of in your script, for security and convenience.
A final word of advice: you may want to just leverage their published Python client library and large compilation of examples. Since every new API request is something you'll be pioneering in R without their support, it may pay off to just do the data collection in Python.
This is an interesting API. If you get over to the Open Data Stack Exchange you should consider sharing it with that community.
Consume a REST API with R using a Token
Depending on how the API is configured, I think you'll put the token where the curly brackets for {identifier} are in the URL:
req_token <- "THE TOKEN I HAVE RECEIVED ALREADY"  # placeholder: your actual token
url <- paste0('https://myService.com/web/api/datasources/', req_token, '/data')
That's how some APIs do it, which means your headers might no longer look like this:
mydata <- GET(url, config = add_headers(paste0("Basic ", req_token)))
They probably just won't be needed any more, so something like:
mydata <- GET(url)
If the token is required in the headers it might look more like this:
mydata <- GET(url, config = add_headers("Basic " = req_token))
But I doubt the token will be both in the URL and in the header. You'll have to find out which is required from the docs.
Edit
I believe your headers should look like this:
mydata <- GET(url, config = add_headers(Authorization = paste("Basic", req_token)))
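A minimal base-R sketch that builds both variants discussed above, so you can inspect exactly what would be sent before calling GET(). The base URL is the one from the question; the token "abc123" is a made-up value, and which variant applies depends on the API's docs:

```r
# Build the request pieces for the two token styles discussed above.
build_token_request <- function(token,
                                base = "https://myService.com/web/api/datasources/") {
  list(
    # Style 1: token as the {identifier} path segment, URL-encoded for safety.
    url = paste0(base, utils::URLencode(token, reserved = TRUE), "/data"),
    # Style 2: token in an Authorization header (no trailing space in the name).
    headers = c(Authorization = paste("Basic", token))
  )
}

req <- build_token_request("abc123")
req$url      # "https://myService.com/web/api/datasources/abc123/data"
req$headers  # Authorization: "Basic abc123"

# With httr (not run here), use whichever style the docs require:
# httr::GET(req$url)
# httr::GET(<url without the token>, httr::add_headers(.headers = req$headers))
```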
Passing many values to an API using R
You have to add the "cleaning" steps and return a data frame inside your getNPI function; then you can use do.call to combine all the data into a final data frame:
Example
library(httr)
library(jsonlite)
library(dplyr)   # select() and the %>% pipe
library(tidyr)   # unnest_wider()
# 'path' (the API endpoint URL) and 'providerIDs' (your vector of NPI numbers)
# are assumed to be defined earlier in your script
getNPI <- function(object) {
  request <- httr::GET(url = path,
                       query = list(version = "2.0",
                                    number = object))
  df <- httr::content(request, as = "text", encoding = "UTF-8") %>%
    jsonlite::fromJSON(flatten = TRUE) %>%
    data.frame()
  # Add more selection, mutations as needed
  df %>%
    select(results.number,
           results.basic.name,
           results.basic.gender,
           results.basic.credential,
           results.taxonomies) %>%
    unnest_wider(results.taxonomies)
}
test <- lapply(providerIDs, getNPI)
# Use do.call to rbind the list of data frames into the final df
final_df <- do.call("rbind", test)
Hope this helps.
NOTE: For rbind to work with do.call as expected, all the column names have to be the same.
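If some providers come back with extra or missing columns, one option is to align the columns before binding. A minimal base-R sketch; the helper name rbind_fill and the two sample records are made up for illustration, and dplyr::bind_rows() does the same NA-filling out of the box:

```r
# Stack a list of data frames whose columns may differ, filling gaps with NA.
rbind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  aligned <- lapply(dfs, function(df) {
    missing_cols <- setdiff(all_cols, names(df))
    if (length(missing_cols) > 0) df[missing_cols] <- NA
    df[all_cols]  # same columns, in the same order, for every data frame
  })
  do.call(rbind, aligned)
}

# Two hypothetical provider records: the second lacks 'credential'.
a <- data.frame(number = 1, credential = "MD", stringsAsFactors = FALSE)
b <- data.frame(number = 2, stringsAsFactors = FALSE)
rbind_fill(list(a, b))
```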
Keep R Model in memory for Rest API
Continuing from my comment above, and following the suggestion of @TenniStats, the best approach is to reduce the size of the GLM. Consider the following:
#generating some sample data that's fairly large
n <- 5000000
sample.data <- data.frame(target = sample(1:10, size = n, replace = TRUE),
                          regressor1 = rnorm(n),
                          regressor2 = rnorm(n),
                          regressor3 = rnorm(n),
                          regressor4 = rnorm(n),
                          regressor5 = rnorm(n),
                          regressor6 = rnorm(n),
                          regressor7 = rnorm(n),
                          regressor8 = rnorm(n),
                          regressor9 = rnorm(n),
                          regressor10 = rnorm(n))
#building a toy glm - this one is about 3.3 GB
lm.mod <- glm(target ~ ., data = sample.data, family = gaussian)
#baseline predictions
lm.default.preds <- predict(lm.mod, sample.data)
#extracting coefficients
lm.co <- coefficients(lm.mod)
#applying coefficients to original data set by row and adding intercept
lightweight.preds <- lm.co[1] +
apply(sample.data[,2:ncol(sample.data)],
1,
FUN = function(x) sum(x * lm.co[2:length(lm.co)]))
#clearing names from vector for comparison
names(lm.default.preds) <- NULL
#taa daa
all.equal(lm.default.preds, lightweight.preds)
Then we can do the following:
#saving for our example and starting timing
saveRDS(lm.co, file = 'myfile.RDS')
start.time <- Sys.time()
#reading from file
coefs.from.file <- readRDS('myfile.RDS')
#scoring function
light.scoring <- function(coeff, new.data) {
prediction <- coeff[1] + sum(coeff[2:length(coeff)] * new.data)
names(prediction) <- NULL
return(prediction)
}
#same as before
light.scoring(coefs.from.file, sample.data[1, 2:11])
#~.03 seconds on my machine
Sys.time() - start.time
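The apply() call used earlier loops over five million rows one at a time, which is slow. A vectorized alternative is a single matrix product of the regressors with the slope coefficients. A small self-contained sketch on simulated data (the data frame small and its columns are made up for illustration) that checks the result against predict() and compares the size of the coefficients with the size of the fitted model:

```r
# Small-scale version of the same idea, with vectorized scoring instead of apply().
set.seed(1)
n <- 1000
small <- data.frame(target = rnorm(n),
                    regressor1 = rnorm(n),
                    regressor2 = rnorm(n))

mod <- glm(target ~ ., data = small, family = gaussian)
co <- coef(mod)

# Select the regressor columns in the same order as the slope coefficients,
# then compute intercept + X %*% slopes for all rows at once.
X <- as.matrix(small[, names(co)[-1]])
fast_preds <- drop(X %*% co[-1]) + co[1]

# Same predictions as predict(), without keeping the model object around.
all.equal(unname(predict(mod, small)), unname(fast_preds))

# The coefficient vector is a tiny fraction of the fitted model's size.
as.numeric(object.size(co)) < as.numeric(object.size(mod))
```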