Building a RESTful API Using R

I have two options for you:

plumber

plumber allows you to create a REST API by decorating your existing R source
code with special comments.

A small example file:

# myfile.R

#* @get /mean
normalMean <- function(samples = 10){
  data <- rnorm(samples)
  mean(data)
}

#* @post /sum
addTwo <- function(a, b){
  as.numeric(a) + as.numeric(b)
}

From the R command line:

> library(plumber)
> r <- plumb("myfile.R") # Where 'myfile.R' is the location of the file shown above
> r$run(port=8000)

With this you would get results like this:

$ curl "http://localhost:8000/mean"
[-0.254]
$ curl "http://localhost:8000/mean?samples=10000"
[-0.0038]
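The /sum endpoint defined above reads its inputs from the POST body; a call would look roughly like this (output shown as plumber's default JSON serializer would render it):

$ curl --data "a=4&b=3" "http://localhost:8000/sum"
[7]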

Jug

Jug is a small web development framework for R which relies heavily upon
the httpuv package. Its main focus is to make building APIs for your
code as easy as possible. It is not meant to be an especially performant
or especially stable web framework; other tools (and languages) might be
better suited for that. Its main aim is to let you easily create APIs
for your R code. However, the flexibility of Jug means that, in theory,
you could build an extensive web framework with it.

It is very easy to learn and has a nice vignette.

A Hello World example:

library(jug)

jug() %>%
  get("/", function(req, res, err){
    "Hello World!"
  }) %>%
  simple_error_handler_json() %>%
  serve_it()
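Once serve_it() is running, jug prints the address it is serving on (this should be http://127.0.0.1:8080 with the package defaults, but check the console output); you can then test it from another shell:

$ curl "http://127.0.0.1:8080/"
Hello World!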

Calling a REST API in R

It looks like the request isn't configured properly: I don't see a config= anywhere in your code, and the body is also not encoded correctly.

Also, in the API documentation I don't see anything about grant_type. It looks like an array of tasks should go there, e.g. something like:

   {882394209:  {'site': 'ranksonic.com', 'crawl_max_pages': 10}}

Response:

{'results_count': 1, 'results_time': '0.0629 sec.',
 'results': {'2308949': {'post_id': 2308949, 'post_site': 'ranksonic.com',
                         'task_id': 882394209, 'status': 'ok'}},
 'status': 'ok'}

OK, so first off we need set_config or config=:

library(httr)

username <- 'Hack-R@stackoverflow.com' # fake email
password <- 'vxnyM9s7FAKESeIO'         # fake password

set_config(authenticate(username, password), override = TRUE)

GET("https://api.dataforseo.com/v2/cmn_se")
Response [https://api.dataforseo.com/v2/cmn_se]
Date: 2018-07-08 16:20
Status: 200
Content-Type: application/json
Size: 551 kB
{
"status": "ok",
"results_time": "0.0564 sec.",
"results_count": 2187,
"results": [
{
"se_id": 37,
"se_name": "google.com.af",
"se_country_iso_code": "AF",
"se_country_name": "Afghanistan",
...
GET("https://api.dataforseo.com/v2/cmn_se/$country_iso_code")
Response [https://api.dataforseo.com/v2/cmn_se/$country_iso_code]
Date: 2018-07-08 15:48
Status: 200
Content-Type: application/json
Size: 100 B
{
"status": "ok",
"results_time": "0.0375 sec.",
"results_count": 0,
"results": []
GET("https://api.dataforseo.com/v2/cmn_se/$op_tasks_post")
Response [https://api.dataforseo.com/v2/cmn_se/$op_tasks_post]
Date: 2018-07-08 16:10
Status: 200
Content-Type: application/json
Size: 100 B
{
"status": "ok",
"results_time": "0.0475 sec.",
"results_count": 0,
"results": []

That was one thing. Also, to POST data they require you to send it as JSON, i.e. encode = "json". From their docs:

All POST data should be sent in the JSON format (UTF-8 encoding). The
keywords are sent by POST method passing tasks array. The data should
be specified in the data field of this POST array. We recommend to
send up to 100 tasks at a time.

Further:

The task setting is done using POST method when array of tasks is sent to
the data field. Each of the array elements has the following
structure:

Then it goes on to list two required fields and many optional ones.
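Putting those two pieces together, a POST with httr would look something like the sketch below. The task fields (site, crawl_max_pages) come from the example above, but the endpoint path and the exact payload structure are assumptions you should verify against their docs:

# A hypothetical task payload, modeled on the example above;
# check the DataForSEO docs for the exact field names required.
task_list <- list(
  data = list(
    list(site = "ranksonic.com", crawl_max_pages = 10)
  )
)

# Placeholder endpoint -- substitute the real task-posting URL from their docs.
# Authentication is already applied globally via set_config() above.
resp <- POST("https://api.dataforseo.com/v2/op_tasks_post",
             body   = task_list,
             encode = "json")  # send the body as JSON, as their docs require

content(resp)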

Note also that, as a better practice, you can call reset_config() afterwards. If you're going to run this a lot, share it, or use more than one computer, I would also suggest putting your credentials in environment variables instead of your script, for security and convenience.
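For example (a minimal sketch; the DFS_USER / DFS_PASSWORD variable names are just illustrative), you could set the credentials in your ~/.Renviron and read them at runtime:

# Read credentials from environment variables (e.g. set in ~/.Renviron)
# rather than hard-coding them in the script.
username <- Sys.getenv("DFS_USER")      # illustrative variable name
password <- Sys.getenv("DFS_PASSWORD")  # illustrative variable name

set_config(authenticate(username, password), override = TRUE)
# ... make requests ...
reset_config()  # clear the global config when done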

Another final word of advice is that you may want to just leverage their published Python client library and large compilation of examples. Since every new API request is something you'll be pioneering in R without their support, it may pay off to just do the data collection in Python.

This is an interesting API. If you get over to the Open Data Stack Exchange you should consider sharing it with that community.

Consume a REST API with R using a Token

Depending on the API configuration, I think you'll add it where the curly brackets for {identifier} appear in the URL.

req_token <- "THE TOKEN I HAVE RECEIVED ALREADY"  # placeholder for your actual token
url <- paste('https://myService.com/web/api/datasources/', req_token, '/data', sep = '')

That's how some APIs do it, which means your headers might not look like this any more:

mydata <- GET(url, config = add_headers(paste0("Basic ", req_token)))

They probably just won't be there any more, so it would simply be:

mydata <- GET(url)

If the token is required in the headers it might look more like this:

mydata <- GET(url, config = add_headers("Basic " = req_token))

But I doubt the token will be both in the URL and in the header. You'll have to find out which is required from the docs.

Edit

I believe your headers should look like this:

mydata <- GET(url, config = add_headers("Authorization " = paste( "Basic", req_token, sep = ' ' ))

Passing many values to an API using R

You have to add the "cleaning" steps and return a data frame inside your getNPI function; then you can use do.call to combine all the data into a final data frame:

Example

library(httr)
library(jsonlite)
library(dplyr)
library(tidyr)    # for unnest_wider()

getNPI <- function(object) {
  # `path` is the API endpoint URL defined elsewhere in your script
  request <- httr::GET(url = path,
                       query = list(version = "2.0",
                                    number = object))

  df <- content(request, as = "text", encoding = "UTF-8") %>%
    jsonlite::fromJSON(flatten = TRUE) %>%
    data.frame()

  df %>%
    select(results.number,
           results.basic.name,
           results.basic.gender,
           results.basic.credential,
           results.taxonomies) %>%
    unnest_wider(results.taxonomies)
  # Add more selections/mutations as needed
}

test <- lapply(providerIDs, getNPI)

# Use do.call with rbind to build the final data frame
final_df <- do.call("rbind", test)

Hope this can help you

NOTE: For rbind to work with do.call as expected, all the column names have to be the same.
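If the API occasionally returns records with differing columns, dplyr::bind_rows() is a common, more forgiving alternative (it matches columns by name and fills missing ones with NA); a quick sketch:

library(dplyr)

# bind_rows() matches columns by name and fills any that are
# missing in a given list element with NA, unlike rbind()
final_df <- bind_rows(test)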

Keep R Model in Memory for a REST API

Continuing on my comment from above as well as the suggestion of @TenniStats, the best approach is to reduce the size of the GLM. Consider the following:

#generating some sample data that's fairly large
sample.data <- data.frame('target' = sample(c(1:10), size = 5000000, replace = T),
                          'regressor1' = rnorm(5000000),
                          'regressor2' = rnorm(5000000),
                          'regressor3' = rnorm(5000000),
                          'regressor4' = rnorm(5000000),
                          'regressor5' = rnorm(5000000),
                          'regressor6' = rnorm(5000000),
                          'regressor7' = rnorm(5000000),
                          'regressor8' = rnorm(5000000),
                          'regressor9' = rnorm(5000000),
                          'regressor10' = rnorm(5000000))

#building a toy glm - this one is about 3.3 GB
lm.mod <- glm(sample.data, formula = target ~ ., family = gaussian)

#baseline predictions
lm.default.preds <- predict(lm.mod, sample.data)

#extracting coefficients
lm.co <- coefficients(lm.mod)

#applying coefficients to original data set by row and adding intercept
lightweight.preds <- lm.co[1] +
  apply(sample.data[, 2:ncol(sample.data)],
        1,
        FUN = function(x) sum(x * lm.co[2:length(lm.co)]))

#clearing names from vector for comparison
names(lm.default.preds) <- NULL

#taa daa
all.equal(lm.default.preds, lightweight.preds)

Then we can do the following:

#saving for our example and starting timing
saveRDS(lm.co, file = 'myfile.RDS')
start.time <- Sys.time()

#reading from file
coefs.from.file <- readRDS('myfile.RDS')

#scoring function
light.scoring <- function(coeff, new.data) {
  prediction <- coeff[1] + sum(coeff[2:length(coeff)] * new.data)
  names(prediction) <- NULL
  return(prediction)
}

#same as before
light.scoring(coefs.from.file, sample.data[1, 2:11])

#~.03 seconds on my machine
Sys.time() - start.time
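Tying this back to the REST API question: because the saved object is now just a small coefficient vector, it can be loaded once when the API starts and reused across requests. Below is a minimal sketch using plumber (from the first section); the file name, endpoint, and the comma-separated parameter handling are illustrative assumptions, not part of the original answer:

# plumber_api.R (hypothetical file name)
library(plumber)

# Load the lightweight coefficients once, at startup, not per request
coefs <- readRDS('myfile.RDS')

light.scoring <- function(coeff, new.data) {
  prediction <- coeff[1] + sum(coeff[2:length(coeff)] * new.data)
  names(prediction) <- NULL
  return(prediction)
}

#* @get /predict
function(regressors = "") {
  # `regressors` is expected as a comma-separated string of 10 numbers
  x <- as.numeric(strsplit(regressors, ",")[[1]])
  light.scoring(coefs, x)
}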

