How to define a vectorized function in R
A loop at the R level is not vectorized. An R loop will be calling the same R code for each element of a vector, which will be inefficient. Vectorized functions usually refer to those that take a vector and operate on the entire vector in an efficient way. Ultimately this will involve some form of loop, but as that loop is being performed in a low-level language such as C it can be highly efficient and tailored to the particular task.
Consider this silly function that adds the elements of two vectors pairwise:
sillyplus <- function(x, y) {
  out <- numeric(length = length(x))
  for(i in seq_along(x)) {
    out[i] <- x[i] + y[i]
  }
  out
}
It gives the right result
R> sillyplus(1:10, 1:10)
[1] 2 4 6 8 10 12 14 16 18 20
and is vectorised in the sense that it can operate on entire vectors at once, but it is not vectorised in the sense I describe above because it is exceptionally inefficient. The + operator is vectorised at the C level in R, so we really only need 1:10 + 1:10, not an explicit loop in R.
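To make the difference concrete, the loop and the built-in operator can be compared on a larger input. This is only a minimal sketch: the vector length of 1e5 is an arbitrary choice, and the timings will vary from machine to machine.

```r
# The loop-based version, repeated here so the sketch is self-contained
sillyplus <- function(x, y) {
  out <- numeric(length = length(x))
  for (i in seq_along(x)) {
    out[i] <- x[i] + y[i]
  }
  out
}

set.seed(1)
x <- runif(1e5)  # arbitrary test data
y <- runif(1e5)

# Both approaches give the same answer ...
same <- identical(sillyplus(x, y), x + y)
same

# ... but the first loops in R code, while `+` loops in compiled code
system.time(sillyplus(x, y))
system.time(x + y)
```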
The usual way to write a vectorised function is to use existing R functions that are already vectorised. If you want to start from scratch and the thing you want to do with the function doesn't exist as a vectorised function in R (odd, but possible) then you will need to get your hands dirty and write the guts of the function in C and prepare a little wrapper in R to call the C function you wrote with the vector of data you want it to work on. There are ways with functions like Vectorize() to fake vectorisation for R functions that are not vectorised.
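As a rough illustration of what "faking" vectorisation looks like, the sketch below wraps a scalar-only function with Vectorize(). The function describe_sign and its labels are made up for this example. Vectorize() builds on mapply(), so the R function is still called once per element; the ifelse() version at the end is the genuinely vectorised alternative.

```r
# A scalar-only function (hypothetical example): if () needs a single value
describe_sign <- function(x) {
  if (x > 0) "positive" else if (x < 0) "negative" else "zero"
}

# Vectorize() returns a wrapper that accepts a whole vector
describe_signs <- Vectorize(describe_sign)

x <- c(-2, 0, 3)
signs <- describe_signs(x)
signs

# The same result using functions that are already vectorised
signs_fast <- ifelse(x > 0, "positive", ifelse(x < 0, "negative", "zero"))
signs_fast
```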
C is not the only option here; FORTRAN is a possibility, as is C++, and thanks to Dirk Eddelbuettel & Romain Francois the latter is much easier to do now with the Rcpp package.
R how to vectorize a function with multiple if else conditions
Here is a vectorized way. It creates logical vectors i1, i2, i3 and i4 corresponding to the 4 conditions. Then it assigns the new values to the positions indexed by them.
library(truncnorm)  # provides rtruncnorm()

Trial_func2 <- function(df1){
  # One logical index vector per observation type
  i1 <- df1[["Obs_Type"]] == 1
  i2 <- df1[["Obs_Type"]] == 2
  i3 <- df1[["Obs_Type"]] == 3
  i4 <- df1[["Obs_Type"]] == 4
  #If Type == 1; then a=-Inf, b = Upper_Bound
  df1[i1, "draw_value"] <- rtruncnorm(sum(i1), a = -Inf,
                                      b = df1[i1, "Upper_bound"],
                                      mean = df1[i1, "mean"], sd = 1)
  #If Type == 2; then a=-10, b = Upper_Bound
  df1[i2, "draw_value"] <- rtruncnorm(sum(i2), a = -10,
                                      b = df1[i2, "Upper_bound"],
                                      mean = df1[i2, "mean"], sd = 1)
  #If Type == 3; then a=Lower_bound, b = Inf
  df1[i3, "draw_value"] <- rtruncnorm(sum(i3),
                                      a = df1[i3, "Lower_bound"],
                                      b = Inf, mean = df1[i3, "mean"],
                                      sd = 1)
  #If Type == 4; then a=Lower_bound, b = 10
  df1[i4, "draw_value"] <- rtruncnorm(sum(i4),
                                      a = df1[i4, "Lower_bound"],
                                      b = 10,
                                      mean = df1[i4, "mean"],
                                      sd = 1)
  df1
}
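The indexing pattern used in Trial_func2 may be easier to follow in isolation. The sketch below applies it to a toy data frame; the column names and the arithmetic are invented for illustration and stand in for the rtruncnorm() calls.

```r
# Toy data (hypothetical): two observation types get different treatment
df <- data.frame(type = c(1, 2, 1, 3), value = c(10, 20, 30, 40))
df$scaled <- NA_real_  # rows matching no condition stay NA

# One logical vector per condition
is_type1 <- df[["type"]] == 1
is_type2 <- df[["type"]] == 2

# Each assignment handles all matching rows at once, with no loop over rows
df[is_type1, "scaled"] <- df[is_type1, "value"] * 2
df[is_type2, "scaled"] <- df[is_type2, "value"] / 2
df
```

Each condition costs one vectorised comparison and one vectorised assignment, regardless of the number of rows, which is where the speed-up over a row-by-row loop comes from.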
In the speed test I have named @Dave2e's answer Trial_func3.
mbm <- microbenchmark(
loop = Trial_func(df1 = df1),
vect = Trial_func2(df1 = df1),
cwhen = Trial_func3(df1 = df1),
times = 10)
print(mbm, order = "median")
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# vect 4.349444 4.371169 4.40920 4.401384 4.450024 4.487453 10 a
# cwhen 13.458946 13.484247 14.16045 13.528792 13.787951 19.363104 10 a
# loop 2125.665690 2138.792497 2211.20887 2157.185408 2201.391083 2453.658767 10 b
How do I know a function or an operation in R is vectorized?
Vectorization in R basically means that any looping is moved to a faster, compiled language such as C or FORTRAN. For that to occur the vector(s) in question should be "atomic" - i.e. "flat" and homogeneous - and the vector type, which you can check with typeof(), should make sense for the operation(s) being performed. Vectorized operations work on atomic vectors of a suitable type.
You can check if a vector is atomic using is.atomic(). The other kind of vector, on which operations are not vectorized in this sense, is called "recursive"; you can check for it using is.recursive(). Recursive objects can contain other objects of any type, i.e. they can be heterogeneous. Lists and data frames are recursive.
Try something like the following to gain some insight into atomic vs. recursive:
# Atomic:
1
1:3
c("a", "b", "c")
c(T, F, T)
# Recursive:
list(nums = 1:3, letts = c("a", "b", "c"), logics = c(T, F, T))
data.frame(nums = 1:3, letts = c("a", "b", "c"), logics = c(T, F, T))
# Vectors can be atomic or recursive:
is.vector(1:9) # TRUE
is.atomic(1:9) # TRUE
is.recursive(1:9) # FALSE
is.vector(list(nums = 1:9, chars = "x")) # TRUE
is.atomic(list(1:9)) # FALSE
is.recursive(list(1:9)) # TRUE
# Matrices are atomic, data frames are recursive:
is.vector(matrix(1:9, 3)) # FALSE
is.atomic(matrix(1:9, 3)) # TRUE
is.recursive(matrix(1:9, 3)) # FALSE
is.vector(as.data.frame(matrix(1:9, 3))) # FALSE
is.atomic(as.data.frame(matrix(1:9, 3))) # FALSE
is.recursive(as.data.frame(matrix(1:9, 3))) # TRUE
I think you can assume that many, if not most, of the R functions that you use most frequently are vectorized. I don't think there is any way to check this other than by looking at the documentation or the function internals. Whenever you think about writing a for loop to do simple element-wise operations, think about how to do it using vectorization. With enough practice it will become second nature to you. For more details I can recommend this blog post from Noam Ross.
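Short of reading the documentation, one informal probe is simply to pass a vector and look at what comes back. The sketch below is an assumption-laden heuristic rather than a reliable test; size_label is a made-up function used to show what a non-vectorized function does with a longer input.

```r
x <- c(1, 4, 9)

# A vectorized function returns one result per input element
roots <- sqrt(x)
length(roots) == length(x)

# A function built on if () only handles a single value. Depending on the
# R version, a longer vector triggers a warning or an error.
size_label <- function(x) if (x > 2) "big" else "small"

outcome <- tryCatch(
  size_label(x),
  warning = function(w) "not vectorized (warning)",
  error   = function(e) "not vectorized (error)"
)
outcome
```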
How to vectorize a function in R
First and foremost - case specific optimization - remove the cases where nFast > nSlow, as they don't make sense technically.
Secondly - you are creating objects and copying them over and over again. This is very expensive.
Thirdly - you can code this better perhaps by creating a matrix of signals in one loop and doing rest of the operations in vectorized manner.
I would code what you are doing something like this.
Please read the help pages of mapply, do.call, merge and sapply if you don't understand.
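For readers who have not used mapply and do.call together, the same pattern is sketched below in base R only. Everything here is a stand-in: scale_series is a hypothetical placeholder for the strategy function, plain numeric vectors replace the xts objects, and cbind replaces merge for that reason.

```r
# All parameter combinations, keeping only those with a < b
params <- expand.grid(a = 1:3, b = 2:4)
params <- params[params$a < params$b, ]

# Hypothetical stand-in for the strategy function: one vector per (a, b) pair
scale_series <- function(a, b, series) series * a / b

results <- mapply(
  FUN = scale_series,
  a = params$a,
  b = params$b,
  MoreArgs = list(series = 1:5),  # passed unchanged to every call
  SIMPLIFY = FALSE                # always return a list
)

# Bind the list of vectors into one matrix, one column per combination
combined <- do.call(cbind, results)
dim(combined)

# Summarise each result with sapply
column_means <- sapply(results, mean)
column_means
```

Column n of combined corresponds to row n of params, which is how a result is traced back to its parameters.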
require(quantmod)
getSymbols("LUNA")
#Choose the Adjusted Close of a Symbol
stock <- Ad(LUNA)
# I want to create a table with all possible combinations from the ranges below
i = c(2:50)
k = c(4:50)
j = c(2:50)
# stores possible combinations into z
z <- expand.grid(i,k,j)
IMO: this is where your first optimization should be. Remove the cases where i >= k, i.e. keep only those with i < k:
z <- z[z[,1]<z[,2], ]
It reduces the number of cases from 112847 to 57575.
#Calculate only once. No need to calculate this in every iteration.
stockret <- ROC(stock)
getStratRet <- function(nFast, nSlow, nSig, stock, stockret) {
  x <- MACD((stock), nFast=nFast, nSlow=nSlow, nSig=nSig, maType="EMA")
  x <- na.omit(x)
  sig <- Lag(ifelse((x$macd <= x$signal),-1, 0)) + Lag(ifelse((x$macd >= x$signal),1, 0))
  return(na.omit(stockret * sig))
}
RETURNSLIST <- do.call(merge, mapply(FUN = getStratRet, nFast = z[,1], nSlow = z[,2], nSig = z[,3], MoreArgs = list(stock = stock, stockret = stockret), SIMPLIFY = TRUE))
getAnnualSharpe <- function(ret) {
  ret <- na.omit(ret)
  return ((mean(ret)/sd(ret)) * sqrt(252))
}
SHARPELIST <- sapply(RETURNSLIST, FUN = getAnnualSharpe)
Results will be as below. Working out which column belongs to which combination of i, k and j is trivial: the columns are in the same order as the rows of z.
head(RETURNSLIST[, 1:3])
## LUNA.Adjusted LUNA.Adjusted.1 LUNA.Adjusted.2
## 2007-01-10 0.012739026 -0.012739026 0
## 2007-01-11 -0.051959739 0.051959739 0
## 2007-01-12 -0.007968170 -0.007968170 0
## 2007-01-16 -0.007905180 -0.007905180 0
## 2007-01-17 -0.005235614 -0.005235614 0
## 2007-01-18 0.028315920 -0.028315920 0
SHARPELIST
## LUNA.Adjusted LUNA.Adjusted.1 LUNA.Adjusted.2 LUNA.Adjusted.3 LUNA.Adjusted.4 LUNA.Adjusted.5 LUNA.Adjusted.6
## 0.04939150 -0.07428392 NaN 0.02626382 -0.06789803 -0.22584987 -0.07305477
## LUNA.Adjusted.7 LUNA.Adjusted.8 LUNA.Adjusted.9
## -0.05831643 -0.08864845 -0.08221986
# Wrap both statements in braces so that they are timed as a single expression
system.time({
  RETURNSLIST <- do.call(merge, mapply(FUN = getStratRet, nFast = z[1:100,1], nSlow = z[1:100,2], nSig = z[1:100,3], MoreArgs = list(stock = stock, stockret = stockret), SIMPLIFY = TRUE))
  SHARPELIST <- sapply(RETURNSLIST, FUN = getAnnualSharpe)
})
user system elapsed
2.28 0.00 2.29
Vectorized functions in R's data.table
You could use Map/mapply:
library(data.table)
dt[, weeks_for_filter_table := mapply(get_weeks, START, END)]
dt
# ID START END weeks_for_filter_table
#1: 1 2020-01-01 2020-01-15 2020 W01,2020 W02,2020 W03
#2: 2 2020-03-01 2020-03-12 2020 W09,2020 W10,2020 W11
#3: 3 2020-03-14 2020-03-26 2020 W11,2020 W12,2020 W13
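The get_weeks function comes from the question and is not reproduced here. The sketch below therefore uses a hypothetical stand-in that labels the ISO weeks between two dates, and a plain data frame instead of a data.table so that it runs without extra packages; the same mapply() call can be placed on the right-hand side of := inside dt[...].

```r
# Hypothetical stand-in for get_weeks(): ISO week labels between two dates
get_weeks <- function(start, end) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  unique(format(days, "%G W%V"))
}

df <- data.frame(
  ID    = 1:3,
  START = c("2020-01-01", "2020-03-01", "2020-03-14"),
  END   = c("2020-01-15", "2020-03-12", "2020-03-26"),
  stringsAsFactors = FALSE
)

# mapply() calls get_weeks() once per row. SIMPLIFY = FALSE always returns a
# list, so rows spanning different numbers of weeks are handled as well.
weeks <- mapply(get_weeks, df$START, df$END, SIMPLIFY = FALSE, USE.NAMES = FALSE)

# Collapse each set of labels into one string per row
df$weeks_for_filter_table <- sapply(weeks, paste, collapse = ",")
df
```

Note that mapply() hides a loop at the R level; it makes a scalar function usable on columns but does not make it fast.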
How do I vectorize this is_prime function in R?
Half Vectorized
It is possible to vectorize some of the function by dealing with even numbers (and a few other numbers) in a vectorized fashion. The rest is taken care of using vapply.
helper <- function(x) {
  # Trial division by odd numbers up to about sqrt(x); expects an odd x > 5
  for (k in seq(3, round(sqrt(x)) + 1, 2)) {
    if (x %% k == 0)
      return(FALSE)
  }
  return(TRUE)
}
is.prime <- function(v) {
  out <- rep(TRUE, length(v))
  out[v %% 2 == 0 | v %in% c(1)] <- FALSE
  out[v %in% c(2, 3, 5)] <- TRUE
  # Remaining candidates: odd numbers above 5 (elementwise &, not &&)
  indices <- which(v > 5 & out)
  out[indices] <- vapply(v[indices], helper, logical(1))
  return(out)
}
is.prime(c(17,5,10,22,109,55))
# [1] TRUE TRUE FALSE FALSE TRUE FALSE
Full Vectorized
If performance is at stake, you might consider using Rcpp:
C++ File
#include <Rcpp.h>
#include <math.h>
using namespace Rcpp;

// Scalar primality test by trial division
bool is_prime(int n) {
  if ((n == 2) || (n == 3) || (n == 5)) {
    return true;
  }
  if ((n % 2 == 0) || (n == 1)) {
    return false;
  }
  int i = 3;
  while (i < round(sqrt(n)) + 1) {
    if (n % i == 0) {
      return false;
    }
    i += 2;
  }
  return true;
}

// Vector overload exported to R: applies the scalar test to each element
// [[Rcpp::export]]
LogicalVector is_prime(IntegerVector v) {
  int n = v.length();
  LogicalVector out = LogicalVector(n);
  for (int i = 0; i < n; i++) {
    out[i] = is_prime(v[i]);
  }
  return out;
}
R File
library(Rcpp)
sourceCpp('prime_fun.cpp') # if cpp file in same dir
is_prime(c(17,5,10,22,109,55))
# [1] TRUE TRUE FALSE FALSE TRUE FALSE
Vectorized function for dplyr::mutate()
any() always returns a single logical value, so it cannot produce one result per row. Instead, collapse your favorite_cars patterns into a single regular expression of length 1.
is_favorite <- function(x) {
stringr::str_detect(x, paste0(favorite_cars, collapse = "|"))
#Will also work with base R grepl
#grepl(paste0(favorite_cars, collapse = "|"), x)
}
and then use:
library(dplyr)
mtcars %>% mutate(fav_car = is_favorite(car))
# car mpg cyl disp hp drat wt qsec vs am gear carb fav_car
#1 Mazda RX4 21.0 6 160.0 110 3.90 2.62 16.5 0 1 4 4 FALSE
#2 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.88 17.0 0 1 4 4 FALSE
#3 Datsun 710 22.8 4 108.0 93 3.85 2.32 18.6 1 1 4 1 FALSE
#4 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.21 19.4 1 0 3 1 FALSE
#5 Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.0 0 0 3 2 FALSE
#6 Valiant 18.1 6 225.0 105 2.76 3.46 20.2 1 0 3 1 FALSE
#7 Duster 360 14.3 8 360.0 245 3.21 3.57 15.8 0 0 3 4 FALSE
#8 Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2 TRUE
#9 Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2 TRUE
#10 Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4 TRUE
#11 Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4 TRUE
#...
#...
where the pattern that we are looking for becomes
paste0(favorite_cars, collapse = "|")
#[1] "^Merc|Firebird$"
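The difference between the two approaches can be checked in base R on a few names. In this sketch favorite_cars is inferred from the collapsed pattern shown above, and car_names is a made-up sample.

```r
favorite_cars <- c("^Merc", "Firebird$")  # inferred from the collapsed pattern
car_names <- c("Mazda RX4", "Merc 240D", "Pontiac Firebird")  # sample input

pattern <- paste0(favorite_cars, collapse = "|")

# One logical per element: suitable as a new column in mutate()
per_car <- grepl(pattern, car_names)
per_car

# any() reduces everything to a single value, which mutate() would then
# recycle to every row
overall <- any(per_car)
overall
```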