Apply a Function to Every Row of a Matrix or a Data Frame

You simply use the apply() function:

R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1]  4 10 16

This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply().
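
As a minimal sketch of passing extra arguments, the hypothetical helper weighted_pair below takes two additional parameters, w1 and w2 (both names are invented for this illustration), which apply() forwards to it:

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# weighted_pair is a hypothetical helper; w1 and w2 are its extra arguments
weighted_pair <- function(x, w1, w2) w1 * x[1] + w2 * x[2]

# everything after the function argument is forwarded to weighted_pair
res <- apply(M, 1, weighted_pair, w1 = 2, w2 = 1)
print(res)
```

With w1 = 2 and w2 = 1 this reproduces the anonymous function used above.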

Call apply-like function on each row of dataframe with multiple arguments from each row

You can call apply on a subset of the columns of the original data.

dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x) )

or, if your function is just sum, use the vectorized version:

rowSums(dat[,c('x','z')])
[1] 6 8

If you want to use testFunc:

 testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))

EDIT: To access columns by name rather than by index, you can do something like this:

 testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'],y['x']))
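
A possible alternative, not part of the original answer, is mapply, which walks several columns in parallel and so avoids converting the data frame to a matrix. A minimal sketch, reusing dat and testFunc from above:

```r
dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# mapply takes one value from each column per call, i.e. one call per row
res_mapply <- mapply(testFunc, dat$x, dat$z)
print(res_mapply)
```

This should give the same sums as the apply version, assuming the function only needs one scalar from each column per row.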

R apply() custom function to every row in data frame

Another approach is modifying your existing function such that it is vectorised.

t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
    if(!equal.variance)
    {
        se <- sqrt( (s1^2/n1) + (s2^2/n2) )
        # welch-satterthwaite df
        df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
    } else
    {
        # pooled standard deviation, scaled by the sample sizes
        se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
        df <- n1+n2-2
    }
    t <- (m1-m2-m0)/se
    # one length-4 result per input element; the p-value here is two-tailed
    dat <- vapply(seq_len(length(m1)),
                  function(x){c(m1[x]-m2[x], se[x], t[x], 2*pt(-abs(t[x]),df[x]))},
                  numeric(4))
    # t() still resolves to the transpose function: R skips non-function objects when looking up a call
    dat <- t(dat)
    dat <- as.data.frame(dat)
    names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
    return(dat)
}

This approach allows you to pass in vectors for your various inputs, and it returns a data frame with one row per element of those inputs. It uses the vapply function to return a vector of length 4 for each value provided.

Under this approach, you can simply call

t.test2(MPAmeans$reference_mean, MPAmeans$MPA_mean, MPAmeans$sd_reference, MPAmeans$sd_MPA, MPAmeans$n_reference, MPAmeans$n_MPA)

(or whatever you end up calling your variables)
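
The vectorisation rests on R arithmetic operating element-wise on vectors. A minimal sketch of just the Welch branch, using made-up summary statistics (the data frame stats and the helper welch_t are illustrative names, not from the original question):

```r
# made-up summary statistics, one row per comparison
stats <- data.frame(m1 = c(10, 12), m2 = c(9, 15),
                    s1 = c(2, 3),   s2 = c(2.5, 3),
                    n1 = c(30, 25), n2 = c(28, 25))

# welch_t is a hypothetical, trimmed-down stand-in for t.test2
welch_t <- function(m1, m2, s1, s2, n1, n2) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  tval <- (m1 - m2) / se
  data.frame(diff = m1 - m2, se = se, t = tval,
             p.value = 2 * pt(-abs(tval), df))  # two-tailed
}

res_t <- with(stats, welch_t(m1, m2, s1, s2, n1, n2))
print(res_t)
```

Because every operation is element-wise, the result has one row per input element without any explicit loop.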

How can I apply a function to every row of a data frame in R when the function requires multiple inputs?

If we are applying the function to each row, use apply. Also, instead of naming the row elements one by one as arguments (their number can differ between datasets), use ..., which accepts any number of arguments, and build the matrix from it:

chisquare.tableMod <- function(...){
    t <- matrix(c(...), nrow = 2)
    chisq.test(t)
}

out <- apply(df1, 1, chisquare.tableMod)

Testing with the output from OP's function

chisquare.table <- function(var1, var2, var3, var4){
    t <- matrix(c(var1, var2, var3, var4), nrow = 2)
    chisq.test(t)
}

outOld <- chisquare.table(80, 99920, 85, 99915)
identical(out[[1]], outOld)
#[1] TRUE

As @42- mentioned in the comments, apply converts its input to a matrix, and a matrix can hold only a single class. So, when working with apply, select only the numeric columns (or only columns of a single class).
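
One way to keep only the numeric columns before calling apply is sketched below; the data frame mixed and its label column are invented for illustration:

```r
# made-up data with one non-numeric column
mixed <- data.frame(label = c("a", "b"), v1 = c(80, 79), v2 = c(99920, 98230))

# keep only the numeric columns so the matrix apply() builds stays numeric
num_cols <- vapply(mixed, is.numeric, logical(1))
row_totals <- apply(mixed[, num_cols], 1, sum)
print(row_totals)
```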

data

df1 <- data.frame(v1 = c(80, 79, 49), v2 = c(99920, 98230, 43240),
                  v3 = c(85, 40, 35), v4 = c(99915, 43265, 43238))

Apply a function to each row in a data frame in R

You want apply (see the docs for it). apply(var,1,fun) will apply to rows, apply(var,2,fun) will apply to columns.

> apply(a,1,min)
[1] 1 0 3
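
The original post does not show the contents of a. As a sketch, the made-up matrix below was chosen so that its row minima match the output above, and it also shows the column-wise call:

```r
# a is a made-up matrix; the original post does not show its data
a <- matrix(c(1, 5, 9,
              4, 0, 7,
              3, 8, 6), nrow = 3, byrow = TRUE)

row_min <- apply(a, 1, min)  # MARGIN = 1: one value per row
col_min <- apply(a, 2, min)  # MARGIN = 2: one value per column
print(row_min)
print(col_min)
```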

How to apply a function on every row of a data frame?

For starters, you say that RS is the same between the two, which to me raises the question "how certain are we that the rows always line up correctly?" To be defensive, I'll say "not 100%", and join/merge them together so that the rows are guaranteed to be matched correctly.

quux <- tt[df, on="RS"]
quux
#           RS      G      E        B      wg       we       wb
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436

From here, it's just a matter of calling sumz once per row, passing that row's G, E, B values as the first argument and the same row's wg, we, wb values as the weights:

quux$META <- sapply(seq_len(nrow(quux)), function(rn) {
  unlist(sumz(as.matrix(quux[,.(G,E,B)])[rn,], weights = as.vector(quux[,.(wg,we,wb)])[rn,],
              na.action=na.fail)["p"])
})
quux
#           RS      G      E        B      wg       we       wb      META
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436 0.9863582
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436 0.9294546
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436 0.9300445
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436 0.6379392
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436 0.6055061
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436 0.9605584

Or a more data.table-centric way:

mysumz <- function(x, w) sumz(unlist(x), weights = unlist(w), na.action = na.fail)[["p"]]
quux[, META := mysumz(.(G,E,B), .(wg,we,wb)), by = seq_len(nrow(quux))]

(borrowing from https://stackoverflow.com/a/36802640). The helper function is required because each call to mysumz receives a list for each of x and w, but sumz needs vectors. If you want to verify this, first call debugonce(mysumz), then run the quux[,META:=...] line and inspect x and w to see how it works.
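
The same row-wise pattern can be sketched in base R without data.table. In the example below the data frame d is made up, and stats::weighted.mean is only a stand-in for sumz so that the example runs without the package that provides sumz:

```r
# made-up data: three value columns and three matching weight columns
d <- data.frame(G = c(0.9, 0.2), E = c(0.7, 0.4), B = c(0.6, 0.6),
                wg = c(1, 1), we = c(1, 2), wb = c(2, 1))

# for each row index, pull that row's values and weights as plain vectors
d$wmean <- vapply(seq_len(nrow(d)), function(rn) {
  vals <- unlist(d[rn, c("G", "E", "B")])
  wts  <- unlist(d[rn, c("wg", "we", "wb")])
  weighted.mean(vals, wts)  # stand-in for sumz(vals, weights = wts)
}, numeric(1))
print(d)
```

The unlist calls play the same role as in mysumz: they turn a one-row slice into a plain vector.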

Using the apply function on rows of a data.frame in R

The issue is that the string columns are of class factor: in R versions before 4.0.0, data.frame() defaults to stringsAsFactors = TRUE, and a factor is coerced to its integer codes when we paste across columns. To avoid this behavior, use

df <- data.frame(x=c(1,2,3),y=c('b','a','c'), stringsAsFactors = FALSE)

paste(df[1,],collapse=":")
#[1] "1:b"

apply converts its input to a matrix, and a matrix can hold only a single class. When any element is character, the numeric values are therefore converted to character, following R's coercion hierarchy.
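
A short sketch of that coercion, reusing the df defined above:

```r
df <- data.frame(x = c(1, 2, 3), y = c("b", "a", "c"), stringsAsFactors = FALSE)

# apply() converts df with as.matrix() first; mixed columns become character
m <- as.matrix(df)
print(typeof(m))

# each row therefore arrives in the function as a character vector
pasted <- apply(df, 1, paste, collapse = ":")
print(pasted)
```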

Apply a function returning a data frame to each row in a data frame

You haven't shown what you have in f, but based on the comments it is written for data frames, so this should work:

lapply(split(d, seq_len(nrow(d))), f)

split divides d into one-row data frames, and lapply applies the function f to each of them.

You can also use by:

by(d, seq_len(nrow(d)), f)
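
Since the question does not show d or f, here is a minimal sketch with stand-ins for both, including one way to stack the per-row results back into a single data frame:

```r
# d and f are stand-ins: the original question does not show them
d <- data.frame(id = 1:3, value = c(10, 20, 30))
f <- function(row_df) data.frame(id = row_df$id, doubled = 2 * row_df$value)

pieces <- lapply(split(d, seq_len(nrow(d))), f)  # list of one-row data frames
combined <- do.call(rbind, pieces)               # stack them back together
print(combined)
```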

Applying a function to every row on each n number of columns in R

Here is one approach:

Let d be your 3-row by 2000-column data.table, with column names as.character(1:2000) (see below for how the fake data is generated). We add a row identifier using .I, then melt the data to long format, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then we apply your function myfunc (see below for a stand-in function for this example) by row and group, and cast back to wide. (I used stringr::str_pad to zero-pad the group number.)

# add row identifier
d[, row:=.I]

# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = F)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]

# get the result (180 rows long), applying myfunc to each set of columns, by row
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]

# swing wide (3 rows long, 60 columns wide)
dcast(
  result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
  frow~v,value.var="V1"
)[, frow:=NULL][]

Output: (first six columns only)

      grp01_1    grp01_2    grp01_3    grp02_1    grp02_2    grp02_3
        <num>      <num>      <num>      <num>      <num>      <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687

Input:

d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000) set(d,j=as.character(c), value=runif(3))

myfunc Function (toy example for this answer):

myfunc <- function(x) c(mean(x), var(x), sd(x))
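
For comparison, the same idea can be sketched in base R on a much smaller stand-in matrix (3 rows by 6 columns, in blocks of 2 columns). The names m, block and res_list are illustrative only:

```r
set.seed(123)
m <- matrix(runif(3 * 6), nrow = 3)     # small stand-in: 3 rows x 6 columns
block <- ceiling(seq_len(ncol(m)) / 2)  # column-group id: 1 1 2 2 3 3

myfunc <- function(x) c(mean(x), var(x), sd(x))

# for each block of columns, apply myfunc to every row of that block
res_list <- lapply(split(seq_len(ncol(m)), block), function(cols) {
  apply(m[, cols, drop = FALSE], 1, myfunc)  # 3 statistics x 3 rows
})
res <- do.call(cbind, res_list)  # 3 statistics x (3 rows * 3 blocks)
print(dim(res))
```

Each column of res holds the three statistics for one row within one column block, which mirrors the wide layout produced above.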

Applying function with every row of matrix as the input in r

apply will do this readily. However, you need to remove the first (non-time) column from your dataset, along with the first row (if I understand correctly that it contains only the time index):

data <- structure(list(Time.course = c("YORF", "YAL026C"), 
timecourse1 = c(0, 1),
X = c(5, 0.7030321),
X.1 = c(10, NA),
X.2 = c(15, NA),
X.3 = c(20, NA),
X.4 = c(30, 0.7130882),
X.5 = c(40, 0.3322182),
X.6 = c(50, 0.2153255),
X.7 = c(60, 0.2264951)),
row.names = c(NA, -2L), class = c("data.frame"))

time <- as.numeric(data[1, -1])
half_life <- apply(data[-1,-1], 1, function(x) {
  PKNCA::pk.calc.half.life(conc = x, time = time)$half.life
})
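
If PKNCA is not available, the row-wise pattern itself can be tried with a stand-in. The sketch below uses made-up concentrations and a hypothetical helper est_half_life that fits a log-linear slope with lm; it is only an illustration of passing each row plus a shared time vector, not PKNCA's method:

```r
time <- c(0, 5, 10, 20)

# made-up concentrations: each row decays exponentially at a known rate
conc <- rbind(a = 8 * exp(-0.10 * time),
              b = 4 * exp(-0.05 * time))
conc["a", 2] <- NA  # a missing measurement, as in the data above

# est_half_life is a hypothetical stand-in for PKNCA::pk.calc.half.life
est_half_life <- function(conc_row, time) {
  ok <- !is.na(conc_row)        # drop missing measurements
  y  <- log(conc_row[ok])
  tt <- time[ok]
  slope <- coef(lm(y ~ tt))[[2]]
  log(2) / -slope               # half-life from the decay rate
}

half_life <- apply(conc, 1, est_half_life, time = time)
print(half_life)
```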
