Apply a function to every row of a matrix or a data frame
You simply use the apply() function:
R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1] 4 10 16
R>
This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as the fourth, fifth, ... arguments to apply().
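Here is a minimal sketch of passing extra arguments through apply(). The hypothetical helper f takes the row plus two weights. apply() does not use the named arguments w1 and w2 itself; it forwards them to f on every call.

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Hypothetical row function: the row comes first, extra parameters after it
f <- function(x, w1, w2) w1 * x[1] + w2 * x[2]

# w1 and w2 are not consumed by apply(); they are passed on to f for each row
res <- apply(M, 1, f, w1 = 2, w2 = 1)
print(res)
# [1]  4 10 16
```

With w1 = 2 and w2 = 1 this reproduces the anonymous function above. Changing the weights only changes the call, not the function.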
Call apply-like function on each row of dataframe with multiple arguments from each row
You can apply apply to a subset of the original data:
dat <- data.frame(x=c(1,2), y=c(3,4), z=c(5,6))
apply(dat[,c('x','z')], 1, function(x) sum(x) )
Or, if your function is just sum, use the vectorized version:
rowSums(dat[,c('x','z')])
[1] 6 8
If you want to use a function of two arguments such as testFunc:
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(x) testFunc(x[1],x[2]))
To access columns by name rather than by index, you can do something like this:
testFunc <- function(a, b) a + b
apply(dat[,c('x','z')], 1, function(y) testFunc(y['z'],y['x']))
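When the function takes its inputs as separate arguments, mapply() can feed it one element from each column at a time. This is a swapped-in alternative, not part of the original answer. It also avoids the matrix conversion that apply() performs on a data frame.

```r
dat <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))
testFunc <- function(a, b) a + b

# mapply walks dat$x and dat$z in parallel: call i gets a = x[i], b = z[i]
res <- mapply(testFunc, a = dat$x, b = dat$z)
print(res)
# [1] 6 8
```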
R apply() custom function to every row in data frame
Another approach is modifying your existing function such that it is vectorised.
t.test2 <- function(m1,m2,s1,s2,n1,n2,m0=0,equal.variance=FALSE)
{
if(!equal.variance)
{
se <- sqrt( (s1^2/n1) + (s2^2/n2) )
# welch-satterthwaite df
df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
} else
{
# pooled standard deviation, scaled by the sample sizes
se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
df <- n1+n2-2
}
tstat <- (m1-m2-m0)/se
dat <- vapply(seq_along(m1),
function(x){c(m1[x]-m2[x], se[x], tstat[x], 2*pt(-abs(tstat[x]),df[x]))},
numeric(4)) # two-tailed p-value; pt(-abs(tstat[x]),df[x]) gives the one-tailed value in the observed direction
dat <- t(dat)
dat <- as.data.frame(dat)
names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
return(dat)
}
This approach lets you pass in vectors for the various inputs, and it returns a data frame with one row per element of those vectors. It uses the vapply function to return a numeric vector of length 4 for each element.
With this approach, you can simply call:
t.test2(MPAmeans$reference_mean, MPAmeans$MPA_mean, MPAmeans$sd_reference, MPAmeans$sd_MPA, MPAmeans$n_reference, MPAmeans$n_MPA)
(or whatever you end up calling your variables)
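Every operation inside t.test2 already works element-wise on vectors, so the vapply step can arguably be dropped altogether. The following is a minimal sketch of the Welch (unequal-variance) branch only. It uses the hypothetical name welch_table and made-up summary statistics. The equal-variance branch would follow the same pattern.

```r
# Fully vectorised Welch t-test from summary statistics (hypothetical helper)
welch_table <- function(m1, m2, s1, s2, n1, n2, m0 = 0) {
  v1 <- s1^2 / n1                      # squared standard error, group 1
  v2 <- s2^2 / n2                      # squared standard error, group 2
  se <- sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite
  tstat <- (m1 - m2 - m0) / se
  data.frame("Difference of means" = m1 - m2,
             "Std Error" = se,
             t = tstat,
             "p-value" = 2 * pt(-abs(tstat), df),  # two-tailed
             check.names = FALSE)
}

# Made-up summary statistics for three comparisons (illustration only)
res <- welch_table(m1 = c(10, 12, 9), m2 = c(8, 12, 10),
                   s1 = c(2, 3, 2.5), s2 = c(2.5, 3, 2),
                   n1 = c(20, 25, 30), n2 = c(22, 25, 28))
print(res)
```

By default, t.test() on raw data uses the same Welch formula. Its p-value should therefore agree with welch_table() when the latter receives the corresponding means, standard deviations, and sample sizes.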
How can I apply a function to every row of a data frame in R when the function requires multiple inputs?
If we are applying the function to each row, use apply. Also, instead of specifying the row elements one by one as arguments (their number can differ between datasets), use ..., which can take any number of arguments, and create the matrix from it (apply passes each row as a single vector, which c(...) flattens):
chisquare.tableMod <- function(...){
t <- matrix(c(...), nrow = 2)
chisq.test(t)
}
out <- apply(df1, 1, chisquare.tableMod)
Testing against the output of the OP's original function:
chisquare.table <- function(var1, var2, var3, var4){
t <- matrix(c(var1, var2, var3, var4), nrow = 2)
chisq.test(t)
}
outOld <- chisquare.table(80, 99920, 85, 99915)
identical(out[[1]], outOld)
#[1] TRUE
As @42- mentioned in the comments, apply converts a data frame to a matrix, and a matrix can hold only a single class. So, when working with apply, select only the numeric columns (or, more generally, columns of a single class).
Data:
df1 <- data.frame(v1 = c(80, 79, 49), v2 = c(99920, 98230, 43240),
v3 = c(85, 40, 35), v4 = c(99915, 43265, 43238))
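Selecting a single class matters because of the matrix conversion. The small sketch below adds a hypothetical character id column to the data. Once that column is included, apply() hands the function character vectors instead of numbers.

```r
df1 <- data.frame(id = c("a", "b", "c"),
                  v1 = c(80, 79, 49), v2 = c(99920, 98230, 43240),
                  v3 = c(85, 40, 35), v4 = c(99915, 43265, 43238))

# With the character column present, each row arrives as a character vector
cls_all <- apply(df1, 1, function(r) class(r))

# Keep only the numeric columns before calling apply()
num_cols <- vapply(df1, is.numeric, logical(1))
cls_num <- apply(df1[, num_cols], 1, function(r) class(r))
row_totals <- apply(df1[, num_cols], 1, sum)
```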
Apply a function to each row in a data frame in R
You want apply (see its documentation): apply(var,1,fun) will apply fun to rows, and apply(var,2,fun) will apply it to columns.
> apply(a,1,min)
[1] 1 0 3
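The original answer does not show a. The sketch below uses a hypothetical 3 x 2 matrix, chosen so that its row minima match the output above, to make the MARGIN argument concrete.

```r
a <- matrix(c(1, 4,
              0, 5,
              3, 6), nrow = 3, byrow = TRUE)

row_min <- apply(a, 1, min)  # MARGIN = 1: one result per row
col_min <- apply(a, 2, min)  # MARGIN = 2: one result per column
print(row_min)
# [1] 1 0 3
print(col_min)
# [1] 0 4
```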
How to apply a function on every row of a data frame?
For starters, you say that RS is the same between the two. That raises the question of how certain we are that the rows always line up correctly. To be defensive, I'll assume "not 100%" and join/merge them so that they are guaranteed to be in the right order.
quux <- tt[df, on="RS"]
quux
# RS G E B wg we wb
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436
From here, for each row, it's just a matter of applying sumz to one portion of the row (G, E, B) with the other portion of the same row (the weights wg, we, wb) as weights:
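library(metap)  # sumz() comes from the metap package
quux$META <- sapply(seq_len(nrow(quux)), function(rn) {
# as.matrix() on both blocks so that [rn,] extracts a plain named numeric vector
unlist(sumz(as.matrix(quux[,.(G,E,B)])[rn,], weights = as.matrix(quux[,.(wg,we,wb)])[rn,],
na.action=na.fail)["p"])
})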
quux
# RS G E B wg we wb META
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436 0.9863582
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436 0.9294546
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436 0.9300445
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436 0.6379392
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436 0.6055061
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436 0.9605584
Or a more data.table-centric way:
mysumz <- function(x, w) sumz(unlist(x), weights = unlist(w), na.action = na.fail)[["p"]]
quux[, META := mysumz(.(G,E,B), .(wg,we,wb)), by = seq_len(nrow(quux))]
(borrowing from https://stackoverflow.com/a/36802640). The helper function is required because each call to mysumz receives a list for each of x and w, but sumz needs vectors. If you want to verify this, first call debugonce(mysumz), then run the quux[,META:=...] line and check out x and w ... and how it works.
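The per-row pattern in the sapply loop pairs row rn of one block of columns with row rn of another block, and that pattern does not depend on sumz. Below is a base-R sketch that uses the first two rows of the data above. A hypothetical weighted mean stands in for sumz so that the sketch runs without the metap package. It is not a meta-analysis method.

```r
# First two rows of the example data (values copied from the output above)
quux <- data.frame(
  RS = c("rs2089177", "rs4360974"),
  G = c(0.9986, 0.9738), E = c(0.7153, 0.7838), B = c(0.604716, 0.430228),
  wg = 40.6325, we = 35.39774, wb = 580.6436
)
P <- as.matrix(quux[, c("G", "E", "B")])     # one block of columns
W <- as.matrix(quux[, c("wg", "we", "wb")])  # the matching weights

# Hypothetical stand-in for sumz(): weighted mean of one row's values
combine <- function(p, w) sum(p * w) / sum(w)

# Row-wise: pair row rn of P with row rn of W
META <- vapply(seq_len(nrow(P)), function(rn) combine(P[rn, ], W[rn, ]), numeric(1))
print(META)
```

For this particular stand-in a vectorised version exists (rowSums(P * W) / rowSums(W)). The loop form is what carries over to functions like sumz that only accept one row at a time.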
Using the apply function on rows of a data.frame in R
The issue is that the string columns are of class factor. In R versions before 4.0.0, the default when constructing a data.frame is stringsAsFactors = TRUE, and when we paste across columns, a factor gets coerced to its integer codes. To avoid this behavior, use:
df <- data.frame(x=c(1,2,3),y=c('b','a','c'), stringsAsFactors = FALSE)
paste(df[1,],collapse=":")
#[1] "1:b"
With apply, the data frame is converted to a matrix, and a matrix can hold only a single class. Therefore, whenever there is a character column, the numeric values are converted to character, because character takes precedence over numeric in the coercion hierarchy.
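The sketch below shows both routes on the data frame above. apply() pastes the rows of the character matrix. A column-wise paste() (a swapped-in alternative) gives the same strings without any matrix conversion, because paste() is already vectorised.

```r
df <- data.frame(x = c(1, 2, 3), y = c("b", "a", "c"), stringsAsFactors = FALSE)

# apply() first turns df into a character matrix, then pastes each row
via_apply <- apply(df, 1, paste, collapse = ":")

# Column-wise alternative: one paste() call over whole columns
via_paste <- do.call(paste, c(df, sep = ":"))
print(via_paste)
# [1] "1:b" "2:a" "3:c"
```

One caveat with the apply() route: as.matrix() formats numeric columns to a common width, so a column such as c(1, 10) would come through as " 1" and "10".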
Apply a function returning a data frame to each row in a data frame
You haven't shown what f contains, but based on the comments it is written for data frames, so this should work:
lapply(split(d, seq_len(nrow(d))), f)
split divides d into one-row data frames, and lapply then applies f to each of them.
You can also use by:
by(d, seq_len(nrow(d)), f)
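f is not shown, so the sketch below uses a hypothetical f that turns each one-row data frame into a two-row data frame. do.call(rbind, ...) then stacks the per-row results back into a single data frame.

```r
d <- data.frame(a = 1:3, b = c(10, 20, 30))

# Hypothetical f: one-row data frame in, two-row data frame out
f <- function(row) data.frame(a = row$a, value = c(row$b, 2 * row$b))

pieces <- lapply(split(d, seq_len(nrow(d))), f)  # list of 3 data frames
combined <- do.call(rbind, pieces)                 # 6 rows in total
print(combined)
```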
Applying a function to every row on each n number of columns in R
Here is one approach:
Let d be your 3 rows x 2000 columns frame, with column names as.character(1:2000) (see below for generating fake data). We add a row identifier using .I, then melt the data to long format and add grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then we apply your function myfunc (see below for a stand-in function for this example) by row and group, and swing the result back to wide format. (I used stringr::str_pad to add a leading 0 to the group number.)
library(data.table)
# add row identifier
d[, row:=.I]
# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = F)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]
# get the result (180 rows long), applying myfunc to each set of columns, by row
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]
# swing wide (3 rows long, 60 columns wide)
dcast(
result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
frow~v,value.var="V1"
)[, frow:=NULL][]
Output: (first six columns only)
grp01_1 grp01_2 grp01_3 grp02_1 grp02_2 grp02_3
<num> <num> <num> <num> <num> <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687
Input:
d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000) set(d,j=as.character(c), value=runif(3))
myfunc function (toy example for this answer):
myfunc <- function(x) c(mean(x), var(x), sd(x))
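If data.table is not available, the same row-by-column-group computation can be sketched in base R: split the column indices into groups and call apply() on each block. The sizes below (3 rows, 12 columns, groups of 4) are a scaled-down assumption. For the question's data, group_size would be 100.

```r
set.seed(123)
m <- matrix(runif(3 * 12), nrow = 3)  # 3 rows x 12 columns (small stand-in)
myfunc <- function(x) c(mean(x), var(x), sd(x))

group_size <- 4
# List of column-index vectors: 1:4, 5:8, 9:12
groups <- split(seq_len(ncol(m)), ceiling(seq_len(ncol(m)) / group_size))

# For each group, apply myfunc to every row of that block (3 stats x 3 rows)
res <- lapply(groups, function(cols) apply(m[, cols, drop = FALSE], 1, myfunc))

# Side by side: one row per statistic, one column per (group, row) pair
out <- do.call(cbind, res)
colnames(out) <- sprintf("grp%02d_%d", rep(seq_along(groups), each = nrow(m)),
                         seq_len(nrow(m)))
print(out)
```

The layout mirrors the data.table output above: the rows are mean, var, and sd, and the columns are named grpXX_row.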
Applying function with every row of matrix as the input in r
apply will do it for you readily. However, you need to remove the first (non-time) column from your dataset, along with the first row (if I understand correctly, it contains only the time index):
data <- structure(list(Time.course = c("YORF", "YAL026C"),
timecourse1 = c(0, 1),
X = c(5, 0.7030321),
X.1 = c(10, NA),
X.2 = c(15, NA),
X.3 = c(20, NA),
X.4 = c(30, 0.7130882),
X.5 = c(40, 0.3322182),
X.6 = c(50, 0.2153255),
X.7 = c(60, 0.2264951)),
row.names = c(NA, -2L), class = c("data.frame"))
time <- as.numeric(data[1, -1])
half_life <- apply(data[-1,-1], 1, function(x) {
PKNCA::pk.calc.half.life(conc = x, time = time)$half.life
})
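Instead of capturing time in a closure as above, apply() can also forward it as an extra named argument. The sketch below uses a hypothetical simple_half_life() as a stand-in for PKNCA::pk.calc.half.life, which applies its own point-selection rules. simple_half_life() fits a log-linear least-squares line and computes half-life = log(2) / k, where -k is the slope of log(concentration) against time. The concentrations are made up so that they halve every time unit, and one NA mirrors the gaps in the question's data.

```r
time <- c(0, 1, 2, 3)
conc <- rbind(c(8, 4, 2, 1),
              c(16, NA, 4, 2))  # made-up rows, each halving per time unit

# Hypothetical stand-in: log-linear fit on the non-missing, positive points
simple_half_life <- function(conc, time) {
  ok <- !is.na(conc) & conc > 0
  k <- -coef(lm(log(conc[ok]) ~ time[ok]))[[2]]  # elimination rate constant
  log(2) / k
}

# Each row is passed as `conc`; `time` is forwarded through apply()'s ...
hl <- apply(conc, 1, simple_half_life, time = time)
print(hl)
```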