Applying a Function to Every Row of a Table Using Dplyr

Applying a function to every row of a table using dplyr?

As of dplyr 0.2 (I think) rowwise() is implemented, so the answer to this problem becomes:

iris %>%
  rowwise() %>%
  mutate(Max.Len = max(Sepal.Length, Petal.Length))

Non rowwise alternative

Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise is increasingly not recommended, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's Row-oriented workflows in R with the tidyverse material to get a good handle on this topic.

The most straightforward way I have found is based on one of Hadley's examples using pmap:

iris %>% 
mutate(Max.Len= purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))

Using this approach, you can give an arbitrary number of arguments to the function (.f) inside pmap.

pmap is a good conceptual approach because it reflects the fact that when you're doing row wise operations you're actually working with tuples from a list of vectors (the columns in a dataframe).
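As a minimal sketch of the "arbitrary number of arguments" point, the example below passes three columns to pmap_dbl together with a three-argument function. The helper row_spread and the column name Spread are hypothetical names introduced only for illustration, and the sketch assumes dplyr and purrr are installed.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: difference between the largest and smallest of three values
row_spread <- function(x, y, z) max(x, y, z) - min(x, y, z)

# Each call to row_spread receives one "tuple": the i-th element of every column
out <- iris %>%
  mutate(Spread = pmap_dbl(list(Sepal.Length, Sepal.Width, Petal.Length), row_spread))

head(out$Spread)
```

Because the list is unnamed, its elements are matched to the function's arguments by position.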

Apply function to a row in a data.frame using dplyr

We just need to pass the data as ., because a data.frame is a list whose elements are its columns. Wrapping it as list(.) would instead produce a nested list.

library(dplyr)
library(purrr)  # pmap_int() comes from purrr, not dplyr
d %>%
  mutate(u = pmap_int(., ~ which.max(c(...))))
#   a b c u
# 1 1 4 2 2
# 2 2 3 3 2
# 3 3 2 4 3
# 4 4 1 5 3
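The input d is not shown in this answer. A data frame consistent with the printed output would be the following; the construction is an assumption reconstructed from that output, and the base R check with apply is only there to confirm the u column.

```r
# Assumed input, reconstructed from the printed output above
d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# Base R cross-check: position of the first maximum in each row
u <- unname(apply(d, 1, which.max))
u
```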

Or use cur_data() (deprecated since dplyr 1.1.0 in favour of pick()):

d %>%
mutate(u = pmap_int(cur_data(), ~ which.max(c(...))))

Or, if we want to use everything(), place it inside select(). list(everything()) does not work because it gives everything() no data to select from.

d %>% 
mutate(u = pmap_int(select(., everything()), ~ which.max(c(...))))

Or using rowwise

d %>%
rowwise %>%
mutate(u = which.max(cur_data())) %>%
ungroup
# A tibble: 4 x 4
#       a     b     c     u
#   <int> <int> <int> <int>
# 1     1     4     2     2
# 2     2     3     3     2
# 3     3     2     4     3
# 4     4     1     5     3

Or, more efficiently, with max.col:

max.col(d, 'first')
#[1] 2 2 3 3

Or with collapse

library(collapse)
dapply(d, which.max, MARGIN = 1)
#[1] 2 2 3 3

The max.col call can be included in a dplyr pipeline as

d %>% 
mutate(u = max.col(cur_data(), 'first'))

Applying a function to every row on each n number of columns in R

Here is one approach:

Let d be your 3-row by 2000-column data.table, with column names as.character(1:2000) (see below for how the fake data was generated). We add a row identifier using .I, then melt the data long, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then we apply your function myfunc (see below for the stand-in function used in this example) by row and group, and swing wide. (I used stringr::str_pad to left-pad the group number with 0.)

# add row identifier
d[, row:=.I]

# melt and add col group identifier
dm = melt(d, id.vars = "row", variable.factor = FALSE)[
  , variable := as.numeric(variable)][
  order(variable, row), grp := rep(1:20, each = 300)]

# get the result (180 rows long), applying myfunc to each set of columns, by row
result = dm[, myfunc(value), by = .(row, grp)][, frow := rep(1:3, times = 60)]

# swing wide (3 rows long, 60 columns wide)
dcast(
result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
frow~v,value.var="V1"
)[, frow:=NULL][]

Output: (first six columns only)

      grp01_1    grp01_2    grp01_3    grp02_1    grp02_2    grp02_3
        <num>      <num>      <num>      <num>      <num>      <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687

Input:

d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000) set(d,j=as.character(c), value=runif(3))

myfunc Function (toy example for this answer):

myfunc <- function(x) c(mean(x), var(x), sd(x))
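To see why the long result has three rows per row/group combination (3 rows x 20 groups x 3 values = 180), it helps to call the toy function on a small vector. This is only an illustrative check; the input vector is made up.

```r
# Same toy function as above
myfunc <- function(x) c(mean(x), var(x), sd(x))

# One call returns three values: mean, variance, standard deviation
res <- myfunc(c(1, 2, 3))
res
# [1] 2 1 1
```

Each of those three values ends up in its own output row (frow 1 to 3) after the dcast step.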

dplyr - apply a custom function using rowwise()

I don't think your problem is with rowwise. The way your function is written, it's expecting a single object. Try adding a c():

dt2 %>% rowwise() %>% mutate(nr_of_0s = zerocount(c(A, B, C)))

Note that, if you aren't committed to using your own function, you can skip rowwise entirely, as Nettle also notes. rowSums already treats data frames in a rowwise fashion, which is why this works:

dt2 %>% mutate(nr_of_0s = rowSums(. == 0))
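The question's dt2 and zerocount are not shown here, so the sketch below uses hypothetical stand-ins for both (a three-column data frame and a function that counts zeros in a vector). It only illustrates, in base R, that the row-by-row call and the rowSums shortcut agree.

```r
# Hypothetical stand-ins for the question's data and function
dt2 <- data.frame(A = c(0, 1, 0), B = c(0, 2, 3), C = c(5, 0, 0))
zerocount <- function(x) sum(x == 0)

# Row-by-row: each row is handed to zerocount as a single vector
by_row <- unname(apply(dt2, 1, zerocount))

# Vectorised: dt2 == 0 is a logical matrix, rowSums counts the TRUEs per row
by_rowsums <- unname(rowSums(dt2 == 0))

by_row
by_rowsums
```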

How to pass a single row for a function using dplyr

dplyr's rowwise() puts the row-output (.data) as a list of lists, so you need to use [[. You also need to use .data rather than ., because . is the entire dff, rather than the individual rows.

my_fun <- function(df, col_1, col_2){
df[[col_1]] + df[[col_2]]
}

dff %>%
rowwise() %>%
mutate(res = my_fun(.data, 'a', 'b'))

You can see what .data looks like with the code below

dff %>%
rowwise() %>%
do(res = .data) %>%
.[[1]] %>%
head(1)

# [[1]]
# [[1]]$a
# [1] 1
#
# [[1]]$b
# [1] 1

Apply a function to every column

In dplyr, you can use across to apply a function to multiple columns.

library(dplyr)
df <- df %>% mutate(across(starts_with('var'), ~./sd(.)))
df

# var1 var2 var3
# <dbl> <dbl> <dbl>
# 1 0.0384 0.118 0.707
# 2 0.0767 0.237 0.354
# 3 1.34 0.474 1.06
# 4 0.192 0.632 1.06
# 5 0.844 0.809 1.24
# 6 1.02 0.987 1.41

In base R, we can use lapply:

df[] <- lapply(df, function(x) x/sd(x))

To apply this to selected columns (1:168) you can do

df[1:168] <- lapply(df[1:168], function(x) x/sd(x))
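A small self-contained check of the base R version, using a made-up data frame (the column names and values are assumptions for illustration only): after dividing by the standard deviation, each scaled column should itself have a standard deviation of 1, and columns outside the selected range should be untouched.

```r
# Made-up example data
df <- data.frame(var1 = c(1, 2, 3, 4),
                 var2 = c(10, 20, 30, 50),
                 other = c(5, 6, 7, 8))

# Scale only the first two columns; df[1:2] keeps the data frame structure
df[1:2] <- lapply(df[1:2], function(x) x / sd(x))

# Standard deviation of each column after scaling
sds <- sapply(df, sd)
sds
```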

Applying function to every row using a range of columns (R)

You can do:

library(e1071)

# get column names
cols <- paste0('V', seq(1,1998,1))

# apply function on selected columns
NoDup2$skew_value <- apply(NoDup2[,cols], 1, skewness)

This calculates the skewness of every row across the selected columns (V1 to V1998) and stores it in a new column, skew_value.

Apply a function to every specified column in a data.table and update by reference

This seems to work:

dt[ , (cols) := lapply(.SD, "*", -1), .SDcols = cols]

The result is

    a  b d
1: -1 -1 1
2: -2 -2 2
3: -3 -3 3

There are a few tricks here:

  • Because there are parentheses in (cols) :=, the result is assigned to the columns specified in cols, instead of to some new variable named "cols".
  • .SDcols tells the call that we're only looking at those columns, and allows us to use .SD, the Subset of the Data associated with those columns.
  • lapply(.SD, ...) operates on .SD, which is a list of columns (like all data.frames and data.tables). lapply returns a list, so in the end j looks like cols := list(...).
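The points above can be tried on a small table. The input below is an assumption reconstructed from the printed result (columns a, b and d, with cols naming the two columns to flip); it assumes the data.table package is installed.

```r
library(data.table)

# Assumed input, reconstructed from the result shown above
dt <- data.table(a = 1:3, b = 1:3, d = 1:3)
cols <- c("a", "b")

# (cols) is evaluated, so the columns named in cols are updated by reference;
# .SD contains only the columns listed in .SDcols
dt[, (cols) := lapply(.SD, "*", -1), .SDcols = cols]

dt
```

Column d is left unchanged because it is not listed in cols.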

EDIT: Here's another way that is probably faster, as @Arun mentioned:

for (j in cols) set(dt, j = j, value = -dt[[j]])

dplyr: apply function table() to each column of a data.frame

Using tidyverse (dplyr and purrr):

library(tidyverse)

mtcars %>%
map( function(x) table(x) )

Or:

mtcars %>%
map(~ table(.x) )

Or simply:

library(tidyverse)

mtcars %>%
map( table )

Apply a function to every row of a matrix or a data frame

You simply use the apply() function:

R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1]  4 10 16
R>

This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply().
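As a sketch of passing extra arguments, the same calculation can be written with a named function whose weights are supplied through apply(). The function name weighted and its parameters w1 and w2 are hypothetical, chosen only for this illustration.

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# Hypothetical helper: weighted sum of the first two elements of a row
weighted <- function(x, w1, w2) w1 * x[1] + w2 * x[2]

# Arguments after FUN are forwarded to weighted() on every call
res <- apply(M, 1, weighted, w1 = 2, w2 = 1)
res
# [1]  4 10 16
```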


