Applying a Function to Every Row of a Table Using Dplyr

Applying a function to every row of a table using dplyr?

As of dplyr 0.2 (I think), rowwise() is implemented, so the answer to this problem becomes:

library(dplyr)

iris %>% 
  rowwise() %>% 
  mutate(Max.Len = max(Sepal.Length, Petal.Length))
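For max() specifically, base R already has a vectorized counterpart, pmax(), so the row-wise step can be skipped entirely. A minimal sketch, assuming dplyr is installed, that checks both versions give the same column:

```r
library(dplyr)

# Row-wise: max() is called once per row on that row's two values
by_row <- iris %>%
  rowwise() %>%
  mutate(Max.Len = max(Sepal.Length, Petal.Length)) %>%
  ungroup()

# Vectorized: pmax() takes the element-wise maximum of the two whole columns
vectorized <- iris %>%
  mutate(Max.Len = pmax(Sepal.Length, Petal.Length))

# Both approaches produce the same values
stopifnot(identical(by_row$Max.Len, vectorized$Max.Len))
```

rowwise() remains useful when the function has no vectorized equivalent; when one exists, it is usually the simpler and faster choice.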

Non `rowwise` alternative

Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise is increasingly not recommended, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's Row-oriented workflows in R with the tidyverse material to get a good handle on this topic.

The most straightforward way I have found is based on one of Hadley's examples using pmap:

iris %>% 
  mutate(Max.Len= purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))

Using this approach, you can give an arbitrary number of arguments to the function (.f) inside pmap.

pmap is a good conceptual approach because it reflects the fact that when you're doing row-wise operations you're actually working with tuples from a list of vectors (the columns in a dataframe).
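The same tuple-wise idea exists in base R as mapply(), which walks several vectors in parallel just like pmap(). A small sketch using three columns, to show that the number of arguments is not fixed:

```r
# Each call to max() receives one element from each vector, i.e. one row's tuple
res <- mapply(max, iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length)

# For max() the vectorized pmax() gives the same answer
stopifnot(identical(res, pmax(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length)))
```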

Apply function to a row in a data.frame using dplyr

We just need to pass the data itself as ., because a data.frame is already a list with the columns as its elements. If we wrap it as list(.), it becomes a nested list.

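library(dplyr)
library(purrr)  # pmap_int() comes from purrr

d <- data.frame(a = 1:4, b = 4:1, c = 2:5)  # example data matching the output below
d %>% 
  mutate(u = pmap_int(., ~ which.max(c(...))))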
#  a b c u
#1 1 4 2 2
#2 2 3 3 2
#3 3 2 4 3
#4 4 1 5 3
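To see why . (and not list(.)) is the right thing to pass: a data.frame already is a list of columns, so it can be handed to a parallel map directly. A base-R sketch of the same idea, using do.call() and mapply() in place of pmap_int(), with d rebuilt from the output above:

```r
d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

# A data.frame is a list whose elements are the columns
stopifnot(is.list(d), length(d) == 3)

# do.call() turns the columns into the arguments a =, b =, c = of mapply(),
# which then calls the function once per row with that row's values
u <- do.call(mapply, c(list(FUN = function(...) which.max(c(...))), d))
unname(u)  # 2 2 3 3
```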

Or we can use cur_data() (deprecated since dplyr 1.1.0 in favour of pick()):

d %>%
   mutate(u = pmap_int(cur_data(), ~ which.max(c(...))))

Or, if we want to use everything(), place it inside select(), because list(everything()) doesn't specify the data from which everything should be selected:

d %>% 
   mutate(u = pmap_int(select(., everything()), ~ which.max(c(...))))

Or using rowwise

d %>%
    rowwise %>% 
    mutate(u = which.max(cur_data())) %>%
    ungroup
# A tibble: 4 x 4
#      a     b     c     u
#  <int> <int> <int> <int>
#1     1     4     2     2
#2     2     3     3     2
#3     3     2     4     3
#4     4     1     5     3

Or, more efficiently, use the vectorized max.col():

max.col(d, 'first')
#[1] 2 2 3 3
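The 'first' argument matters here: max.col()'s default ties.method is "random", and row 2 of d has a tie (b and c are both 3). A quick check, with d rebuilt from the printed output, that 'first' reproduces which.max():

```r
d <- data.frame(a = 1:4, b = 4:1, c = 2:5)

max.col(d, ties.method = "first")  # 2 2 3 3, same as which.max() per row
max.col(d, ties.method = "last")   # 2 3 3 3, row 2's tie now goes to column c
```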

Or with collapse

library(collapse)
dapply(d, which.max, MARGIN = 1)
#[1] 2 2 3 3

The max.col() version can also be used inside mutate():

d %>% 
    mutate(u = max.col(cur_data(), 'first'))

Applying a function to every row on each n number of columns in R

Here is one approach:

Let d be your 3-row x 2000-column data.table, with column names as.character(1:2000) (see below for generation of fake data). We add a row identifier using .I, then melt the data to long format, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then we apply your function myfunc (see below for a stand-in function for this example) by row and group, and cast back to wide. (I used stringr::str_pad to add a leading 0 to the group number.)

# add row identifier
d[, row:=.I]

# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = FALSE)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]

# get the result (180 rows long), applying myfunc to each set of columns, by row
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]

# swing wide (3 rows long, 60 columns wide)
dcast(
  result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
  frow~v,value.var="V1"
  )[, frow:=NULL][]

Output: (first six columns only)

      grp01_1    grp01_2    grp01_3    grp02_1    grp02_2    grp02_3
        <num>      <num>      <num>      <num>      <num>      <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687

Input:

library(data.table)

d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000)  set(d,j=as.character(c), value=runif(3))

myfunc function (toy example for this answer):

myfunc <- function(x) c(mean(x), var(x), sd(x))
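If data.table isn't a requirement, the same "every row, each block of n columns" pattern can be sketched in base R by grouping the column indices and applying the function row by row inside each block. This uses a small made-up 3 x 6 matrix with blocks of n = 2, and mean() as a stand-in for myfunc:

```r
set.seed(123)
m <- matrix(runif(18), nrow = 3)   # made-up input: 3 rows x 6 columns
n <- 2                             # block width (100 in the question)

# Block id for each column: 1 1 2 2 3 3
grp <- ceiling(seq_len(ncol(m)) / n)

# For each block of column indices, apply the function to every row of that block
res <- sapply(split(seq_len(ncol(m)), grp),
              function(j) apply(m[, j, drop = FALSE], 1, mean))
res  # 3 rows (one per input row) x 3 columns (one per block)
```

A function that returns several values per row, like myfunc, would make apply() return a matrix per block, so the per-block results would need to be combined (e.g. with cbind) rather than simplified by sapply().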

dplyr - apply a custom function using rowwise()

I don't think your problem is with rowwise. The way your function is written, it's expecting a single object. Try adding a c():

dt2 %>% rowwise() %>% mutate(nr_of_0s = zerocount(c(A, B, C)))

Note that, if you aren't committed to using your own function, you can skip rowwise entirely, as Nettle also notes. rowSums already treats data frames in a rowwise fashion, which is why this works:

dt2 %>% mutate(nr_of_0s = rowSums(. == 0))
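To check that the two routes agree, here is a base-R sketch with a made-up dt2 and a hypothetical zerocount() (the question's own definition isn't shown) that counts the zeros in a vector:

```r
# Hypothetical helper: number of zeros in a vector
zerocount <- function(x) sum(x == 0)

# Made-up data standing in for the question's dt2
dt2 <- data.frame(A = c(0, 1, 2), B = c(0, 0, 2), C = c(0, 5, 0))

# Row by row with the custom function
by_row <- apply(dt2[, c("A", "B", "C")], 1, zerocount)

# Vectorized: dt2 == 0 is a logical matrix, rowSums() counts the TRUEs per row
vectorized <- rowSums(dt2 == 0)

stopifnot(all(by_row == vectorized))
by_row  # 3 1 1
```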

How to pass a single row for a function using dplyr

In a rowwise() data frame, .data refers to the current row and is indexed like a list, so use [[ inside the function. You also need to use .data rather than ., because . is the entire dff rather than the individual row.

my_fun <- function(df, col_1, col_2){
  df[[col_1]] + df[[col_2]]
}

dff %>%
  rowwise() %>%
  mutate(res = my_fun(.data, 'a', 'b'))

You can see what .data looks like with the code below

dff %>%
  rowwise() %>%
  do(res = .data) %>% 
  .[[1]] %>% 
  head(1)

# [[1]]
# [[1]]$a
# [1] 1
# 
# [[1]]$b
# [1] 1
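Since + is already vectorized, this particular my_fun() doesn't actually need rowwise(): called on the whole data frame, df[[col]] returns entire columns and the addition happens element by element. A base-R sketch with a made-up dff:

```r
my_fun <- function(df, col_1, col_2) {
  df[[col_1]] + df[[col_2]]
}

# Made-up stand-in for the question's dff
dff <- data.frame(a = 1:3, b = c(1, 5, 10))

# One call on the full data frame adds the two columns for every row at once
dff$res <- my_fun(dff, "a", "b")
dff$res  # 2 7 13
```

rowwise() is only needed when the function cannot work on whole columns, for example when it calls something non-vectorized on its inputs.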

Apply a function to every column

In dplyr, you can use across to apply a function to multiple columns.

library(dplyr)
df <- df %>% mutate(across(starts_with('var'), ~./sd(.)))
df

#    var1  var2  var3
#    <dbl> <dbl> <dbl>
# 1 0.0384 0.118 0.707
# 2 0.0767 0.237 0.354
# 3 1.34   0.474 1.06 
# 4 0.192  0.632 1.06 
# 5 0.844  0.809 1.24 
# 6 1.02   0.987 1.41

In base R, we can use lapply():

df[] <- lapply(df, function(x) x/sd(x))

To apply this to selected columns (1:168) you can do

df[1:168] <- lapply(df[1:168], function(x) x/sd(x))
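The df[] on the left-hand side is what keeps the result a data frame: lapply() always returns a plain list, and assigning into df[] writes the columns back while preserving the data.frame structure. A small check with made-up data (sd is 1 for c(1, 2, 3) and 2 for c(2, 4, 6)):

```r
df <- data.frame(var1 = c(1, 2, 3), var2 = c(2, 4, 6))

scaled <- df
scaled[] <- lapply(scaled, function(x) x / sd(x))  # keeps class and names
as_list <- lapply(df, function(x) x / sd(x))        # without [], just a list

class(scaled)   # "data.frame"
class(as_list)  # "list"
```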

Applying function to every row using a range of columns (R)

You can do:

library(e1071)

# get column names
cols <- paste0('V', seq(1,1998,1))

# apply function on selected columns
NoDup2$skew_value <- apply(NoDup2[,cols], 1, skewness)

This calculates the skewness of every row across the selected columns V1 to V1998.
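One thing to watch with apply() on a data frame: it first converts the selected columns to a matrix, so if a character column slipped into cols the whole matrix would become character. Restricting to the numeric columns first avoids that. A sketch with a made-up NoDup2, using sd() as a stand-in for e1071::skewness so it runs without extra packages:

```r
# Made-up stand-in for NoDup2: one ID column plus numeric V columns
NoDup2 <- data.frame(id = c("x", "y"), V1 = c(1, 4), V2 = c(2, 5), V3 = c(6, 9))

cols <- paste0("V", 1:3)

# Only the numeric V columns are passed on, so apply() sees a numeric matrix
NoDup2$row_sd <- apply(NoDup2[, cols], 1, sd)
NoDup2$row_sd  # both rows have sd sqrt(7), about 2.6458
```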

Apply a function to every specified column in a data.table and update by reference

This seems to work:

dt[ , (cols) := lapply(.SD, "*", -1), .SDcols = cols]

The result is

    a  b d
1: -1 -1 1
2: -2 -2 2
3: -3 -3 3

There are a few tricks here:

- Because there are parentheses in (cols) :=, the result is assigned to the columns named in cols, instead of to a new column called "cols".
- .SDcols tells the call that we're only looking at those columns, and allows us to use .SD, the Subset of the Data associated with those columns.
- lapply(.SD, ...) operates on .SD, which is a list of columns (like all data.frames and data.tables). lapply returns a list, so in the end j looks like (cols) := list(...).

EDIT: Here's another way that is probably faster, as @Arun mentioned:

for (j in cols) set(dt, j = j, value = -dt[[j]])
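A quick check, assuming data.table is installed, that the := / .SDcols version and the set() loop give the same table. The input is reconstructed from the printed result (a and b negated, d untouched), using doubles so both versions end up with the same column type; copy() is needed because both approaches modify by reference:

```r
library(data.table)

dt1 <- data.table(a = c(1, 2, 3), b = c(1, 2, 3), d = c(1, 2, 3))
dt2 <- copy(dt1)  # independent copy; otherwise both names point to the same table
cols <- c("a", "b")

# Version 1: lapply over .SD, assigned back by reference
dt1[, (cols) := lapply(.SD, "*", -1), .SDcols = cols]

# Version 2: set() in a loop, also by reference
for (j in cols) set(dt2, j = j, value = -dt2[[j]])

stopifnot(isTRUE(all.equal(dt1, dt2)))
```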

dplyr: apply function table() to each column of a data.frame

Using tidyverse (dplyr and purrr):

library(tidyverse)

mtcars %>%
    map( function(x) table(x) )

Or:

mtcars %>%
    map(~ table(.x) )

Or simply:

library(tidyverse)

mtcars %>%
    map( table )
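Since map() here just iterates over the columns of a data frame (which is a list), base R's lapply() does the same job without extra packages:

```r
# One frequency table per column of mtcars
tabs <- lapply(mtcars, table)

length(tabs)  # 11, one per column
tabs$cyl      # 4 cylinders: 11 cars, 6: 7, 8: 14
```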

Apply a function to every row of a matrix or a data frame

You simply use the apply() function:

R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1]  4 10 16
R>

This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply().
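For example, the weight 2 in the function above can be turned into an extra argument (here called w, a hypothetical name) and supplied through apply()'s ... arguments. For this particular function, column arithmetic gives the same result without apply(); also note that apply() converts a data frame to a matrix before iterating.

```r
M <- matrix(1:6, nrow = 3, byrow = TRUE)

# w is passed on to the function on every call via apply()'s ...
apply(M, 1, function(x, w) w * x[1] + x[2], w = 2)  # 4 10 16
apply(M, 1, function(x, w) w * x[1] + x[2], w = 3)  # 5 13 21

# Same as the first call, computed on whole columns instead of row by row
2 * M[, 1] + M[, 2]                                  # 4 10 16
```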
