Applying a function to every row of a table using dplyr?
As of dplyr 0.2 (I think) rowwise()
is implemented, so the answer to this problem becomes:
iris %>%
rowwise() %>%
mutate(Max.Len= max(Sepal.Length,Petal.Length))
Non rowwise
alternative
Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise
is increasingly not recommended, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's Row-oriented workflows in R with the tidyverse material to get a good handle on this topic.
The most straightforward way I have found is based on one of Hadley's examples using pmap
:
iris %>%
mutate(Max.Len= purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))
Using this approach, you can give an arbitrary number of arguments to the function (.f
) inside pmap
.
pmap
is a good conceptual approach because it reflects the fact that when you're doing row wise operations you're actually working with tuples from a list of vectors (the columns in a dataframe).
Apply function to a row in a data.frame using dplyr
We just need the data to be specified as .
as data.frame
is a list
with columns as list elements. If we wrap list(.)
, it becomes a nested list
library(dplyr)
d %>%
mutate(u = pmap_int(., ~ which.max(c(...))))
# a b c u
#1 1 4 2 2
#2 2 3 3 2
#3 3 2 4 3
#4 4 1 5 3
Or can use cur_data()
d %>%
mutate(u = pmap_int(cur_data(), ~ which.max(c(...))))
Or if we want to use everything()
, place that inside select
as list(everything())
doesn't address the data from which everything should be selected
d %>%
mutate(u = pmap_int(select(., everything()), ~ which.max(c(...))))
Or using rowwise
d %>%
rowwise %>%
mutate(u = which.max(cur_data())) %>%
ungroup
# A tibble: 4 x 4
# a b c u
# <int> <int> <int> <int>
#1 1 4 2 2
#2 2 3 3 2
#3 3 2 4 3
#4 4 1 5 3
Or this is more efficient with max.col
max.col(d, 'first')
#[1] 2 2 3 3
Or with collapse
library(collapse)
dapply(d, which.max, MARGIN = 1)
#[1] 2 2 3 3
which can be included in dplyr
as
d %>%
mutate(u = max.col(cur_data(), 'first'))
Applying a function to every row on each n number of columns in R
Here is one approach:
Let d
be your 3 rows x 2000 columns frame, with column names as.character(1:2000)
(See below for generation of fake data). We add a row identifier using .I
, then melt the data long, adding grp
, and column-group identifier (i.e. identifying the 20 sets of 100). Then apply your function myfunc
(see below for stand-in function for this example), by row and group, and swing wide. (I used stringr::str_pad
to add 0 to the front of the group number)
# add row identifier
d[, row:=.I]
# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = F)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]
# get the result (180 rows long), applying myfync to each set of columns, by row
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]
# swing wide (3 rows long, 60 columns wide)
dcast(
result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
frow~v,value.var="V1"
)[, frow:=NULL][]
Output: (first six columns only)
grp01_1 grp01_2 grp01_3 grp02_1 grp02_2 grp02_3
<num> <num> <num> <num> <num> <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687
Input:
d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000) set(d,j=as.character(c), value=runif(3))
myfunc
Function (toy example for this answer):
myfunc <- function(x) c(mean(x), var(x), sd(x))
dplyr - apply a custom function using rowwise()
I don't think your problem is with rowwise. The way your function is written, it's expecting a single object. Try adding a c():
dt2 %>% rowwise() %>% mutate(nr_of_0s = zerocount(c(A, B, C)))
Note that, if you aren't committed to using your own function, you can skip rowwise entirely, as Nettle also notes. rowSums
already treats data frames in a rowwise fashion, which is why this works:
dt2 %>% mutate(nr_of_0s = rowSums(. == 0))
How to pass a single row for a function using dplyr
dplyr
's rowwise()
puts the row-output (.data
) as a list of lists, so you need to use [[
. You also need to use .data
rather than .
, because .
is the entire dff
, rather than the individual rows.
my_fun <- function(df, col_1, col_2){
df[[col_1]] + df[[col_2]]
}
dff %>%
rowwise() %>%
mutate(res = my_fun(.data, 'a', 'b'))
You can see what .data
looks like with the code below
dff %>%
rowwise() %>%
do(res = .data) %>%
.[[1]] %>%
head(1)
# [[1]]
# [[1]]$a
# [1] 1
#
# [[1]]$b
# [1] 1
Apply a function to every column
In dplyr
, you can use across
to apply a function to multiple columns.
library(dplyr)
df <- df %>% mutate(across(starts_with('var'), ~./sd(.)))
df
# var1 var2 var3
# <dbl> <dbl> <dbl>
# 1 0.0384 0.118 0.707
# 2 0.0767 0.237 0.354
# 3 1.34 0.474 1.06
# 4 0.192 0.632 1.06
# 5 0.844 0.809 1.24
# 6 1.02 0.987 1.41
In base R, we can use lapply
-
df[] <- lapply(df, function(x) x/sd(x))
To apply this to selected columns (1:168
) you can do
df[1:168] <- lapply(df[1:168], function(x) x/sd(x))
Applying function to every row using a range of columns (R)
You can do:
library(e1071)
# get column names
cols <- paste0('V', seq(1,1998,1))
# apply function on selected columns
NoDup2$skew_value <- apply(NoDup2[,cols], 1, skewness)
With this we calculate skewness for every row across all columns in the given data set.
Apply a function to every specified column in a data.table and update by reference
This seems to work:
dt[ , (cols) := lapply(.SD, "*", -1), .SDcols = cols]
The result is
a b d
1: -1 -1 1
2: -2 -2 2
3: -3 -3 3
There are a few tricks here:
- Because there are parentheses in
(cols) :=
, the result is assigned to the columns specified incols
, instead of to some new variable named "cols". .SDcols
tells the call that we're only looking at those columns, and allows us to use.SD
, theS
ubset of theD
ata associated with those columns.lapply(.SD, ...)
operates on.SD
, which is a list of columns (like all data.frames and data.tables).lapply
returns a list, so in the endj
looks likecols := list(...)
.
EDIT: Here's another way that is probably faster, as @Arun mentioned:
for (j in cols) set(dt, j = j, value = -dt[[j]])
dplyr: apply function table() to each column of a data.frame
Using tidyverse (dplyr and purrr):
library(tidyverse)
mtcars %>%
map( function(x) table(x) )
Or:
mtcars %>%
map(~ table(.x) )
Or simply:
library(tidyverse)
mtcars %>%
map( table )
Apply a function to every row of a matrix or a data frame
You simply use the apply()
function:
R> M <- matrix(1:6, nrow=3, byrow=TRUE)
R> M
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
R> apply(M, 1, function(x) 2*x[1]+x[2])
[1] 4 10 16
R>
This takes a matrix and applies a (silly) function to each row. You pass extra arguments to the function as fourth, fifth, ... arguments to apply()
.
Related Topics
What Is the Width Argument in Position_Dodge
Wrap Long Axis Labels Via Labeller=Label_Wrap in Ggplot2
Replace Missing Values With Column Mean
How to Create an R Function Programmatically
Plot Multiple Lines in One Graph
Converting Decimal to Binary in R
Sample Random Rows Within Each Group in a Data.Table
Create Integer Sequences Defined by 'From' and 'To' Vectors
Concatenate Row-Wise Across Specific Columns of Dataframe
How to Merge Color, Line Style and Shape Legends in Ggplot
How to Replace Na Values in a Table For Selected Columns
What Are Replacement Functions in R
Count the Number of All Words in a String
How to Display Only Integer Values on an Axis Using Ggplot2
How to Read in Numbers With a Comma as Decimal Separator