Apply function to each column in a data frame observing each column's existing data type
If it were an "ordered factor", things would be different. That is not to say I like ordered factors (I don't), only that some relationships are defined for ordered factors that are not defined for plain factors. Factors are treated as ordinary categorical variables. You are seeing the natural sort order of factor levels, which is alphabetical (lexical) order for your locale. If you want automatic coercion to "numeric" for every column (dates and factors and all), then try:
sapply(df, function(x) max(as.numeric(x)) ) # not generally a useful result
Or, if you want to test for factors first and return what you expect:
sapply( df, function(x) if("factor" %in% class(x) ) {
max(as.numeric(as.character(x)))
} else { max(x) } )
@Darren's comment does work better:
sapply(df, function(x) max(as.character(x)) )
max does succeed with character vectors.
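A minimal sketch of the type-aware approach, using a made-up data frame (the name df_demo and its columns are illustrative assumptions, not taken from the question). It uses lapply rather than sapply so that each result keeps its own type instead of being coerced to a common one.

```r
# Hypothetical example data: one numeric, one factor, one character column
df_demo <- data.frame(
  num = c(3, 10, 7),
  fac = factor(c("2", "10", "9")),
  chr = c("b", "a", "c"),
  stringsAsFactors = FALSE
)

# Convert factors to character first, then take the max of each column
col_max <- lapply(df_demo, function(x) {
  if (is.factor(x)) max(as.character(x)) else max(x)
})

str(col_max)
```

Note that the factor column is compared as text here, so "9" sorts above "10". Wrap the conversion in as.numeric if numeric ordering is what you want.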
Apply a function to every column
In dplyr, you can use across to apply a function to multiple columns.
library(dplyr)
df <- df %>% mutate(across(starts_with('var'), ~./sd(.)))
df
# var1 var2 var3
# <dbl> <dbl> <dbl>
# 1 0.0384 0.118 0.707
# 2 0.0767 0.237 0.354
# 3 1.34 0.474 1.06
# 4 0.192 0.632 1.06
# 5 0.844 0.809 1.24
# 6 1.02 0.987 1.41
In base R, we can use lapply:
df[] <- lapply(df, function(x) x/sd(x))
To apply this to selected columns (1:168), you can do
df[1:168] <- lapply(df[1:168], function(x) x/sd(x))
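As a small sketch of the same base R pattern, here it is on a made-up data frame, selecting columns by name instead of by position (df_demo, cols, and the column names are assumptions for illustration only).

```r
# Hypothetical data: two columns to rescale and one to leave alone
df_demo <- data.frame(var1 = c(1, 2, 3), var2 = c(10, 20, 30), id = 1:3)

# Names of the columns the function should be applied to
cols <- c("var1", "var2")

# Assigning back into df_demo[cols] keeps the data frame structure
df_demo[cols] <- lapply(df_demo[cols], function(x) x / sd(x))

sapply(df_demo[cols], sd)  # each rescaled column now has standard deviation 1
```

Selecting by name tends to be more robust than a range like 1:168 if the column order ever changes.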
Apply a function to each element of a column of a data frame
The issue is that if/else is not vectorized. If we change the function to use ifelse, it works. Another issue is that apply with MARGIN expects a data.frame or matrix, whereas here a single vector, 'G3', is being extracted.
fun_pass <- function(calif) ifelse(calif >= 10, 1, 0)
We don't even need ifelse here:
fun_pass <- function(calif) as.integer(calif >= 10)
If it is a single column, use
mat_data$pass <- fun_pass(mat_data$G3)
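To make the difference concrete, here is a short sketch comparing the three options on a made-up vector (grades is a hypothetical stand-in for mat_data$G3).

```r
# Hypothetical grades vector standing in for mat_data$G3
grades <- c(4, 10, 15, 9)

pass_ifelse  <- ifelse(grades >= 10, 1, 0)   # vectorized, element by element
pass_logical <- as.integer(grades >= 10)     # same result without ifelse

# A plain `if` only accepts a single TRUE/FALSE, so it needs an explicit loop
pass_loop <- vapply(grades, function(g) if (g >= 10) 1L else 0L, integer(1))
```

All three give the same pass/fail flags. The logical comparison is the shortest and avoids the loop entirely.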
How do you apply a function to each cell of a column?
If speed isn't an issue, you can use lapply or a purrr::map function (or even a for loop) to go through each row of your data, saving each tibble result in a list, and then combine the list of tibbles into a nice big tibble to work with. E.g.,
# dplyr and lapply
result_list = lapply(your_data$Records, coronary_anatomy)
names(result_list) = your_data$PatientID
result_tbl = bind_rows(result_list, .id = "PatientID")
result_tbl
# # A tibble: 4 x 3
# PatientID anatomy stenosis
# <chr> <chr> <dbl>
# 1 1234 proximal RCA 50
# 2 1234 mid RCA 70
# 3 1235 proximal LCX 40
# 4 1235 mid LCX 70
If you're using dplyr version 1.0 or higher, you can also do this simply with group_by and summarize:
your_data %>%
group_by(PatientID) %>%
summarize(coronary_anatomy(Records))
# `summarise()` regrouping output by 'PatientID' (override with `.groups` argument)
# # A tibble: 4 x 3
# # Groups: PatientID [2]
# PatientID anatomy stenosis
# <int> <chr> <dbl>
# 1 1234 proximal RCA 50
# 2 1234 mid RCA 70
# 3 1235 proximal LCX 40
# 4 1235 mid LCX 70
Is there an R function for performing basic operations on every column of a data frame?
We can use scale on the dataset
scale(df1)
Or, if we want to use a custom function, create the function, loop over the columns with lapply, apply the function, and assign the result back to the data frame
f1 <- function(x) (x - min(x)) / (max(x) - min(x))
df1[] <- lapply(df1, f1)
Or this can be done with mutate_all (superseded by across in dplyr 1.0 and later)
library(dplyr)
df1 %>%
mutate_all(f1)
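A minimal sketch of min-max rescaling applied column by column, on a made-up data frame (df_demo and the function name rescale01 are hypothetical, chosen for this illustration).

```r
# Hypothetical data frame of numeric columns
df_demo <- data.frame(a = c(2, 4, 6), b = c(10, 0, 5))

# Min-max rescaling: maps each column onto the interval [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# df_demo[] keeps the data frame structure while replacing every column
df_demo[] <- lapply(df_demo, rescale01)
df_demo
```

A column whose values are all identical would produce NaN here, because the denominator becomes zero, so such columns may need separate handling.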
Applying a function to every row on each n number of columns in R
Here is one approach:
Let d be your 3 rows x 2000 columns frame, with column names as.character(1:2000) (see below for generation of fake data). We add a row identifier using .I, then melt the data long, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100). Then apply your function myfunc (see below for a stand-in function for this example) by row and group, and swing wide. (I used stringr::str_pad to add 0 to the front of the group number.)
# add row identifier
d[, row:=.I]
# melt and add col group identifier
dm = melt(d,id.vars = "row",variable.factor = F)[,variable:=as.numeric(variable)][order(variable,row), grp:=rep(1:20, each=300)]
# get the result (180 rows long), applying myfunc to each set of columns, by row
result = dm[, myfunc(value), by=.(row,grp)][,frow:=rep(1:3,times=60)]
# swing wide (3 rows long, 60 columns wide)
dcast(
result[,v:=paste0("grp",stringr::str_pad(grp,2,pad = "0"),"_",row)],
frow~v,value.var="V1"
)[, frow:=NULL][]
Output: (first six columns only)
grp01_1 grp01_2 grp01_3 grp02_1 grp02_2 grp02_3
<num> <num> <num> <num> <num> <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687
Input:
d = data.table()
alloc.col(d,2000)
set.seed(123)
for(c in 1:2000) set(d,j=as.character(c), value=runif(3))
myfunc function (toy example for this answer):
myfunc <- function(x) c(mean(x), var(x), sd(x))
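For comparison, a base R sketch of the same idea on a much smaller input, without data.table. This is a different technique from the answer above, and it uses a row mean rather than myfunc so that each block yields one value per row. The names m, n, blocks, and block_means are assumptions made for this illustration.

```r
# Hypothetical small input: 3 rows x 6 columns, processed in blocks of 2 columns
m <- matrix(1:18, nrow = 3)
n <- 2

# Column indices split into consecutive blocks of n
blocks <- split(seq_len(ncol(m)), ceiling(seq_len(ncol(m)) / n))

# For every block, compute the mean of each row restricted to that block
block_means <- sapply(blocks, function(idx) rowMeans(m[, idx, drop = FALSE]))
block_means
```

The result has one row per input row and one column per block. For the 2000-column case, setting n to 100 would give 20 blocks.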
Apply function to every column in every table in list R
lapply(your_list,function(x){apply(x,2,mean)})
Replace mean by min, max or median according to which one you want to calculate.
apply(x,2,...) applies the function to each column (i.e. the 2nd dimension) of a matrix, array or data frame.
lapply(...) applies the function to each element of a list.
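A short sketch of the nested lapply/apply call on a made-up list of two tables (tables, t1, and t2 are hypothetical names used only for this example).

```r
# Hypothetical list of two small tables
tables <- list(
  t1 = data.frame(x = c(1, 2, 3), y = c(4, 5, 6)),
  t2 = data.frame(x = c(10, 20), y = c(30, 50))
)

# Outer lapply walks the list; inner apply walks the columns of each table
col_means <- lapply(tables, function(tbl) apply(tbl, 2, mean))
col_means$t1
```

Because apply converts a data frame to a matrix first, this works best when all columns are numeric. For tables with mixed column types, sapply(tbl, mean) on the numeric columns is a safer choice.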
Add a new column to each df in a list of dfs using apply function
The function week_no is not vectorised, so you need some kind of loop to iterate over each value after strsplit. In the for loop you use sapply, so we can use the same here.
lapply(mapp_dfs, function(x) cbind(x,
week_nums = sapply(as.Date(unlist(strsplit(x$Timestamp, "T"))[c(TRUE,FALSE)]), week_no)))
#$l1
# Timestamp Value Q.code week_nums
#1 1993-08-30T00 13.53 1 36
#2 2002-01-16T00 1.55 2 3
#3 2010-01-13T00 5.63 3 3
#4 2016-11-08T00 7.32 4 46
#5 2019-05-13T00 7.89 5 20
#$l2
# Timestamp Value Q.code week_nums
#1 1994-07-10T00 13.53 1 28
#2 2003-01-26T00 1.55 1 4
#3 2011-01-13T00 5.63 3 3
#4 2016-11-08T00 9.31 4 46
#5 2019-05-23T00 5.63 1 21
#$l3
# Timestamp Value Q.code week_nums
#1 1995-08-30T00 1.36 2 36
#2 2004-01-16T00 5.63 2 3
#3 2012-01-13T00 5.63 5 3
#4 2013-11-08T00 7.32 4 45
#5 2019-06-03T00 5.22 4 23