Apply Function to Each Column in a Data Frame Observing Each Column's Existing Data Type

Apply function to each column in a data frame observing each column's existing data type

If it were an "ordered factor", things would be different. That is not to say I like "ordered factors" (I don't), only that some relationships are defined for "ordered factors" that are not defined for plain "factors", which are treated as ordinary categorical variables. What you are seeing is the natural sort order of factor levels, which is lexical (alphabetical) order for your locale. If you want an automatic coercion to "numeric" for every column, dates and factors and all, then try:

sapply(df, function(x) max(as.numeric(x)) )   # not generally a useful result
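To see why this result is rarely useful, compare coercing a factor directly versus going through character first (a small illustration added here, not part of the original answer):

f <- factor(c("2", "10", "30"))
as.numeric(f)                 # 2 1 3  -- the underlying level codes, not the values
as.numeric(as.character(f))   # 2 10 30 -- the values you probably wanted
max(as.numeric(f))            # 3, not 30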

Or, if you want to test for factors first and return the result you expect, then:

sapply(df, function(x) if ("factor" %in% class(x)) {
  max(as.numeric(as.character(x)))
} else {
  max(x)
})

@Darren's comment does work better:

 sapply(df, function(x) max(as.character(x)) )  

max does succeed with character vectors.
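Keep in mind, though, that max compares character vectors lexically, so this only matches the numeric maximum when lexical and numeric order happen to coincide. A quick illustration:

max(c("9", "10", "2"))   # "9" -- lexical, not numeric, comparison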

Apply a function to every column

In dplyr, you can use across to apply a function to multiple columns.

library(dplyr)
df <- df %>% mutate(across(starts_with('var'), ~./sd(.)))
df

#     var1  var2  var3
#    <dbl> <dbl> <dbl>
# 1 0.0384 0.118 0.707
# 2 0.0767 0.237 0.354
# 3 1.34   0.474 1.06
# 4 0.192  0.632 1.06
# 5 0.844  0.809 1.24
# 6 1.02   0.987 1.41

In base R, we can use lapply -

df[] <- lapply(df, function(x) x/sd(x))

To apply this to selected columns (1:168) you can do

df[1:168] <- lapply(df[1:168], function(x) x/sd(x))

Apply a function to each element of a column of a data frame

The issue is with if/else, which is not vectorized. If we change the function to use ifelse, it will work. Another issue is that apply with MARGIN expects a data.frame/matrix; here, it is extracting a single vector, 'G3'.

fun_pass <- function(calif) ifelse(calif >= 10, 1, 0)

Here we don't even need ifelse:

fun_pass <- function(calif) as.integer(calif >= 10)

If it is a single column, use

mat_data$pass <- fun_pass(mat_data$G3)
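If you want to apply it to several columns at once, lapply over a subset of the data frame works the same way. A sketch with placeholder column names (adjust to your data):

# hypothetical score columns; replace with the columns you actually need
score_cols <- c("G1", "G2", "G3")
mat_data[paste0(score_cols, "_pass")] <- lapply(mat_data[score_cols], fun_pass)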

How do you apply a function to each cell of a column?

If speed isn't an issue, you can use lapply or a purrr::map function (or even a for loop) to go through each row of your data, saving each tibble result in a list, and then combining the list of tibbles into one big tibble to work with. For example:

# dplyr and lapply
library(dplyr)

result_list <- lapply(your_data$Records, coronary_anatomy)
names(result_list) <- your_data$PatientID
result_tbl <- bind_rows(result_list, .id = "PatientID")
result_tbl
# # A tibble: 4 x 3
#   PatientID anatomy      stenosis
#   <chr>     <chr>           <dbl>
# 1 1234      proximal RCA       50
# 2 1234      mid RCA            70
# 3 1235      proximal LCX       40
# 4 1235      mid LCX            70
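The purrr route mentioned above looks much the same; map_dfr does the looping and row-binding in one step. A sketch, assuming the same your_data, PatientID, Records, and coronary_anatomy objects as in the question:

library(purrr)

records <- set_names(your_data$Records, your_data$PatientID)
result_tbl <- map_dfr(records, coronary_anatomy, .id = "PatientID")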

If you're using dplyr version 1.0 or higher, you can also do this simply with group_by and summarize:

your_data %>%
  group_by(PatientID) %>%
  summarize(coronary_anatomy(Records))
# `summarise()` regrouping output by 'PatientID' (override with `.groups` argument)
# # A tibble: 4 x 3
# # Groups:   PatientID [2]
#   PatientID anatomy      stenosis
#       <int> <chr>           <dbl>
# 1      1234 proximal RCA       50
# 2      1234 mid RCA            70
# 3      1235 proximal LCX       40
# 4      1235 mid LCX            70

Is there an R function for performing basic operations on every column of a data frame?

We can use scale on the dataset

scale(df1)
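scale() centres each column on its mean and divides by its standard deviation, so it is equivalent to looping a standardising function over the columns yourself (a small check, added for illustration):

scaled1 <- scale(df1)                                     # matrix with scaling attributes
scaled2 <- sapply(df1, function(x) (x - mean(x)) / sd(x))
all.equal(as.vector(scaled1), as.vector(scaled2))         # TRUE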

Or, if we want to use a custom function, create the function, loop over the columns with lapply, apply the function, and assign the result back to the data frame:

f1 <- function(x) (x - min(x)) / (max(x) - min(x))
df1[] <- lapply(df1, f1)

Or this can be done with mutate_all

library(dplyr)
df1 %>%
  mutate_all(f1)
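mutate_all() is superseded in current dplyr; the across() form shown earlier covers the same case:

df1 %>%
  mutate(across(everything(), f1))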

Applying a function to every row, over each group of n columns, in R

Here is one approach:

Let d be your 3-row x 2000-column frame, with column names as.character(1:2000) (see below for generation of the fake data). We add a row identifier using .I, then melt the data long, adding grp, a column-group identifier (i.e. identifying the 20 sets of 100 columns). Then we apply your function myfunc (see below for a stand-in used in this example) by row and group, and swing the result wide. (I used stringr::str_pad to pad the group number with a leading 0.)

# add row identifier
d[, row:=.I]

# melt and add col group identifier
dm = melt(d, id.vars = "row", variable.factor = FALSE)[
  , variable := as.numeric(variable)][
  order(variable, row), grp := rep(1:20, each = 300)]

# get the result (180 rows long), applying myfunc to each set of columns, by row
result = dm[, myfunc(value), by = .(row, grp)][, frow := rep(1:3, times = 60)]

# swing wide (3 rows long, 60 columns wide)
dcast(
  result[, v := paste0("grp", stringr::str_pad(grp, 2, pad = "0"), "_", row)],
  frow ~ v, value.var = "V1"
)[, frow := NULL][]

Output: (first six columns only)

      grp01_1    grp01_2    grp01_3    grp02_1    grp02_2    grp02_3
        <num>      <num>      <num>      <num>      <num>      <num>
1: 0.54187168 0.47650694 0.48045694 0.51278399 0.51777319 0.46607845
2: 0.06671367 0.08763655 0.08076939 0.07930063 0.09830116 0.07807937
3: 0.25828989 0.29603471 0.28419957 0.28160367 0.31353016 0.27942687

Input:

library(data.table)

d = data.table()
alloc.col(d, 2000)
set.seed(123)
for (c in 1:2000) set(d, j = as.character(c), value = runif(3))

myfunc Function (toy example for this answer):

myfunc <- function(x) c(mean(x), var(x), sd(x))

Apply function to every column in every table in list R

lapply(your_list, function(x) apply(x, 2, mean))

Replace mean with min, max, or median, according to which one you want to calculate.

apply(x, 2, ...) applies the function to each column (i.e. the 2nd dimension) of a matrix, array, or data frame.
lapply(...) applies the function to each element of a list.
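A minimal illustration with made-up data (the list and column names are placeholders):

your_list <- list(
  a = data.frame(x = 1:3, y = 4:6),
  b = data.frame(x = 7:9, y = 10:12)
)
lapply(your_list, function(x) apply(x, 2, mean))
# $a
# x y 
# 2 5 
#
# $b
#  x  y 
#  8 11 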

Add a new column to each df in a list of dfs using apply function

The function week_no is not vectorised, so you need some kind of loop to iterate over each value after strsplit. In your for loop you used sapply, so we can use the same here.

lapply(mapp_dfs, function(x)
  cbind(x, week_nums = sapply(
    as.Date(unlist(strsplit(x$Timestamp, "T"))[c(TRUE, FALSE)]),
    week_no
  )))

#$l1
# Timestamp Value Q.code week_nums
#1 1993-08-30T00 13.53 1 36
#2 2002-01-16T00 1.55 2 3
#3 2010-01-13T00 5.63 3 3
#4 2016-11-08T00 7.32 4 46
#5 2019-05-13T00 7.89 5 20

#$l2
# Timestamp Value Q.code week_nums
#1 1994-07-10T00 13.53 1 28
#2 2003-01-26T00 1.55 1 4
#3 2011-01-13T00 5.63 3 3
#4 2016-11-08T00 9.31 4 46
#5 2019-05-23T00 5.63 1 21

#$l3
# Timestamp Value Q.code week_nums
#1 1995-08-30T00 1.36 2 36
#2 2004-01-16T00 5.63 2 3
#3 2012-01-13T00 5.63 5 3
#4 2013-11-08T00 7.32 4 45
#5 2019-06-03T00 5.22 4 23

