How to Apply a Function to a Certain Column for All the Data Frames in Environment in R

I'd strongly suggest putting your data frames in a list instead of just leaving them in the global environment. The answer I link to should help you understand why lists are better, and it also shows how to work with lists from the start instead of this "find all data frames and put them in a list" approach.

eapply is difficult because there's nothing built-in to let you apply, say, only to data frames. And eapply returns results as a list, so it doesn't make much sense for adding columns to existing data frames.

df_names = ls()[sapply(mget(ls()), is.data.frame)]
df_list = mget(df_names)
# return d explicitly; otherwise lapply collects only the assigned values
result_list = lapply(df_list, function(d) { d$new_col = <code for new column>; d })
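
A minimal runnable sketch of the approach above, under some assumptions: the data frames `df_a` and `df_b` and the new column `cpm_scaled` are made up for illustration, and a scratch environment stands in for the global environment so that the example does not pick up unrelated objects.

```r
# Toy setup: a scratch environment stands in for the global environment.
env <- new.env()
env$df_a <- data.frame(cpm = c(0.10, 0.20, 0.30))
env$df_b <- data.frame(cpm = c(1, 2))
env$not_a_df <- 1:3  # non-data-frame objects are filtered out below

# Collect only the data frames into a named list.
df_list <- Filter(is.data.frame, mget(ls(env), envir = env))

# Add a column to each; the function must return the modified data frame.
result_list <- lapply(df_list, function(d) {
  d$cpm_scaled <- d$cpm / max(d$cpm)  # hypothetical new column
  d
})

names(result_list)
```

To do the same against the global environment, dropping the `envir` arguments should be enough.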

I'm not sure what you want since you don't post your desired output. quantile(x, c(.15, .8)) returns 2 values, and your data frames have more than 2 rows, so I'm not sure what you want added - 2 new columns? 1 new column with recycling? something else?
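
If the intent is two new columns, one per quantile, a sketch could look like the following. The column names `q15` and `q80` and the toy values are assumptions, not taken from the question. Each quantile is a single number, so it is recycled down the whole column.

```r
# Toy list of data frames (values are made up).
df_list <- list(
  AE.mac = data.frame(cpm = c(1, 2, 3, 4, 5)),
  BD.ios = data.frame(cpm = c(10, 20, 30))
)

# Add one constant column per quantile and return the data frame.
add_quantile_cols <- function(d) {
  q <- quantile(d$cpm, c(0.15, 0.8), names = FALSE)
  d$q15 <- q[1]
  d$q80 <- q[2]
  d
}

result_list <- lapply(df_list, add_quantile_cols)
```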

Alternatively, maybe you just want a 2-number summary for each data frame? In that case sapply does nice simplification and keeps the names:

sapply(df_list, function(d) quantile(d$cpm, c(0.15, 0.8)))
#           AE.mac AF.android       BD.ios
# 15% 0.0009111413  0.1545266 0.0002341395
# 80% 0.0071962008  0.3567230 0.0076989311
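
As a small illustration of that simplification, the sketch below uses a made-up two-element list. `sapply` returns a matrix with one column per data frame, and `t()` plus `as.data.frame()` turns it into one row per data frame if that shape is more convenient.

```r
# Toy list of data frames (names and values are made up).
df_list <- list(
  a = data.frame(cpm = c(1, 2, 3, 4, 5)),
  b = data.frame(cpm = c(10, 20, 30))
)

# One column per data frame, one row per quantile.
q <- sapply(df_list, function(d) quantile(d$cpm, c(0.15, 0.8)))

# Transpose to get one row per data frame.
q_df <- as.data.frame(t(q))
```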

EDIT: Based on your edits, let's work directly with data. We don't need to split, and we certainly don't need list2env after the split. Adding columns by group is easy and efficient with dplyr or data.table. For example:

library(dplyr)
data %>%
  group_by(geo, os) %>%
  summarize(quantile_15 = quantile(cpm, .15),
            quantile_80 = quantile(cpm, 0.8))
# # A tibble: 81 x 4
# # Groups:   geo [?]
#   geo   os         quantile_15 quantile_80
#   <fct> <fct>            <dbl>       <dbl>
# 1 AE    android        0.118       0.118
# 2 AE    blackberry     0.00833     0.00833
# 3 AR    mac            0.0296      0.0296
# 4 AT    android        0.665       0.665
# 5 AU    android        0.482       0.482
# 6 AU    ios            0.374       0.374
# 7 AU    mac            0.00903     0.00903
# ...
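
Note that summarize collapses each group to a single row. If the goal is to keep every original row and attach the group quantiles as new columns, mutate can be used instead. The sketch below assumes a toy `data` with made-up values; only the column names `geo`, `os` and `cpm` come from the example above.

```r
library(dplyr)

# Toy data standing in for the question's `data` (values are made up).
data <- data.frame(
  geo = c("AE", "AE", "AE", "AU", "AU"),
  os  = c("android", "android", "android", "ios", "ios"),
  cpm = c(1, 2, 3, 10, 20)
)

# mutate keeps all rows; each group quantile is repeated within its group.
with_quantiles <- data %>%
  group_by(geo, os) %>%
  mutate(quantile_15 = quantile(cpm, 0.15, names = FALSE),
         quantile_80 = quantile(cpm, 0.8, names = FALSE)) %>%
  ungroup()
```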

Or with data.table:

library(data.table)
setDT(data)
data[, as.list(quantile(cpm, c(0.15, 0.8))), by = .(geo, os)]
#    geo         os          15%          80%
# 1:  EC        ios 2.595296e-01 2.595296e-01
# 2:  AE blackberry 8.325000e-03 8.325000e-03
# 3:  AT    android 6.645070e-01 6.645070e-01
# 4:  EG    android 1.702811e-02 8.928342e-02
# 5:  AE    android 1.176471e-01 1.176471e-01
# 6:  CA    windows 6.301327e-01 6.301327e-01
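
The call above returns one summary row per group. To add the quantiles as columns on the existing rows instead, data.table's `:=` with `by` should work. This is a sketch on made-up data, and the column names `q15` and `q80` are assumptions.

```r
library(data.table)

# Toy data standing in for the question's `data` (values are made up).
data <- data.table(
  geo = c("AE", "AE", "AE", "AU", "AU"),
  os  = c("android", "android", "android", "ios", "ios"),
  cpm = c(1, 2, 3, 10, 20)
)

# Add both columns by reference; each group value is recycled within its group.
data[, c("q15", "q80") := as.list(quantile(cpm, c(0.15, 0.8))), by = .(geo, os)]
```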

Loop through dataframes in global environment and apply function to them

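Keeping your original formatear function (percent() presumably comes from the scales package):

library(scales)  # assumed source of percent()

formatear <- function(eq){
  eq$Volume <- NULL
  eq$Date <- as.Date(eq$Date)

  # previous and current adjusted close; head()/tail() avoid the
  # precedence trap in 1:nrow(eq)-1, which parses as (1:nrow(eq)) - 1
  nt <- head(eq$Adj.Close, -1)
  nt1 <- tail(eq$Adj.Close, -1)
  eq$return <- percent(c(NA, nt1/nt-1), accuracy = 0.0001)
  return(eq)
}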

You can use mget to get a list of the dataframes and apply the function with lapply. Here dfs is a character vector holding the names of the dataframes.

clean_list_data <- lapply(mget(dfs), formatear)

clean_list_data should be a list of dataframes in the format that you want. You can access individual dataframes with clean_list_data[[1]], clean_list_data[[2]] and so on. It is easier to manage the data if you keep them in a list like this instead of creating multiple dataframes in the global environment.
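
A self-contained sketch of the same pattern, using assumptions throughout: `add_return` is a simplified, hypothetical stand-in for formatear that skips the percent formatting, and the data frames `aapl` and `msft` hold made-up prices.

```r
# Simplified stand-in for formatear: adds a plain numeric return column.
add_return <- function(eq) {
  n <- nrow(eq)
  eq$return <- c(NA, eq$Adj.Close[-1] / eq$Adj.Close[-n] - 1)
  eq
}

# Toy data frames (made-up prices).
aapl <- data.frame(Adj.Close = c(100, 110, 99))
msft <- data.frame(Adj.Close = c(50, 55))

# Character vector of data frame names, as expected by mget().
dfs <- c("aapl", "msft")

clean_list_data <- lapply(mget(dfs), add_return)
clean_list_data[["aapl"]]
```

Because mget returns a named list, the results can be accessed by name as well as by position.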

Applying a function to all data.frames in the environment

You call get(dfs[i]), which returns a reference to a data.table, but then you lapply over each column of that frame, whereas I infer from the function argument dataframe that you expect a full frame. One might start with:

for (i in seq_along(dfs)) {
  get(dfs[i])[ , cleanfunction(.SD)]
}

but realize that this operation returns a new frame; it does not use the canonical data.table mechanisms for updating data in place. I suggest you update your function to always force data.table and work on it by reference.

cleanfunction <- function(dataframe) {
  setDT(dataframe)
  ## get mode of all vars
  var_mode <- sapply(dataframe, mode)
  ## produce error if complex or raw is found
  if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
  ## get class of all vars
  var_class <- sapply(dataframe, class)
  ## produce error if an "AsIs" object has "logical" or "character" mode
  if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
    stop("matrix variables with 'AsIs' class must be 'numeric'")
  }
  ## identify columns that need to be coerced to factors
  ind1 <- which(var_mode %in% c("logical", "character"))
  ## coerce logical / character to factor with `as.factor`
  if (length(ind1)) dataframe[, c(ind1) := lapply(.SD, as.factor), .SDcols = ind1]
  return(dataframe)
}

Since your current data does not trigger any changes, I'll update one:

DT[,quux:="A"]
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <char>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A

for (i in seq_along(dfs)) cleanfunction(get(dfs[i]))
head(DT)
#        A     B     C     D   quux
#    <int> <int> <int> <int> <fctr>
# 1:     1     1    12    15      A
# 2:     1     2     4     6      A
# 3:     1     3     5     7      A
# 4:     1     4     9     1      A
# 5:     1     5     6    14      A
# 6:     2     1    15    13      A

Note that the for loop is relying solely on referential updates; the return value from cleanfunction is ignored here.

This method works entirely because of data.table referential semantics; if you were using data.frame or tbl_df, this would likely require wrapping the call in assign, as in assign(dfs[i], cleanfunction(get(dfs[i]))).
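
For plain data.frames, that assign-based variant might look like the sketch below. `to_factor` is a hypothetical, much-reduced stand-in for cleanfunction (it only converts character and logical columns to factors), and `df1` and `df2` are made-up inputs.

```r
# Reduced stand-in for cleanfunction that works on ordinary data.frames.
to_factor <- function(d) {
  is_chr_lgl <- vapply(d, function(col) is.character(col) || is.logical(col),
                       logical(1))
  if (any(is_chr_lgl)) d[is_chr_lgl] <- lapply(d[is_chr_lgl], as.factor)
  d
}

# Toy data frames and the character vector of their names.
df1 <- data.frame(x = 1:2, quux = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(y = c(TRUE, FALSE))
dfs <- c("df1", "df2")

# data.frames are copied on modification, so the result must be assigned back.
for (nm in dfs) assign(nm, to_factor(get(nm)))
```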

How to make a functional list with the data.frames from the environment in R?

If we have multiple data.frames in the global environment that we want to merge, we can use mget and ls:

file_1 = data.frame(id = c(1,2), a = c(1,2))
file_2 = data.frame(id = c(1,2), b = c(3,4))
file_3 = data.frame(id = c(3,4), a = c(5,6))

Reduce(\(...) merge(..., all = TRUE), mget(ls(pattern = "file")))
#   id a  b
# 1  1 1  3
# 2  2 2  4
# 3  3 5 NA
# 4  4 6 NA
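
A slightly more defensive sketch of the same idea follows. The extra object `file_path` is made up to show the problem: a bare pattern of "file" would match it too, so this version anchors the pattern and keeps only data frames. It also uses function(x, y) rather than the \(...) shorthand, which needs R 4.1 or later.

```r
file_1 <- data.frame(id = c(1, 2), a = c(1, 2))
file_2 <- data.frame(id = c(1, 2), b = c(3, 4))
file_3 <- data.frame(id = c(3, 4), a = c(5, 6))
file_path <- "not a data frame"  # also matched by ls(pattern = "file")

# Anchor the pattern and keep only data frames before merging.
frames <- Filter(is.data.frame, mget(ls(pattern = "^file_[0-9]+$")))
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), frames)
merged
```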

How to loop through entire dataframe in R and get the head function

Put the dataframes in a list using mget and use head with lapply:

lapply(mget(paste0('d', 1:3)), head)
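
A runnable version with made-up data frames `d1` to `d3`; extra arguments to head, such as n, can be passed through lapply:

```r
# Toy data frames (made up for illustration).
d1 <- data.frame(x = 1:10)
d2 <- data.frame(x = 11:20)
d3 <- data.frame(x = 21:30)

# First two rows of each data frame, returned as a named list.
heads <- lapply(mget(paste0("d", 1:3)), head, n = 2)
heads
```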

Applying a function with information from the global environment to a list

You can modify your function so that it runs on a list as opposed to an environment:

list_pend <- list(pend4P_17k=pend4P_17k, pend5P_17k=pend5P_17k, pend10P_17k=pend10P_17k)

add_name_cols <- function(l){
  for(i in seq_along(l)){
    l[[i]]$Pendant_ID <- gsub("^pend(.{2,3})_.*$", "\\1", names(l)[i])
  }
  return(l)
}

list_pend <- add_name_cols(list_pend)

Output

> add_name_cols(list_pend)
$pend4P_17k
  x var1 var2 Pendant_ID
1 1    a    1         4P
2 2    b    1         4P
3 3    c    0         4P
4 4    d    0         4P
5 5    e    1         4P

$pend5P_17k
  x var1 var2 Pendant_ID
1 1    a    1         5P
2 2    b    1         5P
3 3    c    0         5P
4 4    d    0         5P
5 5    e    1         5P

$pend10P_17k
  x var1 var2 Pendant_ID
1 1    a    1        10P
2 2    b    1        10P
3 3    c    0        10P
4 4    d    0        10P
5 5    e    1        10P
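
As an alternative to the explicit loop, the same result can be sketched with Map, which pairs each data frame with its name. The helper `add_id` and the two small data frames are assumptions for illustration.

```r
# Toy inputs (made up); the names follow the pattern used above.
list_pend <- list(
  pend4P_17k  = data.frame(x = 1:2),
  pend10P_17k = data.frame(x = 1:2)
)

# Hypothetical helper: extract the ID from the name and add it as a column.
add_id <- function(d, nm) {
  d$Pendant_ID <- sub("^pend(.{2,3})_.*$", "\\1", nm)
  d
}

# Map keeps the names of its first list argument.
list_pend <- Map(add_id, list_pend, names(list_pend))
```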

