How to apply a function to a certain column for all the data frames in environment in R
I'd strongly suggest putting your data frames in a list instead of just leaving them in the global environment. The answer I link to should help you understand why lists are better, and also show how you could use lists from the start instead of this "find all data frames and put them in a list" approach.
eapply is difficult because there's nothing built-in to let you apply, say, only to data frames. And eapply returns results as a list, so it doesn't make much sense for adding columns to existing data frames.
df_names = ls()[sapply(mget(ls()), is.data.frame)]
df_list = mget(df_names)
# return d itself; a bare assignment would return only the new column's values
result_list = lapply(df_list, function(d) { d$new_col = <code for new column>; d })
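As a concrete illustration of the pattern above, here is a minimal sketch. The environment env, the data frames df_a and df_b, and the column cpm_scaled are all made up for the example; it uses a dedicated environment rather than the global one so it can be run without side effects. The point it shows is that the anonymous function has to return the modified data frame.

```r
# Toy setup (hypothetical names): a dedicated environment instead of the global one
env <- new.env()
env$df_a <- data.frame(cpm = c(1, 2, 3))
env$df_b <- data.frame(cpm = c(10, 20))
env$not_a_df <- 1:5   # not a data frame, should be skipped

# Fetch every object in env, then keep only the data frames
df_list <- Filter(is.data.frame, mget(ls(env), envir = env))

# Add a column to each data frame; the function returns d, not the assigned value
result_list <- lapply(df_list, function(d) {
  d$cpm_scaled <- d$cpm / max(d$cpm)
  d
})

names(result_list)
```

The result is still a list; the objects in env are left unchanged.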
I'm not sure what you want since you don't post your desired output. quantile(x, c(.15, .8)) returns 2 values, and your data frames have more than 2 rows, so I'm not sure what you want added - 2 new columns? 1 new column with recycling? something else?
Alternatively, maybe you just want a 2-number summary for each data frame? In that case sapply does nice simplification and keeps the names:
sapply(df_list, function(d) quantile(d$cpm, c(0.15, 0.8)))
# AE.mac AF.android BD.ios
# 15% 0.0009111413 0.1545266 0.0002341395
# 80% 0.0071962008 0.3567230 0.0076989311
EDIT: based on your edits, let's work directly with data. We don't need to split, and we certainly don't need list2env after the split. Adding columns by group is easy and efficient with dplyr or data.table. For example:
library(dplyr)
data %>%
group_by(geo, os) %>%
summarize(quantile_15 = quantile(cpm, .15),
quantile_80 = quantile(cpm, 0.8))
# # A tibble: 81 x 4
# # Groups: geo [?]
# geo os quantile_15 quantile_80
# <fct> <fct> <dbl> <dbl>
# 1 AE android 0.118 0.118
# 2 AE blackberry 0.00833 0.00833
# 3 AR mac 0.0296 0.0296
# 4 AT android 0.665 0.665
# 5 AU android 0.482 0.482
# 6 AU ios 0.374 0.374
# 7 AU mac 0.00903 0.00903
# ...
Or with data.table:
library(data.table)
setDT(data)
data[, as.list(quantile(cpm, c(0.15, 0.8))), by = .(geo, os)]
# geo os 15% 80%
# 1: EC ios 2.595296e-01 2.595296e-01
# 2: AE blackberry 8.325000e-03 8.325000e-03
# 3: AT android 6.645070e-01 6.645070e-01
# 4: EG android 1.702811e-02 8.928342e-02
# 5: AE android 1.176471e-01 1.176471e-01
# 6: CA windows 6.301327e-01 6.301327e-01
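Both snippets above collapse the data to one row per group. If the goal is instead to attach the group quantiles to every original row, one option is base R's ave(), sketched here on made-up data (the values and the helper name group_quantile are assumptions for illustration). In dplyr, the analogous change would be using mutate() in place of summarize().

```r
# Toy data (hypothetical values), using the column names from the question
data <- data.frame(
  geo = c("AE", "AE", "AE", "AE", "AU", "AU", "AU", "AU"),
  os  = c("ios", "ios", "mac", "mac", "ios", "ios", "mac", "mac"),
  cpm = c(1, 3, 2, 4, 10, 20, 5, 5)
)

# Hypothetical helper: per-group quantile, repeated for every row of the group
group_quantile <- function(x, g1, g2, p) {
  ave(x, g1, g2, FUN = function(v) quantile(v, p, names = FALSE))
}

data$quantile_15 <- group_quantile(data$cpm, data$geo, data$os, 0.15)
data$quantile_80 <- group_quantile(data$cpm, data$geo, data$os, 0.80)
data
```

The data frame keeps its original number of rows; each row carries the quantiles of its own geo/os group.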
Loop through dataframes in global environment and apply function to them
Keeping your original formatear function:
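library(scales)  # provides percent()

formatear <- function(eq){
  eq$Volume <- NULL
  eq$Date <- as.Date(eq$Date)
  # prices at t-1 and at t; parentheses added because 1:nrow(eq)-1 parses as (1:nrow(eq))-1
  nt <- eq$Adj.Close[1:(nrow(eq)-1)]
  nt1 <- eq$Adj.Close[2:nrow(eq)]
  eq$return <- percent(c(NA, nt1/nt-1), accuracy = 0.0001)
  return(eq)
}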
You can use mget to get a list of dataframes and apply the function with lapply.
clean_list_data <- lapply(mget(dfs), formatear)
clean_list_data should be a list of dataframes in the format that you want. You can access individual dataframes with clean_list_data[[1]], clean_list_data[[2]] and so on. It is easier to manage the data if you keep them in a list like this instead of creating multiple dataframes in the global environment.
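For a self-contained picture of how mget and lapply fit together, here is a small sketch. The data frames aapl and msft, their values, the environment env, and the function add_return are hypothetical; add_return is a simplified stand-in for formatear that skips the percent formatting so it needs no extra packages.

```r
# Toy data frames (made-up prices) stored in a dedicated environment
env <- new.env()
env$aapl <- data.frame(Date = c("2020-01-02", "2020-01-03"), Adj.Close = c(100, 110))
env$msft <- data.frame(Date = c("2020-01-02", "2020-01-03"), Adj.Close = c(50, 45))

# Character vector with the names of the data frames to process
dfs <- c("aapl", "msft")

# Simplified stand-in for formatear: parse dates, add a plain numeric return
add_return <- function(eq) {
  eq$Date <- as.Date(eq$Date)
  eq$return <- c(NA, diff(eq$Adj.Close) / head(eq$Adj.Close, -1))
  eq
}

# mget() returns a named list, so the results can be accessed by name as well
clean_list_data <- lapply(mget(dfs, envir = env), add_return)
clean_list_data[["aapl"]]
clean_list_data$msft
```

Because the list is named after the original objects, access by name is available in addition to access by position.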
Applying a function to all data.frames in the environment
You get(dfs[i]) which returns a reference to a data.table, but then you are lapply-ing each column of that frame and I'm inferring from the function argument dataframe that you expect a full frame. One might start with:
for (i in seq_along(dfs)) {
get(dfs[i])[ , cleanfunction(.SD)]
}
but realize that this operation returns a new frame; it does not use canonical data.table mechanisms for updating data in-place. I suggest you update your function to always force data.table and work on it referentially.
cleanfunction <- function(dataframe) {
setDT(dataframe)
## get mode of all vars
var_mode <- sapply(dataframe, mode)
## produce error if complex or raw is found
if (any(var_mode %in% c("complex", "raw"))) stop("complex or raw not allowed!")
## get class of all vars
var_class <- sapply(dataframe, class)
## produce error if an "AsIs" object has "logical" or "character" mode
if (any(var_mode[var_class == "AsIs"] %in% c("logical", "character"))) {
stop("matrix variables with 'AsIs' class must be 'numeric'")
}
## identify columns that needs be coerced to factors
ind1 <- which(var_mode %in% c("logical", "character"))
## coerce logical / character to factor with `as.factor`
if (length(ind1)) dataframe[, c(ind1) := lapply(.SD, as.factor), .SDcols = ind1]
return(dataframe)
}
Since your current data does not trigger any changes, I'll update one:
DT[,quux:="A"]
head(DT)
# A B C D quux
# <int> <int> <int> <int> <char>
# 1: 1 1 12 15 A
# 2: 1 2 4 6 A
# 3: 1 3 5 7 A
# 4: 1 4 9 1 A
# 5: 1 5 6 14 A
# 6: 2 1 15 13 A
for (i in seq_along(dfs)) cleanfunction(get(dfs[i]))
head(DT)
# A B C D quux
# <int> <int> <int> <int> <fctr>
# 1: 1 1 12 15 A
# 2: 1 2 4 6 A
# 3: 1 3 5 7 A
# 4: 1 4 9 1 A
# 5: 1 5 6 14 A
# 6: 2 1 15 13 A
Note that the for loop is relying solely on referential updates; the return value from cleanfunction is ignored here.
This method works entirely because of data.table referential semantics; if you were using data.frame or tbl_df, this would likely require wrapping that call to cleanfunction(.) with assign(dfs[i], cleanfunction(..)).
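For plain data.frame objects, the assign() variant might look like the sketch below. Everything here is illustrative: df1, df2, and env are made-up, and to_factor is a much reduced stand-in for cleanfunction that only performs the logical/character to factor coercion, without the checks.

```r
# Toy data frames (hypothetical) in a dedicated environment
env <- new.env()
env$df1 <- data.frame(A = 1:3, quux = c("a", "b", "a"), stringsAsFactors = FALSE)
env$df2 <- data.frame(B = c(TRUE, FALSE), C = c(1.5, 2.5))
dfs <- c("df1", "df2")

# Reduced stand-in for cleanfunction: coerce logical/character columns to factor
to_factor <- function(d) {
  is_target <- vapply(d, function(col) is.logical(col) || is.character(col), logical(1))
  if (any(is_target)) d[is_target] <- lapply(d[is_target], as.factor)
  d
}

# data.frame has copy semantics, so the result must be written back explicitly
for (nm in dfs) {
  assign(nm, to_factor(get(nm, envir = env)), envir = env)
}

sapply(env$df1, class)
```

Without the assign() call, the modified copies would be discarded and the objects in env would stay as they were.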
How to make a functional list with the data.frames from the environment in R?
If we have multiple data.frames in the global environment that we want to merge, we can use mget and ls:
file_1 = data.frame(id = c(1,2), a = c(1,2))
file_2 = data.frame(id = c(1,2), b = c(3,4))
file_3 = data.frame(id = c(3,4), a = c(5,6))
Reduce(\(...) merge(..., all = TRUE), mget(ls(pattern = "file")))
id a b
1 1 1 3
2 2 2 4
3 3 5 NA
4 4 6 NA
How to loop through entire dataframe in R and get the head function
Put the dataframes in a list using mget and use head with lapply:
lapply(mget(paste0('d', 1:3)), head)
Applying a function with information from the global environment to a list
You can modify your function so that it runs on a list as opposed to an environment:
list_pend <- list(pend4P_17k=pend4P_17k, pend5P_17k=pend5P_17k, pend10P_17k=pend10P_17k)
add_name_cols <- function(l){
for(i in seq_along(l)){
l[[i]]$Pendant_ID <- gsub("^pend(.{2,3})_.*$", "\\1", names(l)[i])
}
return(l)
}
list_pend <- add_name_cols(list_pend)
Output
> add_name_cols(list_pend)
$pend4P_17k
x var1 var2 Pendant_ID
1 1 a 1 4P
2 2 b 1 4P
3 3 c 0 4P
4 4 d 0 4P
5 5 e 1 4P
$pend5P_17k
x var1 var2 Pendant_ID
1 1 a 1 5P
2 2 b 1 5P
3 3 c 0 5P
4 4 d 0 5P
5 5 e 1 5P
$pend10P_17k
x var1 var2 Pendant_ID
1 1 a 1 10P
2 2 b 1 10P
3 3 c 0 10P
4 4 d 0 10P
5 5 e 1 10P
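To see what the regular expression inside add_name_cols extracts, it can be run on the names alone. This is just a quick check using the three names from the example; pend_names and ids are names introduced here for illustration.

```r
# The list names used in the example above
pend_names <- c("pend4P_17k", "pend5P_17k", "pend10P_17k")

# Capture the 2 or 3 characters between "pend" and the first underscore
ids <- gsub("^pend(.{2,3})_.*$", "\\1", pend_names)
ids
# "4P"  "5P"  "10P"
```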