Using lapply to apply a function over list of data frames and saving output to files with different names
It will work with the following lapply call:
lapply(names(mylist), function(x) NewVar(mylist[[x]], "y", x))
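Iterating over names(mylist) hands the function both the data frame and its name, so the name can serve as the output file name. Below is a minimal sketch of that idea. It assumes a hypothetical save_with_name() in place of the asker's NewVar(), made-up data in mylist, and CSV output written to a temporary directory.

```r
# Hypothetical stand-in for NewVar(): adds a column, then writes the result
# to a CSV file named after the list element
save_with_name <- function(df, col, name, dir = tempdir()) {
  df[[col]] <- seq_len(nrow(df))            # illustrative new column
  path <- file.path(dir, paste0(name, ".csv"))
  write.csv(df, path, row.names = FALSE)
  path                                       # return the path that was written
}

# Made-up example list; the names become the file names
mylist <- list(first = data.frame(v = 1:3), second = data.frame(v = 4:5))

# Iterate over the names so each element is written to a different file
paths <- lapply(names(mylist), function(x) save_with_name(mylist[[x]], "y", x))
basename(unlist(paths))
```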
Using lapply to set column names for a list of data frames?
It seems you want to update the original data frames. In that case, your list must be named; see the code below.
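# Name each element after the object it came from
List <- list(a = a, b = b, c = c, d = d)
# Rename the columns of every data frame, then write each one back to the global environment under its list name
list2env(lapply(List, setNames, nm = headers), globalenv())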
Now if you print a, you will see that its column names have been updated.
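Here is a self-contained version of the same idea. The data frames a and b and the headers vector are made-up assumptions for illustration.

```r
# Example data frames and the column names to give each of them
a <- data.frame(1:2, 3:4)
b <- data.frame(5:6, 7:8)
headers <- c("x", "y")

# The list must be named: list2env() uses the names as object names
List <- list(a = a, b = b)
invisible(list2env(lapply(List, setNames, nm = headers), globalenv()))

names(a)
# [1] "x" "y"
```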
How do I apply a function over multiple data frames, but overwrite them?
You can use the list2env() function.
list2env(data_list, envir = .GlobalEnv)
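This assigns every data frame in the list to the global environment as a separate object, using the list element names as the object names (so the list must be named). Apply your function to the list with lapply first, then call list2env() to overwrite the originals.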
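A minimal sketch of the full round trip: collect the data frames into a named list, modify each one with lapply, and write them back over the originals. The objects df1 and df2 and the doubling step are hypothetical; mget() is one way to build the named list from existing object names.

```r
# Two example data frames that should be modified "in place"
df1 <- data.frame(v = c(1, 2))
df2 <- data.frame(v = c(3, 4))

# Collect them in a named list; mget() uses the object names as list names
data_list <- mget(c("df1", "df2"))

# Apply the function to every element (here: a hypothetical doubling step)
data_list <- lapply(data_list, function(d) {
  d$v <- d$v * 2
  d
})

# Write each element back, overwriting df1 and df2 in the global environment
invisible(list2env(data_list, envir = .GlobalEnv))

df1$v
# [1] 2 4
```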
Keeping original list item names when using lapply over an existing list
Just add the names:
names(dat)
# [1] "grp_1" "grp_2" "grp_3"
names(dat_new)
# NULL
names(dat_new) <- names(dat)
names(dat_new)
# [1] "grp_1" "grp_2" "grp_3"
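Names are only lost when lapply() iterates over something unnamed, such as seq_along(dat). When it iterates over the named list itself, the names carry over. The real dat is not shown, so the contents below are made up for illustration.

```r
dat <- list(grp_1 = 1:3, grp_2 = 4:6, grp_3 = 7:9)

# lapply() over the list itself keeps the names
names(lapply(dat, sum))
# [1] "grp_1" "grp_2" "grp_3"

# Iterating over indices drops them, so restore them with setNames()
dat_new <- setNames(lapply(seq_along(dat), function(i) sum(dat[[i]]) * i), names(dat))
names(dat_new)
# [1] "grp_1" "grp_2" "grp_3"
```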
R - apply function on two files in folders with for loop or lapply and save results in one dataframe
Try this solution: get all the folders using list.dirs(). For each folder, read the "alpha" and "beta" files and return a three-column tibble with alpha, beta and alphabeta values. Then bind all the data frames together with an id column that records which folder each value comes from.
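all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)
# Name the vector so that .id records the folder name instead of its position
names(all_folders) <- basename(all_folders)
result <- purrr::map_df(all_folders, function(x) {
  # list.files() returns sorted names, so this assumes the alpha file sorts before the beta file
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  # tibble() lets alphabeta refer to the alpha and beta columns defined just before it
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha/beta)
}, .id = "id")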
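For comparison, here is a base R sketch of the same idea that needs no purrr or tibble. It swaps map_df() for lapply() plus do.call(rbind, ...). It builds a throwaway folder layout and assumes each folder contains files named exactly alpha.csv and beta.csv with a mean column. Reading the files by name avoids relying on their sort order.

```r
# Build a throwaway layout: Data/<folder>/alpha.csv and beta.csv
root <- file.path(tempdir(), "Data")
for (f in c("run1", "run2")) {
  dir.create(file.path(root, f), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(mean = c(2, 4)), file.path(root, f, "alpha.csv"), row.names = FALSE)
  write.csv(data.frame(mean = c(1, 2)), file.path(root, f, "beta.csv"), row.names = FALSE)
}

all_folders <- list.dirs(root, recursive = FALSE, full.names = TRUE)

# One data frame per folder, tagged with the folder name
per_folder <- lapply(all_folders, function(x) {
  alpha <- read.csv(file.path(x, "alpha.csv"))$mean
  beta  <- read.csv(file.path(x, "beta.csv"))$mean
  data.frame(id = basename(x), alpha = alpha, beta = beta, alphabeta = alpha / beta)
})

# Stack the per-folder results into a single data frame
result <- do.call(rbind, per_folder)
result
```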
Extending an sapply to apply list of variables and saving output as list of data frames in R
Instead of $ to reference named elements, consider the [[ extractor, which looks up an element by a name stored in a string. Also, extend substitute() so the formula is built with the dynamic variable:
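library(survey)
library(dplyr)  # provides %>%, slice() and mutate(); the reshape package must also be installed for reshape::melt()
# Example design from the survey package's api data
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# DEFINED METHOD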
df_build <- function(var) {
sapply(levels(dclus1$variables[[var]]), function(x) {
form <- as.formula(substitute(~I(var %in% x),
list(var=as.name(var), x=x)))
z <- svyciprop(form, dclus1, method="me", df=degf(dclus1))
c(z, c(attr(z,"ci")))
}) %>%
as.data.frame() %>%
slice(1) %>%
reshape::melt() %>%
dplyr::mutate(value = round(value, digits = 4)*100)
}
# ITERATE THROUGH CHARACTER VECTOR AND CALL METHOD
var_list <- list("stype", "awards")
df_list <- lapply(var_list, df_build)
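To see both points in isolation, without the survey package, here is a small sketch. The toy data frame is made up for illustration. It shows that $ does not use the value stored in var, while [[ does, and that substitute() plugs the column name into the formula.

```r
toy <- data.frame(stype = factor(c("E", "H", "E")))
var <- "stype"

# $ looks for a column literally called "var", so this returns NULL
toy$var
# NULL

# [[ uses the value stored in var, i.e. the "stype" column
toy[[var]]

# substitute() replaces the symbol v with the column name and x with the level
form <- as.formula(substitute(~I(v %in% x), list(v = as.name(var), x = "E")))
form
# ~I(stype %in% "E")
```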
Apply function to columns in a list of data frames and append results
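lapply works fine here. Note that return(x) (or simply x as the last expression) is needed. Otherwise the function would return only the new column vector, because the value of an assignment is its right-hand side.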
res <- lapply(ls.1, function(x){
x$d <- x$b + x$c
return(x)
})
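A self-contained sketch of the difference. The contents of ls.1 are made up for illustration.

```r
# Made-up input: a list of data frames that all have columns b and c
ls.1 <- list(
  data.frame(a = 1:2, b = c(1, 2), c = c(10, 20)),
  data.frame(a = 3:4, b = c(3, 4), c = c(30, 40))
)

# Without returning x, the function's value is the assignment's right-hand side
only_vectors <- lapply(ls.1, function(x) { x$d <- x$b + x$c })
only_vectors[[1]]
# [1] 11 22

# With x as the last expression, the whole modified data frame comes back
res <- lapply(ls.1, function(x) {
  x$d <- x$b + x$c
  x
})
names(res[[1]])
# [1] "a" "b" "c" "d"
```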
Using lapply to apply a function over read-in list of files and saving output as new list of files
The reason all output goes to the same file is probably that file = paste0(names(DF), "txt", sep=".") returns the same value in every iteration. DF presumably has the same column names each time, so names(DF), and with it the file name, never changes. Combined with the append = TRUE option, every iteration writes to the same file. Note also that paste0() has no sep argument, so sep="." is simply pasted on as one more piece of text. Because names(DF) has one entry per column, the result is a vector of names rather than a single file name.
Inside the anonymous function, x is the name of the input file. Instead of using names(DF) as the basis for the output file name, you could derive the output name by transforming this character string. For example, given
x <- "/foo/raw_data.csv"
Inside the function you could do something like this:
infile <- x
outfile <- file.path(dirname(infile), gsub('raw', 'clean', basename(infile)))
outfile
# [1] "/foo/clean_data.csv"
Then use the new name for the output, with append = FALSE (unless you need it to be TRUE):
write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE, append = FALSE, fileEncoding = "UTF-8")
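Putting the pieces together, here is a minimal end-to-end sketch. The demo folder, the raw_*.csv file names and the multiply-by-10 "processing" step are all made-up assumptions. sub() is used instead of gsub() so that only the first "raw" in each name is replaced.

```r
# Two throwaway input files whose names contain "raw"
in_dir <- file.path(tempdir(), "files_demo")
dir.create(in_dir, showWarnings = FALSE)
write.csv(data.frame(v = 1:2), file.path(in_dir, "raw_a.csv"), row.names = FALSE)
write.csv(data.frame(v = 3:4), file.path(in_dir, "raw_b.csv"), row.names = FALSE)

files <- list.files(in_dir, pattern = "^raw_", full.names = TRUE)

written <- lapply(files, function(x) {
  DF <- read.csv(x)
  DF$v <- DF$v * 10                      # placeholder for the real processing step
  # Derive the output name from the input name: raw_a.csv -> clean_a.csv
  outfile <- file.path(dirname(x), sub("raw", "clean", basename(x)))
  write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE,
              append = FALSE, fileEncoding = "UTF-8")
  outfile                                # return the path that was written
})
basename(unlist(written))
```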
Applying a Function to a Data Frame : lapply vs traditional way
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
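# apply() turns the data frame into a matrix and passes each row as a named vector;
# \(x) is the lambda shorthand available since R 4.1.0
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))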
head(data_frame)
To apply a function over rows, set MARGIN = 1; to apply it over columns, set MARGIN = 2.
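lapply, as the name suggests, is a list apply. Because a data.frame is a list of columns, you can use lapply to compute over columns, but for row-wise work on rectangular data, apply is often the easiest. Keep in mind that apply first converts the data frame to a matrix, so columns of different types are coerced to a common type (typically character).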
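A small sketch of the type difference. The mixed data frame is made up for illustration, and it assumes R 4.0 or later, where strings are not turned into factors by default.

```r
mixed <- data.frame(n = c(1, 2), s = c("a", "b"))

# lapply() walks over the columns, and each keeps its own type
lapply(mixed, class)
# $n
# [1] "numeric"
#
# $s
# [1] "character"

# apply() converts to a matrix first, so every row arrives as character
apply(mixed, 1, class)
```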
If some_function is written for that specific purpose, it can accept a single row of the data.frame directly:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: functions written for a single pair of values often turn out to be fully vectorized already, because operators such as + work element-wise on whole vectors. In that case, the best way to call some_function is probably without any member of the apply family:
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
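To check that the vectorized call gives the same numbers as the row-wise apply() version, here is a quick comparison that rebuilds the example data. The \(r) lambda needs R 4.1 or later.

```r
x <- seq(1, 10, 0.1)
y <- seq(1, 10, 0.1)
data_frame <- expand.grid(x, y)
some_function <- function(x, y) { return(x + y) }

# Row by row with apply()
by_row <- apply(data_frame, 1, \(r) some_function(r["Var1"], r["Var2"]))

# One vectorized call on whole columns
vectorized <- some_function(data_frame$Var1, data_frame$Var2)

all.equal(unname(by_row), vectorized)
# [1] TRUE
```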