Using Lapply to Apply a Function Over List of Data Frames and Saving Output to Files with Different Names

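Iterate over the names of the list rather than over its elements, so that the function receives both the data frame and the name to use for its output. It will work with the following lapply call: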

lapply(names(mylist), function(x) NewVar(mylist[[x]], "y", x))
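NewVar is not shown in the answer, so the following is only a minimal sketch of the same pattern, with write.csv standing in for it. The list contents, the names first and second, and the use of tempdir() are assumptions made for illustration.

```r
# Hypothetical input: a named list of two small data frames.
mylist <- list(first  = data.frame(y = 1:3),
               second = data.frame(y = 4:6))
out_dir <- tempdir()  # assumed output location

# Iterate over the names so each name can be used to build a distinct file path.
paths <- lapply(names(mylist), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(mylist[[nm]], path, row.names = FALSE)
  path  # return the path so the caller knows what was written
})
```

Each element of paths then holds the file written for the corresponding list element.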

Using lapply to set column names for a list of data frames?

It seems you want to update the original data frames. In that case your list must be named, because list2env() uses the element names as the names of the objects it creates. See the code below.

List <- list(a = a, b = b, c = c, d = d)
list2env(lapply(List, setNames, nm = headers), globalenv())

Now if you print a, you will see that its column names have been updated.
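A self-contained version of the same idea might look like this. The data frames and the contents of headers are made up for the example, since the answer does not show them.

```r
# Hypothetical inputs: two data frames and the new column names.
a <- data.frame(x = 1:2, y = 3:4)
b <- data.frame(x = 5:6, y = 7:8)
headers <- c("id", "value")  # assumed replacement names

# The list must be named: the names decide which objects get overwritten.
List <- list(a = a, b = b)
invisible(list2env(lapply(List, setNames, nm = headers), globalenv()))

# The global objects a and b now carry the new column names.
names(get("a", envir = globalenv()))
```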

How do I apply a function over multiple data frames, but overwrite them?

You can use the list2env() function.

list2env(data_list, envir = .GlobalEnv)

This copies every element of the list into the global environment as a separate object named after its list element. Any existing object with the same name is overwritten, which is why the list has to be named.
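As a rough sketch of the full round trip (collect, modify, write back), assuming two hypothetical data frames df1 and df2 and an arbitrary rescaling step:

```r
# Hypothetical data frames to be modified in place.
df1 <- data.frame(value = c(0.1, 0.2))
df2 <- data.frame(value = c(0.3, 0.4))

# Collect them in a named list, apply the function, then write them back.
data_list <- list(df1 = df1, df2 = df2)
data_list <- lapply(data_list, function(x) {
  x$value <- x$value * 100  # assumed transformation, for illustration only
  x
})
invisible(list2env(data_list, envir = .GlobalEnv))
```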

Keeping original list item names when using lapply over an existing list

Just copy the names from the original list onto the new one:

names(dat)
# [1] "grp_1" "grp_2" "grp_3"
names(dat_new)
# NULL
names(dat_new) <- names(dat)
names(dat_new)
# [1] "grp_1" "grp_2" "grp_3"
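To illustrate where the names get lost in the first place, here is a small sketch with made-up contents for dat. Iterating over positions drops the names, while iterating over the list itself keeps them.

```r
# Hypothetical named list.
dat <- list(grp_1 = 1:3, grp_2 = 4:6, grp_3 = 7:9)

# Iterating over indices returns an unnamed list ...
dat_new <- lapply(seq_along(dat), function(i) dat[[i]] * 2)
names(dat_new)
# NULL

# ... so the names have to be restored, here with setNames().
dat_new <- setNames(dat_new, names(dat))

# Iterating over the list itself keeps the names automatically.
kept <- lapply(dat, function(v) v * 2)
```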

R - apply function on two files in folders with for loop or lapply and save results in one dataframe

Try this approach:

  1. Get all the folders using list.dirs.

  2. For each folder, read the "alpha" and "beta" files and return a three-column tibble with the alpha, beta and alphabeta values.

  3. Bind all the data frames together with an id column that records which folder each value comes from.

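all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)
# Name the vector so that .id stores the folder name rather than its index
names(all_folders) <- basename(all_folders)

result <- purrr::map_df(all_folders, function(x) {
  # list.files() sorts alphabetically, so the "alpha" file comes first
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha/beta)
}, .id = "id")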
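Since the question also asks about lapply, a base R variant of the same three steps could look roughly like the sketch below. The folder names run1 and run2, the file names alpha.csv and beta.csv, and the file contents are all invented so the example can run on its own.

```r
# Build a throwaway folder structure: <root>/run1/{alpha,beta}.csv and so on.
root <- tempfile("Data")
for (run in c("run1", "run2")) {
  dir.create(file.path(root, run), recursive = TRUE)
  write.csv(data.frame(mean = c(2, 4)), file.path(root, run, "alpha.csv"),
            row.names = FALSE)
  write.csv(data.frame(mean = c(1, 2)), file.path(root, run, "beta.csv"),
            row.names = FALSE)
}

# Step 1: get the folders.
all_folders <- list.dirs(root, recursive = FALSE, full.names = TRUE)

# Step 2: one data frame per folder, tagged with the folder name.
per_folder <- lapply(all_folders, function(folder) {
  alpha <- read.csv(file.path(folder, "alpha.csv"))$mean
  beta  <- read.csv(file.path(folder, "beta.csv"))$mean
  data.frame(id = basename(folder), alpha = alpha, beta = beta,
             alphabeta = alpha / beta)
})

# Step 3: bind everything into a single data frame.
result <- do.call(rbind, per_folder)
```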

Extending an sapply to apply list of variables and saving output as list of data frames in R

Instead of using $ to reference named elements, use the [[ extractor, which accepts the element name as a string. Also extend the substitute() call so that the variable name is filled in dynamically:

# REQUIRED PACKAGES (dclus1 is assumed to be an existing survey design object)
library(survey)   # svyciprop(), degf()
library(dplyr)    # %>%, slice(), mutate()

# DEFINED METHOD
df_build <- function(var) {
  sapply(levels(dclus1$variables[[var]]), function(x) {
    # Build the formula ~I(<var> %in% "<level>") for the current level
    form <- as.formula(substitute(~I(var %in% x),
                                  list(var=as.name(var), x=x)))
    z <- svyciprop(form, dclus1, method="me", df=degf(dclus1))
    c(z, c(attr(z,"ci")))
  }) %>%
    as.data.frame() %>%
    slice(1) %>%
    reshape::melt() %>%
    dplyr::mutate(value = round(value, digits = 4)*100)
}

# ITERATE THROUGH CHARACTER VECTOR AND CALL METHOD
var_list <- list("stype", "awards")
df_list <- lapply(var_list, df_build)
Apply function to columns in a list of data frames and append results

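lapply works fine here. Note that return(x) is needed: without it the function would return the value of its last expression, the assignment, which is only the new column vector rather than the whole data frame.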

res <- lapply(ls.1, function(x){
  x$d <- x$b + x$c
  return(x)
})
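The difference is easy to see on a small example. The contents of ls.1 below are assumed, as the answer does not show them.

```r
# Hypothetical list of data frames with columns b and c.
ls.1 <- list(one = data.frame(b = 1:2, c = 3:4),
             two = data.frame(b = 5:6, c = 7:8))

# With return(x): each element is the full data frame plus the new column d.
res <- lapply(ls.1, function(x) {
  x$d <- x$b + x$c
  return(x)
})

# Without it: each element is just the value of the assignment, i.e. the sums.
no_return <- lapply(ls.1, function(x) x$d <- x$b + x$c)
```

Here res$one is a data frame with columns b, c and d, whereas no_return$one is a plain integer vector.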

Using lapply to apply a function over read-in list of files and saving output as new list of files

The output probably all ends up in the same file because file = paste0(names(DF), "txt", sep=".") evaluates to the same value on every iteration. names(DF) returns the column names of DF, and if every input file has the same columns, the pasted result is identical each time. (Note also that paste0() has no sep argument, so "." is simply pasted on as one more string.) Combined with append = TRUE, the result is that all output is written to the same file.

Inside the anonymous function, x is the name of the input file. Instead of using names(DF) as a basis for the output file name you could do some transformation of this character string.

For example, given

x <- "/foo/raw_data.csv"

inside the function you could do something like this:

infile <- x
outfile <- file.path(dirname(infile), gsub('raw', 'clean', basename(infile)))

outfile
[1] "/foo/clean_data.csv"

Then use the new name for the output, with append = FALSE (unless you need it to be TRUE):

write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE, append = FALSE, fileEncoding = "UTF-8")
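Put together, the whole loop might look like the following sketch. The input files, their contents, and the na.omit() cleaning step are assumptions chosen only to make the example runnable.

```r
# Create two hypothetical input files in a scratch directory.
in_dir <- tempfile("foo")
dir.create(in_dir)
infiles <- file.path(in_dir, c("raw_a.csv", "raw_b.csv"))
write.csv(data.frame(v = c(1, NA, 3)), infiles[1], row.names = FALSE)
write.csv(data.frame(v = c(NA, 5)),    infiles[2], row.names = FALSE)

outfiles <- lapply(infiles, function(x) {
  DF <- na.omit(read.csv(x))  # placeholder for the real processing step
  # Derive the output name from the input name, not from names(DF).
  outfile <- file.path(dirname(x), sub("raw", "clean", basename(x), fixed = TRUE))
  write.table(DF, file = outfile, row.names = FALSE, col.names = FALSE,
              append = FALSE, fileEncoding = "UTF-8")
  outfile
})
```

Because the output name depends on x, each input file gets its own output file.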

Applying a Function to a Data Frame : lapply vs traditional way

When working within a data.frame you could use apply instead of lapply:

x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }

data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)

To apply a function to rows, set MARGIN = 1 (the second argument of apply); to apply a function to columns, set MARGIN = 2.

lapply, as the name suggests, is a list-apply. Since a data.frame is a list of columns, you can use lapply to compute over its columns, but for rectangular data apply is often the easiest choice.
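A small sketch of the three cases, using an invented two-column data frame:

```r
# Hypothetical numeric data frame.
df <- data.frame(Var1 = c(1, 2, 3), Var2 = c(10, 20, 30))

# lapply walks over the columns and returns a list, one element per column.
col_means <- lapply(df, mean)

# apply with MARGIN = 1 walks over rows, with MARGIN = 2 over columns.
row_sums <- apply(df, 1, sum)
col_sums <- apply(df, 2, sum)
```

Keep in mind that apply converts the data frame to a matrix first, so it is best suited to data frames whose columns all share one type.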

If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame, as in:

x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)

some_function <- function(row) { return(row[1]+row[2]) }

data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)

Final comment: functions written for only a pair of values often turn out to be perfectly vectorized. Probably the best way to call some_function is without any function of the apply family, as in:

some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
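As a quick sanity check, the row-wise and the vectorized call can be compared directly. This sketch rebuilds the same grid as above and assumes nothing beyond base R.

```r
# Same grid as in the examples above; expand.grid names the columns Var1, Var2.
data_frame <- expand.grid(seq(1, 10, 0.1), seq(1, 10, 0.1))
some_function <- function(x, y) { return(x + y) }

# Row by row via apply ...
via_apply <- apply(data_frame, 1,
                   function(row) some_function(row["Var1"], row["Var2"]))

# ... and in one vectorized call on the whole columns.
via_vector <- some_function(data_frame$Var1, data_frame$Var2)

# Both approaches should agree.
same_result <- isTRUE(all.equal(unname(via_apply), via_vector))
```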

