Read All Files in a Folder and Apply a Function to Each Data Frame

R - apply function on two files in folders with for loop or lapply and save results in one dataframe

Try this solution:

  1. Get all the folders using list.dirs.

  2. For each folder, read the "alpha" and "beta" files and return a three-column tibble with alpha, beta, and alphabeta values.

  3. Bind all the data frames together with an id column recording which folder each value came from.

all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)

result <- purrr::map_df(all_folders, function(x) {
  # list.files() sorts alphabetically, so the "alpha" file comes before "beta"
  all_files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_files[1])  # alpha file, assumed to have a `mean` column
  df2 <- read.csv(all_files[2])  # beta file, assumed to have a `mean` column
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha / beta)
}, .id = "id")
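Note that with an unnamed input vector, the id column just records positions ("1", "2", ...). If you would rather see the folder names there, name the vector before mapping, e.g. with purrr::set_names:

all_folders <- purrr::set_names(all_folders, basename(all_folders))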

Apply the same function over all the elements in a folder

You can use the map() function from the purrr R package.

Note:
I'm assuming that each RData file contains a data.frame called df with at least two columns, datetime and number.

First, define your "complete date" function as follows:

library(tidyverse)

complete_date <- function(df) {
  # Full daily sequence from the first to the last observed date
  all_days <- tibble(datetime = seq(min(df$datetime), max(df$datetime), by = "1 day"))

  # A left join keeps every day; days missing from df get number = 0
  all_days %>%
    left_join(df, by = "datetime") %>%
    mutate(number = replace_na(number, 0))
}
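A quick hypothetical check, using a made-up two-row data frame that skips a day:

df <- tibble(datetime = as.Date(c("2021-01-01", "2021-01-03")), number = c(2, 4))
complete_date(df)
# three rows: the missing 2021-01-02 comes back with number = 0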

We'll apply this function to each RData file using map():

file_names %>%
  map(function(file_name) {
    load(file_name)    # load the RData file; this puts `df` in scope
    complete_date(df)  # apply the function
  })

This will create a list of all the completed data.frames, which you can then write back out as RData with save().
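For example, a minimal sketch that writes each completed data.frame to its own RData file (the new_ prefix is just an assumption for illustration):

file_names %>%
  map(function(file_name) {
    load(file_name)          # brings `df` into scope
    df <- complete_date(df)  # complete the dates
    save(df, file = paste0("new_", basename(file_name)))
  })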

Apply a particular function in all files of a folder using R

One option is to recursively return all the files from the main folder with list.files, then apply the custom function by looping over the files with lapply, and combine everything into a single data.frame with do.call(rbind, ...):

files <- list.files('path/to/your/folder', recursive = TRUE,
                    pattern = "\\.txt$", full.names = TRUE)
lst1 <- lapply(files, DNAdupstability)  # one result per file
out <- do.call(rbind, lst1)             # stack into a single data.frame

Or we can use map_dfr from purrr to combine all the output from the list into a single data.frame:

library(purrr)
out <- map_dfr(files, DNAdupstability)
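In both versions, DNAdupstability is the asker's own function; any stand-in that takes a file path and returns a data.frame will slot in. A purely hypothetical stub for testing the plumbing:

# Hypothetical stub: one row of made-up per-file stats
DNAdupstability <- function(file) {
  lines <- readLines(file)
  data.frame(file = basename(file), n_lines = length(lines))
}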

Read multiple files from a folder and pass each file through a function in R

You can write a function which

1) Reads the file

2) Performs all the data-processing steps

3) Writes the new file

library(tidyverse)
library(lubridate)
library(data.table)

f1 <- function(file) {
  readxl::read_xlsx(file) %>%
    group_by(date = floor_date(DATE, "month")) %>%
    summarize(SALES = sum(SALES)) %>%
    # a Date prints as "YYYY-MM-DD"; keep year and month, drop the day piece
    separate(date, sep = "-", into = c("year", "month"), extra = "drop") %>%
    mutate(lag_12 = shift(SALES, 12),   # data.table::shift: sales 12 months back
           lag_24 = shift(SALES, 24)) %>%
    writexl::write_xlsx(paste0('new_', basename(file)))
}

and do this for every file.

lapply(filenames, f1)
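Here filenames is assumed to be a character vector of paths to the Excel files; a minimal way to build it (the folder path is hypothetical):

filenames <- list.files('path/to/xlsx/folder', pattern = "\\.xlsx$", full.names = TRUE)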

How to read each file from a folder and create separate data frames for each file?

A better approach than using a different variable for each of your dataframes is to load them all into a dictionary.

The basename of each filename could be extracted using a combination of os.path.basename() and os.path.splitext().

For example:

import glob, os
import pandas as pd

# Map each file's base name (minus extension) to its data frame
d = {os.path.splitext(os.path.basename(f))[0]: pd.read_csv(f) for f in glob.glob('*test*.csv')}

Also, using the *test* pattern in glob avoids the need for an if clause in the comprehension.

How to do same function on every file in a folder in R?

If you are willing to use the whole tidyverse set of packages, purrr gives you map_dfr, which returns a single dataframe by row-binding each dataset you read in.

The code would look something like this:

library(tidyverse)

list.files(path = "path_to_data", full.names = TRUE) %>%
  map_dfr(read.csv) %>%            # read and row-bind every file
  group_by(date) %>%               # assumes a `date` column in each file
  summarize(hour_sum = sum(hours)) # and an `hours` column
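If you also need to know which file each row came from, map_dfr takes an .id argument; naming the paths first (here with purrr::set_names) makes it record the source file, as in this sketch:

list.files(path = "path_to_data", full.names = TRUE) %>%
  set_names() %>%                        # name each element by its own path
  map_dfr(read.csv, .id = "source") %>%  # `source` records the file path
  group_by(date) %>%
  summarize(hour_sum = sum(hours))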


