R - apply function on two files in folders with for loop or lapply and save results in one dataframe
Try this solution:
Get all the folders using list.dirs. For each folder, read the "alpha" and "beta" files and return a 3-column tibble with alpha, beta and alphabeta values. Bind all the data frames together with an id column so you know which folder each value comes from.
all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)
result <- purrr::map_df(all_folders, function(x) {
  # Files are returned in alphabetical order, so "alpha" comes before "beta"
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha/beta)
}, .id = "id")  # id holds the position of each folder in all_folders
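Since the question also mentions lapply, here is a minimal base R sketch of the same idea. The folder names (run1, run2), the file names alpha.csv and beta.csv, and the sample values are assumptions made up for illustration; only the mean column comes from the answer above.

```r
# Build a throwaway layout: <root>/<folder>/{alpha,beta}.csv (hypothetical files)
root <- tempfile("Data")
for (folder in c("run1", "run2")) {
  dir.create(file.path(root, folder), recursive = TRUE)
  write.csv(data.frame(mean = c(1, 2)), file.path(root, folder, "alpha.csv"), row.names = FALSE)
  write.csv(data.frame(mean = c(4, 8)), file.path(root, folder, "beta.csv"), row.names = FALSE)
}

all_folders <- list.dirs(root, recursive = FALSE, full.names = TRUE)

# One data.frame per folder, tagged with the folder name
per_folder <- lapply(all_folders, function(x) {
  alpha <- read.csv(file.path(x, "alpha.csv"))$mean
  beta  <- read.csv(file.path(x, "beta.csv"))$mean
  data.frame(id = basename(x), alpha = alpha, beta = beta, alphabeta = alpha / beta)
})

# Stack the per-folder results into a single data.frame
result <- do.call(rbind, per_folder)
```

Here the id column stores the folder name rather than its position, which tends to be easier to read when there are many folders.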
Apply the same function over all the elements in a folder
You can use the map() function from the purrr R package.
Note: I'm assuming that each RData file contains a data.frame called df with at least two columns called datetime and number.
First, define your "complete date" function as follows:
library(tidyverse)
complete_date <- function(df) {
  # One row per day between the first and last observed date
  all_dates <- tibble(datetime = seq(min(df$datetime), max(df$datetime), by = "1 day"))
  # A left join keeps every day; days missing from df get number = 0
  all_dates %>%
    left_join(df, by = "datetime") %>%
    mutate(number = replace_na(number, 0))
}
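To make the intended result concrete, here is a small base R sketch of date completion. It assumes the goal is to end up with one row per day and number = 0 on days that were missing; the sample data frame is invented for illustration.

```r
# Hypothetical input with a two-day gap between observations
df <- data.frame(
  datetime = as.Date(c("2021-01-01", "2021-01-04")),
  number   = c(5, 7)
)

# Every day between the first and last observation
all_days <- data.frame(datetime = seq(min(df$datetime), max(df$datetime), by = "1 day"))

# all.x = TRUE keeps days that have no match in df (their number becomes NA)
completed <- merge(all_days, df, by = "datetime", all.x = TRUE)
completed$number[is.na(completed$number)] <- 0
```

The two missing days (2021-01-02 and 2021-01-03) appear in completed with number set to 0.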
We are going to apply this function to each RData file using map:
file_names %>%
  map(function(file_name) {
    load(file_name)    # Load RData first
    complete_date(df)  # Apply the function
  })
This will create a list of all the completed data.frames, which you can then write out as RData with save().
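A rough sketch of that last step, assuming the list is stored in a variable named completed (a hypothetical name) and written to a temporary file:

```r
# Hypothetical stand-in for the list returned by map()
completed <- list(
  data.frame(datetime = as.Date("2021-01-01") + 0:2, number = c(1, 0, 3)),
  data.frame(datetime = as.Date("2021-02-01") + 0:1, number = c(5, 0))
)

# save() stores the object under its variable name
out_file <- file.path(tempdir(), "completed.RData")
save(completed, file = out_file)

# load() restores it under the same name; a fresh environment avoids overwriting anything
restored <- new.env()
load(out_file, envir = restored)
round_trip_ok <- identical(restored$completed, completed)
```

If you would rather have one file per data.frame, the same save() call can be placed inside a loop over the list.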
Apply a particular function in all files of a folder using R
One option is to recursively list all the files in the main folder with list.files, then apply the custom function by looping over the files, and combine the results into a single data.frame with do.call(rbind, ...):
files <- list.files('path/to/your/folder', recursive = TRUE,
                    pattern = "\\.txt$", full.names = TRUE)
lst1 <- lapply(files, DNAdupstability)
out <- do.call(rbind, lst1)
Or we can use map_dfr from purrr to combine all the output from the list into a single data.frame:
library(purrr)
out <- map_dfr(files, DNAdupstability)
Read multiple files from a folder and pass each file through a function in R
You can write a function which:
1) Reads the file
2) Performs all the data-processing steps
3) Writes the new file
library(tidyverse)
library(lubridate)
library(data.table)
f1 <- function(file) {
  readxl::read_xlsx(file) %>%
    # Aggregate sales by calendar month
    group_by(date = floor_date(DATE, "month")) %>%
    summarize(SALES = sum(SALES)) %>%
    separate(date, sep = "-", into = c("year", "month")) %>%
    # Note: data.table::shift() with a negative n shifts values up (a lead)
    mutate(lag_12 = shift(SALES, -12),
           lag_24 = shift(SALES, -24)) %>%
    # Write the result to the working directory as new_<original name>
    writexl::write_xlsx(paste0('new_', basename(file)))
}
Then do this for every file:
lapply(filenames, f1)
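The filenames vector is not defined in the answer. One way to build it, sketched here with a hypothetical folder and made-up file names, is list.files with a pattern that keeps only .xlsx files:

```r
# Hypothetical folder with two workbooks and one unrelated file
folder <- tempfile("sales")
dir.create(folder)
file.create(file.path(folder, c("north.xlsx", "south.xlsx", "notes.txt")))

# Keep only .xlsx files; full.names = TRUE returns paths a reader function can open
filenames <- list.files(folder, pattern = "\\.xlsx$", full.names = TRUE)

# Output names following the new_<original name> convention
new_names <- paste0("new_", basename(filenames))
```

The empty files created here only illustrate the file selection; they are not valid workbooks, so point folder at your real data before passing filenames to the processing function.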
How to read each file from a folder and create separate data frames for each file?
A better approach than using a separate variable for each of your dataframes would be to load each dataframe into a dictionary.
The basename of each filename can be extracted using a combination of os.path.basename() and os.path.splitext().
For example (assuming os, glob and pandas as pd have been imported):
d = {os.path.splitext(os.path.basename(f))[0] : pd.read_csv(f) for f in glob.glob('*test*.csv')}
Also, using *test* in the glob pattern would avoid the need for the if in the comprehension.
How to do same function on every file in a folder in R?
If you are willing to use the whole tidyverse set of packages, purrr gives you map_dfr, which returns a single dataframe by row-binding each dataset you read in.
The code would look something like this:
library(tidyverse)
list.files(path = "path_to_data", full.names = TRUE) %>%
  map_dfr(read.csv) %>%
  group_by(date) %>%
  summarize(hour_sum = sum(hours))
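If you would rather avoid the extra packages, the same read, bind and summarise pipeline can be sketched in base R. The two CSV files and their values below are invented for illustration; only the date and hours column names come from the code above.

```r
# Two small hypothetical CSV files in a temporary folder
folder <- tempfile("hours")
dir.create(folder)
write.csv(data.frame(date = c("2021-01-01", "2021-01-02"), hours = c(2, 3)),
          file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(date = c("2021-01-01", "2021-01-02"), hours = c(4, 1)),
          file.path(folder, "b.csv"), row.names = FALSE)

# Read every file and stack the rows into one data.frame
files <- list.files(folder, full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv))

# Sum hours per date, the base R counterpart of group_by() + summarize()
hour_sum <- aggregate(hours ~ date, data = combined, FUN = sum)
```

This only works cleanly when every file has the same columns, which is also what row-binding with map_dfr expects.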