Read All Files in Directory and Apply Multiple Functions to Each Data Frame

Loop through files and use functions, then use that result to form a data frame in R

I am assuming that you want to stack all the data frames on top of each other (row bind). map (from purrr) or lapply applies a function to each item of a list or vector (here, each file name). map_dfr does the same and then row-binds all the outputs into one data frame.

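library(purrr)
library(Rsamtools)  # provides pileup(), ScanBamParam() and PileupParam()

# list.files() expects a regular expression, not a shell glob
filenames <- list.files(pattern = "\\.sorted\\.bam$")

# run pileup() on every BAM file and row-bind the results into one data frame
result <- map_dfr(filenames, ~ pileup(.x,
                                      index = .x,
                                      scanBamParam = ScanBamParam(),
                                      pileupParam = PileupParam()))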
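The same apply-then-row-bind idea can be sketched in Python with pandas. This is only an illustration under assumptions that are not part of the answer above: the inputs are CSV files rather than BAM files, and `read_and_tag` is a hypothetical per-file function standing in for `pileup()`.

```python
from pathlib import Path

import pandas as pd


def read_and_tag(path: Path) -> pd.DataFrame:
    """Hypothetical per-file step: read a CSV and record which file each row came from."""
    df = pd.read_csv(path)
    df["file"] = path.name
    return df


def map_rbind(folder: str, pattern: str = "*.csv") -> pd.DataFrame:
    """Apply read_and_tag to every matching file and stack the results (like purrr::map_dfr)."""
    paths = sorted(Path(folder).glob(pattern))
    frames = [read_and_tag(p) for p in paths]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

`pd.concat` plays the role of the row bind. It raises a `ValueError` when the list is empty, so you may want to check that any files matched before calling it.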

Apply a function to multiple files to load data into a PySpark DataFrame

If all the files have the same JSON structure, you simply need the spark.read.json function.

spark.read.json accepts a list of file paths as well as a single path.

df = spark.read.json(list_of_json_files)  # list_of_json_files: a Python list of paths

This reads every file in the list and returns a single DataFrame containing the records from all of them.
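If you need to build that list yourself, a small helper like the hypothetical `list_json_files` below collects the paths. The `*.json` pattern and the single-folder layout are assumptions.

```python
import glob
import os
from typing import List


def list_json_files(folder: str) -> List[str]:
    """Return the .json files directly inside folder, sorted, as a plain list of paths."""
    return sorted(glob.glob(os.path.join(folder, "*.json")))
```

You would then call `spark.read.json(list_json_files("data"))` on an existing `SparkSession` named `spark`. Spark's reader also accepts a single path containing a glob pattern such as `"data/*.json"`, which may be simpler when all the files sit in one folder.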


R - apply function on two files in folders with for loop or lapply and save results in one dataframe

Try this solution:

  1. Get all the folders using list.dirs.

  2. For each folder, read the "alpha" and "beta" files and return a three-column tibble with the alpha, beta and alphabeta values.

  3. Bind all the data frames, adding an id column that records which folder each value comes from.

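all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)

# naming the vector makes .id hold the folder path instead of a position number
result <- purrr::map_df(purrr::set_names(all_folders), function(x) {
  # list.files() sorts alphabetically, so the "alpha" file comes before "beta"
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha / beta)
}, .id = "id")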
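For comparison, a pandas sketch of the same three steps follows. Like the R code, it assumes that each subfolder holds one CSV whose name contains "alpha" and one whose name contains "beta", both with a `mean` column; `alpha_beta_ratios` is a hypothetical name.

```python
from pathlib import Path

import pandas as pd


def alpha_beta_ratios(root: str) -> pd.DataFrame:
    """For each subfolder of root, pair the 'mean' columns of its alpha and beta files."""
    rows = []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # assumes exactly one file per folder whose name contains 'alpha' / 'beta'
        alpha = pd.read_csv(next(folder.glob("*alpha*")))
        beta = pd.read_csv(next(folder.glob("*beta*")))
        rows.append(pd.DataFrame({
            "id": folder.name,  # which folder the values came from
            "alpha": alpha["mean"],
            "beta": beta["mean"],
        }))
    out = pd.concat(rows, ignore_index=True)
    out["alphabeta"] = out["alpha"] / out["beta"]
    return out
```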

Use function in loop over set of files in directory

You can read the file names with dir() and loop over them: read each file, apply your tapply, build a vector holding the file name and that file's results, and combine the vectors with rbind. I hope this is close to what you wanted, or at least pushes you in the right direction.

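folder <- "your_folder_where_data_is"
new_df <- NULL
for (f in dir(folder)) {
  df <- read.csv(file.path(folder, f))
  # mean of column V1.1 within each group of V1 (assumes every file has the same groups)
  # note: c() turns the means into character strings because f is a string
  new_line <- c(f, tapply(df$V1.1, df$V1, mean))
  new_df <- rbind(new_df, new_line)
}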
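A pandas sketch of the same per-file summary is below. `group_means_per_file` is a hypothetical helper, and the `.csv` extension and the `V1`/`V1.1` column names are carried over from the R snippet as assumptions. Collecting the results first and building the table once avoids both the repeated `rbind` and the conversion of the means to character that `c(f, ...)` causes.

```python
from pathlib import Path

import pandas as pd


def group_means_per_file(folder: str, key: str = "V1", value: str = "V1.1") -> pd.DataFrame:
    """One row per CSV file: the mean of `value` within each group of `key`."""
    rows = {}
    for path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(path)
        # equivalent of tapply(df$V1.1, df$V1, mean)
        rows[path.name] = df.groupby(key)[value].mean()
    # each file's Series becomes a column; .T turns files into rows
    return pd.DataFrame(rows).T
```

Files whose groups differ simply get NaN in the missing columns instead of breaking the row bind.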

R- How to read from multiple directories and apply function on same file names contained within different directories

I would be lazy: list all the files in one go and use a regular expression to find the appropriate one in each iteration. Something along the lines of:

# list all files with paths
(x <- list.files(full.names = TRUE, recursive = TRUE))

[1] "./figure/delez_skupin.pdf" "./figure/diag_efekt_odstrela.pdf"
[3] "./figure/diag_maxent.pdf" "./figure/diag_teza_v_casu.pdf"
[5] "./figure/diag_teza_v_casu2.pdf" "./figure/efekt_odstrela.pdf"
[7] "./figure/fig_teza.pdf" "./figure/graf_odstrel_razmerje_kategorija.pdf"
[9] "./figure/graf_odstrel_razmerje_kategorija1.pdf" "./figure/graf_odstrel_razmerje_kategorija2.pdf"
[11] "./figure/graf_starost_v_letih_skupaj.pdf" "./figure/korelacija_med_odstrelom_in_sist_1.pdf"
[13] "./figure/korelacija_med_odstrelom_in_sist_2.pdf" "./figure/modeliranje_maxent_sistematicno.pdf"
[15] "./figure/plot_glm_maxent_model1.pdf" "./figure/plot_glm_maxent_model2.pdf"
[17] "./figure/pregled_prostorskih_podatkov.pdf" "./figure/prikaz_okoljskih_spremenljivk1.pdf"
[19] "./figure/prikaz_okoljskih_spremenljivk2.pdf" "./figure/prikaz_okoljskih_spremenljivk3.pdf"
[21] "./figure/prikaz_okoljskih_spremenljivk4.pdf" "./figure/priloznostna_glede_na_mesec.pdf"
[23] "./figure/primerjava_spremenljivk_glede_prisotnosti.pdf" "./figure/priprava_primerjava.pdf"
[25] "./figure/razsirjenost_gamsa_tnp.pdf" "./figure/razsirjenost_gamsa_v_tnp.pdf"
[27] "./figure/sprememba_strukture_po_mesecih.pdf" "./figure/sprememba_strukture_po_mesecih_abs.pdf"
[29] "./figure/sprememba_strukture_po_mesecih_rel.pdf" "./figure/st_osebkov_na_leto_priloznostna.pdf"
[31] "./figure/st_osebkov_na_leto_sistematicna.pdf" "./figure/teza_enoletnikov.pdf"
[33] "./figure/vpliv_js_glm1.pdf" "./figure/vpliv_js_glm2.pdf"
...
[51] "./ostale_slike/naslovnica_gams.jpg" "./ostale_slike/nepipaj/naslovnica_gams.jpg"
[53] "./ostale_slike/nepipaj/slika17_odlov_tone.jpg" "./ostale_slike/nepipaj/slika18_odlov_irena.jpg"
[55] "./ostale_slike/nepipaj/slika19_odlov_irena_markica.jpg" "./ostale_slike/nepipaj/slika20_odlov_luna.jpg"
[57] "./ostale_slike/nepipaj/slika21_gibanje_irena.png" "./ostale_slike/nepipaj/slika22_gibanje_mojca.png"
[59] "./ostale_slike/nepipaj/slika23_gibanje_tone.png" "./ostale_slike/nepipaj/slika24_gibanje_luna.png"
[61] "./ostale_slike/nepipaj/slika25_gibanje_irena_jesen_zima.png" "./ostale_slike/nepipaj/slika26_gibanje_mojca_jesen_zima.png"
[63] "./ostale_slike/nepipaj/slika27_gibanje_tone_jesen_zima.png" "./ostale_slike/nepipaj/slika28_graf_aktivnosti.jpg"
[65] "./ostale_slike/razsirjenost_gamsa_slovenija.png" "./ostale_slike/slika17_odlov_tone.jpg"
[67] "./ostale_slike/slika18_odlov_irena.jpg" "./ostale_slike/slika19_odlov_irena_markica.jpg"
[69] "./ostale_slike/slika20_odlov_luna.jpg" "./ostale_slike/slika21_gibanje_irena.jpg"
[71] "./ostale_slike/slika22_gibanje_mojca.jpg" "./ostale_slike/slika23_gibanje_tone.jpg"
[73] "./ostale_slike/slika24_gibanje_luna.jpg" "./ostale_slike/slika25_gibanje_irena_jesen_zima.jpg"
[75] "./ostale_slike/slika26_gibanje_mojca_jesen_zima.jpg" "./ostale_slike/slika27_gibanje_tone_jesen_zima.jpg"
[77] "./ostale_slike/slika28_graf_aktivnosti.jpg" "./ostale_slike/slo_gams.bmp"

# find all files whose name starts with "slika2" (matching the base name, not the folder path)
x[grepl("^slika2", basename(x))]
[1] "./ostale_slike/nepipaj/slika20_odlov_luna.jpg" "./ostale_slike/nepipaj/slika21_gibanje_irena.png"
[3] "./ostale_slike/nepipaj/slika22_gibanje_mojca.png" "./ostale_slike/nepipaj/slika23_gibanje_tone.png"
[5] "./ostale_slike/nepipaj/slika24_gibanje_luna.png" "./ostale_slike/nepipaj/slika25_gibanje_irena_jesen_zima.png"
[7] "./ostale_slike/nepipaj/slika26_gibanje_mojca_jesen_zima.png" "./ostale_slike/nepipaj/slika27_gibanje_tone_jesen_zima.png"
[9] "./ostale_slike/nepipaj/slika28_graf_aktivnosti.jpg" "./ostale_slike/slika20_odlov_luna.jpg"
[11] "./ostale_slike/slika21_gibanje_irena.jpg" "./ostale_slike/slika22_gibanje_mojca.jpg"
[13] "./ostale_slike/slika23_gibanje_tone.jpg" "./ostale_slike/slika24_gibanje_luna.jpg"
[15] "./ostale_slike/slika25_gibanje_irena_jesen_zima.jpg" "./ostale_slike/slika26_gibanje_mojca_jesen_zima.jpg"
[17] "./ostale_slike/slika27_gibanje_tone_jesen_zima.jpg" "./ostale_slike/slika28_graf_aktivnosti.jpg"

With the full file paths in hand, you can import each data set and process it further.
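The same "list everything, then filter with a regex" approach can be sketched in Python with `os.walk`. `find_files` is a hypothetical helper; matching against the base name rather than the full path is a design choice that keeps directory names from producing false hits.

```python
import os
import re
from typing import List


def find_files(root: str, name_regex: str) -> List[str]:
    """Recursively list files under root whose base name matches name_regex."""
    pattern = re.compile(name_regex)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if pattern.search(name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, `find_files(".", r"^slika2")` returns the paths of all files under the current folder whose names start with "slika2".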

Pandas apply multiple functions with list

Assuming you want to extract the chunks that look like dates and convert them, you could split the string on the delimiters, explode it into one chunk per row, and attempt to convert each chunk with pandas.to_datetime:

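import pandas as pd

# split each name on "_" or "-", give every chunk its own row, try to parse it as a date
# (errors='coerce' turns non-dates into NaT, which dropna() then removes)
df.join(pd.to_datetime(df['File_name']
                         .str.split(r'[_-]')
                         .explode(), errors='coerce')
          .dropna()
          .rename('Date'))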

output:

                 File_name  Result        Date
0   f1h3_13oct2021_gt1.csv     2.0  2021-10-13
1  p8-gfr-20dec2021-81.csv     0.5  2021-12-20

NB. If a single string can contain several dates, you need an extra step to select the one you want. Please give more details if this is the case.
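The extra selection step mentioned in the NB could look like the sketch below, which keeps the first parseable chunk for each file name. `first_date_per_row` and its `fmt` parameter are hypothetical names; passing an explicit format (an assumption about how your dates are written) avoids relying on pandas' format inference.

```python
from typing import Optional

import pandas as pd


def first_date_per_row(names: pd.Series, fmt: Optional[str] = None) -> pd.Series:
    """Parse every '_'/'-' separated chunk as a date and keep the first valid one per row."""
    chunks = names.str.split(r"[_-]").explode()
    dates = pd.to_datetime(chunks, format=fmt, errors="coerce")
    # explode repeats the original index, so grouping on it collapses back to one row each;
    # GroupBy.first() skips NaT, so a row with no valid date ends up as NaT
    return dates.groupby(level=0).first().rename("Date")
```

The result is indexed like `names`, so it can be attached with `df.join(first_date_per_row(df['File_name']))`.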

Version for older pandas (Series.explode was added in pandas 0.25):
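import re
import pandas as pd

# for each file name, parse its chunks as dates and keep the first valid one (NaN if none)
s = pd.Series([next(iter(pd.to_datetime(re.split(r'[._-]', name), errors='coerce')
                         .dropna()), float('nan'))
               for name in df['File_name']],
              index=df.index, name='date')

df.join(s)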

Read multiple files from a folder and pass each file through a function in R

You can write a function that

1) Reads the file

2) Performs all the data-processing steps

3) Writes the new file

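library(tidyverse)
library(lubridate)
library(data.table)  # for shift()

f1 <- function(file) {
  readxl::read_xlsx(file) %>%
    # monthly sales totals
    group_by(date = floor_date(DATE, "month")) %>%
    summarize(SALES = sum(SALES)) %>%
    # "2021-01-01" -> year = "2021", month = "01"; the day part is dropped
    separate(date, sep = "-", into = c("year", "month"), extra = "drop") %>%
    # note: shift(x, -12) returns the value 12 rows ahead (a lead);
    # use shift(SALES, 12) if a true 12-month lag is intended
    mutate(lag_12 = shift(SALES, -12),
           lag_24 = shift(SALES, -24)) %>%
    writexl::write_xlsx(paste0('new_', basename(file)))
}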

Then apply it to every file:

filenames <- list.files(pattern = "\\.xlsx$")  # the Excel files to process
lapply(filenames, f1)
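The same read, process, write pattern can be sketched in pandas. To stay dependency-free, it assumes CSV input with `DATE` and `SALES` columns instead of Excel files; `process_file` is a hypothetical name, and only the monthly total and a 12-month lag are shown.

```python
from pathlib import Path

import pandas as pd


def process_file(path: Path) -> Path:
    """Read one CSV, compute monthly SALES totals, and write them to new_<name>."""
    df = pd.read_csv(path, parse_dates=["DATE"])
    monthly = (
        df.groupby(df["DATE"].dt.to_period("M"))["SALES"]
        .sum()
        .rename_axis("month")
        .reset_index()
    )
    # shift(12) takes the value from 12 rows earlier, i.e. a true 12-month lag
    monthly["lag_12"] = monthly["SALES"].shift(12)
    out_path = path.with_name("new_" + path.name)
    monthly.to_csv(out_path, index=False)
    return out_path
```

Apply it with `[process_file(p) for p in sorted(Path(".").glob("*.csv")) if not p.name.startswith("new_")]`; the filter keeps a second run from reprocessing the files written by the first.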

