Loop through files and use functions, then use that result to form a dataframe in R
I am assuming that you want to stack all the data.frames on top of each other (row bind). map (from purrr) or lapply can apply a function to each item in a given list/vector (each filename in this case). map_dfr does the same and row binds all the outputs.
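library(purrr)
library(Rsamtools)  # provides pileup(), ScanBamParam() and PileupParam()
# pattern is a regular expression, so escape the dots and anchor the end
filenames <- list.files(pattern = "\\.sorted\\.bam$")
purrr::map_dfr(filenames, ~pileup(.x,
                                  index = .x,
                                  scanBamParam = ScanBamParam(),
                                  pileupParam = PileupParam()))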
Apply a function to multiple files to load data into a pyspark dataframe
If all the files have the same JSON structure, you only need the spark.read.json function, which accepts a list of file paths as a parameter.
# list_of_json_files is a Python list of paths to the JSON files
spark.read.json(list_of_json_files)
This will read all the files in the list and return a single data frame for all the information in the files.
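As a minimal sketch of how the list of files might be built, assuming the JSON files sit directly inside one folder and that a SparkSession already exists: the helper names collect_json_paths and load_json_files are hypothetical, and only spark.read.json comes from the answer above.

```python
from pathlib import Path


def collect_json_paths(folder):
    """Return the sorted paths of all *.json files directly inside `folder`."""
    return sorted(str(p) for p in Path(folder).glob("*.json"))


def load_json_files(spark, paths):
    """Read every file in `paths` into a single DataFrame.

    `spark` is assumed to be an existing SparkSession; it is passed in
    so that this sketch does not create one itself.
    """
    if not paths:
        raise ValueError("no JSON files to read")
    # spark.read.json accepts a single path or a list of paths
    return spark.read.json(paths)
```

With a session available, the call would look like load_json_files(spark, collect_json_paths("data")), where "data" is a placeholder folder name.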
R - apply function on two files in folders with for loop or lapply and save results in one dataframe
Try this solution:
Get all the folders using list.dirs. For each folder, read the "alpha" and "beta" files and return a 3 column tibble with alpha, beta and alphabeta values. Bind all the dataframes with an id column to know from which folder each value is coming.
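# immediate subfolders of Data/
all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)
result <- purrr::map_df(all_folders, function(x) {
  # the "alpha" and "beta" files of this folder, in alphabetical order
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha/beta)
}, .id = "id")
# with an unnamed input, id is the position of the folder in all_folders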
Use function in loop over set of files in directory
You can read the file names with dir and then loop over them: read each file, do your tapply, create a vector with the name of the file and the results for that file, and merge the vectors with rbind. I hope that this is similar to what you wanted, or at least that it pushes you in the right direction.
new_df <- c()
list_of_files <- dir("your_folder_where_data_is")
for (f in list_of_files) {
  df <- read.csv(file.path("your_folder_where_data_is", f))
  new_line <- c(f, tapply(df$V1.1, df$V1, mean))
  new_df <- rbind(new_df, new_line)
}
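The same loop can be sketched in Python with only the standard library. This is an illustration, not a translation of the R code: the column names V1 and V1.1 are taken from the snippet above, while the function names and the assumption that every file is a CSV with a header row are mine.

```python
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean


def group_means(csv_path, group_col="V1", value_col="V1.1"):
    """Return {group: mean of value_col} for one CSV file."""
    groups = defaultdict(list)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            groups[row[group_col]].append(float(row[value_col]))
    return {group: mean(values) for group, values in groups.items()}


def summarise_folder(folder):
    """Build one row per CSV file: the file name plus its group means."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        rows.append({"file": path.name, **group_means(path)})
    return rows
```

Each row is a dictionary, so files with different group labels simply end up with different keys instead of misaligned columns.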
R- How to read from multiple directories and apply function on same file names contained within different directories
I would be lazy and list all the files in one go and use regex to find the appropriate one for each iteration. Something along the lines of
# list all files with paths
(x <- list.files(full.names = TRUE, recursive = TRUE))
[1] "./figure/delez_skupin.pdf" "./figure/diag_efekt_odstrela.pdf"
[3] "./figure/diag_maxent.pdf" "./figure/diag_teza_v_casu.pdf"
[5] "./figure/diag_teza_v_casu2.pdf" "./figure/efekt_odstrela.pdf"
[7] "./figure/fig_teza.pdf" "./figure/graf_odstrel_razmerje_kategorija.pdf"
[9] "./figure/graf_odstrel_razmerje_kategorija1.pdf" "./figure/graf_odstrel_razmerje_kategorija2.pdf"
[11] "./figure/graf_starost_v_letih_skupaj.pdf" "./figure/korelacija_med_odstrelom_in_sist_1.pdf"
[13] "./figure/korelacija_med_odstrelom_in_sist_2.pdf" "./figure/modeliranje_maxent_sistematicno.pdf"
[15] "./figure/plot_glm_maxent_model1.pdf" "./figure/plot_glm_maxent_model2.pdf"
[17] "./figure/pregled_prostorskih_podatkov.pdf" "./figure/prikaz_okoljskih_spremenljivk1.pdf"
[19] "./figure/prikaz_okoljskih_spremenljivk2.pdf" "./figure/prikaz_okoljskih_spremenljivk3.pdf"
[21] "./figure/prikaz_okoljskih_spremenljivk4.pdf" "./figure/priloznostna_glede_na_mesec.pdf"
[23] "./figure/primerjava_spremenljivk_glede_prisotnosti.pdf" "./figure/priprava_primerjava.pdf"
[25] "./figure/razsirjenost_gamsa_tnp.pdf" "./figure/razsirjenost_gamsa_v_tnp.pdf"
[27] "./figure/sprememba_strukture_po_mesecih.pdf" "./figure/sprememba_strukture_po_mesecih_abs.pdf"
[29] "./figure/sprememba_strukture_po_mesecih_rel.pdf" "./figure/st_osebkov_na_leto_priloznostna.pdf"
[31] "./figure/st_osebkov_na_leto_sistematicna.pdf" "./figure/teza_enoletnikov.pdf"
[33] "./figure/vpliv_js_glm1.pdf" "./figure/vpliv_js_glm2.pdf"
...
[51] "./ostale_slike/naslovnica_gams.jpg" "./ostale_slike/nepipaj/naslovnica_gams.jpg"
[53] "./ostale_slike/nepipaj/slika17_odlov_tone.jpg" "./ostale_slike/nepipaj/slika18_odlov_irena.jpg"
[55] "./ostale_slike/nepipaj/slika19_odlov_irena_markica.jpg" "./ostale_slike/nepipaj/slika20_odlov_luna.jpg"
[57] "./ostale_slike/nepipaj/slika21_gibanje_irena.png" "./ostale_slike/nepipaj/slika22_gibanje_mojca.png"
[59] "./ostale_slike/nepipaj/slika23_gibanje_tone.png" "./ostale_slike/nepipaj/slika24_gibanje_luna.png"
[61] "./ostale_slike/nepipaj/slika25_gibanje_irena_jesen_zima.png" "./ostale_slike/nepipaj/slika26_gibanje_mojca_jesen_zima.png"
[63] "./ostale_slike/nepipaj/slika27_gibanje_tone_jesen_zima.png" "./ostale_slike/nepipaj/slika28_graf_aktivnosti.jpg"
[65] "./ostale_slike/razsirjenost_gamsa_slovenija.png" "./ostale_slike/slika17_odlov_tone.jpg"
[67] "./ostale_slike/slika18_odlov_irena.jpg" "./ostale_slike/slika19_odlov_irena_markica.jpg"
[69] "./ostale_slike/slika20_odlov_luna.jpg" "./ostale_slike/slika21_gibanje_irena.jpg"
[71] "./ostale_slike/slika22_gibanje_mojca.jpg" "./ostale_slike/slika23_gibanje_tone.jpg"
[73] "./ostale_slike/slika24_gibanje_luna.jpg" "./ostale_slike/slika25_gibanje_irena_jesen_zima.jpg"
[75] "./ostale_slike/slika26_gibanje_mojca_jesen_zima.jpg" "./ostale_slike/slika27_gibanje_tone_jesen_zima.jpg"
[77] "./ostale_slike/slika28_graf_aktivnosti.jpg" "./ostale_slike/slo_gams.bmp"
# find all files whose path contains "slika2"
x[grepl("slika2", x)]
[1] "./ostale_slike/nepipaj/slika20_odlov_luna.jpg" "./ostale_slike/nepipaj/slika21_gibanje_irena.png"
[3] "./ostale_slike/nepipaj/slika22_gibanje_mojca.png" "./ostale_slike/nepipaj/slika23_gibanje_tone.png"
[5] "./ostale_slike/nepipaj/slika24_gibanje_luna.png" "./ostale_slike/nepipaj/slika25_gibanje_irena_jesen_zima.png"
[7] "./ostale_slike/nepipaj/slika26_gibanje_mojca_jesen_zima.png" "./ostale_slike/nepipaj/slika27_gibanje_tone_jesen_zima.png"
[9] "./ostale_slike/nepipaj/slika28_graf_aktivnosti.jpg" "./ostale_slike/slika20_odlov_luna.jpg"
[11] "./ostale_slike/slika21_gibanje_irena.jpg" "./ostale_slike/slika22_gibanje_mojca.jpg"
[13] "./ostale_slike/slika23_gibanje_tone.jpg" "./ostale_slike/slika24_gibanje_luna.jpg"
[15] "./ostale_slike/slika25_gibanje_irena_jesen_zima.jpg" "./ostale_slike/slika26_gibanje_mojca_jesen_zima.jpg"
[17] "./ostale_slike/slika27_gibanje_tone_jesen_zima.jpg" "./ostale_slike/slika28_graf_aktivnosti.jpg"
Having full file names you can import your data sets and manipulate them further.
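For comparison, a rough Python sketch of the same list-everything-then-filter idea. The function name find_files is hypothetical, and unlike the grepl call above it matches the pattern against the file name only, so a pattern such as ^slika2 really does mean "starts with".

```python
import re
from pathlib import Path


def find_files(root, pattern):
    """Return sorted paths under `root` whose file name matches `pattern`."""
    regex = re.compile(pattern)
    return sorted(
        str(path)
        for path in Path(root).rglob("*")  # walk all subdirectories
        if path.is_file() and regex.search(path.name)
    )
```

For example, find_files(".", r"^slika2") would return every file below the current directory whose name begins with slika2, whichever subfolder it lives in.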
Pandas apply multiple function with list
Assuming you want to extract and convert the possible chunks as date, you could split the string on delimiters, explode to multiple rows and attempt to convert to date with pandas.to_datetime:
import pandas as pd

df.join(pd
        .to_datetime(df['File_name']
                     .str.split(r'[_-]')
                     .explode(), errors='coerce')
        .dropna().rename('Date')
        )
output:
                 File_name  Result       Date
0   f1h3_13oct2021_gt1.csv     2.0 2021-10-13
1  p8-gfr-20dec2021-81.csv     0.5 2021-12-20
NB. if you have potentially many dates per string, you need to add a further step to select the one you want. Please give more details if this is the case.
Python version for old pandas:
import re
# for each file name, keep the first chunk that parses as a date (NaN if none does)
dates = pd.Series(
    [next(iter(pd.to_datetime(re.split(r'[._-]', name), errors='coerce')
               .dropna()), float('nan'))
     for name in df['File_name']],
    index=df.index, name='date')
df.join(dates)
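If pandas is not available at all, the split-and-try-to-parse idea can be sketched with the standard library. The helper date_from_name is hypothetical, and the format %d%b%Y is an assumption based on the two example file names; English month abbreviations also assume an English or C locale.

```python
import re
from datetime import datetime


def date_from_name(file_name, fmt="%d%b%Y"):
    """Return the first chunk of `file_name` that parses as a date, else None."""
    for chunk in re.split(r"[._-]", file_name):
        try:
            return datetime.strptime(chunk, fmt).date()
        except ValueError:
            continue  # this chunk is not a date in the assumed format
    return None
```

Returning only the first match mirrors the next(iter(...)) step above; a file name containing several dates would need an explicit rule for choosing among them.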
Read multiple files from a folder and pass each file through a function in R
You can write a function which
1) Reads the file
2) Performs all the data-processing steps
3) Writes the new file
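library(tidyverse)
library(lubridate)
library(data.table)
f1 <- function(file) {
  readxl::read_xlsx(file) %>%
    # total sales per calendar month
    group_by(date = floor_date(DATE, "month")) %>%
    summarize(SALES = sum(SALES)) %>%
    separate(date, sep = "-", into = c("year", "month")) %>%
    # note: in recent data.table versions a negative n reverses the direction
    # of shift() (a lead); use 12 and 24 if a true lag is intended
    mutate(lag_12 = shift(SALES, -12),
           lag_24 = shift(SALES, -24)) %>%
    # write the result next to the working directory as new_<original name>
    writexl::write_xlsx(paste0('new_', basename(file)))
}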
and do this for every file.
lapply(filenames, f1)
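The read, process, write pattern itself is language independent. Below is a simplified Python sketch of it, under loudly stated assumptions: the input is a CSV rather than an Excel file, DATE is an ISO string such as 2021-01-05, and the lag columns are left out. The function name monthly_sales and the out_dir parameter are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def monthly_sales(in_path, out_dir="."):
    """Sum SALES per month for one CSV file and write the result to new_<name>."""
    totals = defaultdict(float)
    with open(in_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # assumes DATE looks like YYYY-MM-DD
            year, month = row["DATE"].split("-")[:2]
            totals[(year, month)] += float(row["SALES"])

    out_path = Path(out_dir) / f"new_{Path(in_path).name}"
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["year", "month", "SALES"])
        for (year, month), total in sorted(totals.items()):
            writer.writerow([year, month, total])
    return out_path
```

Applying it to every file is then a plain loop, for example: for name in filenames: monthly_sales(name).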