Using lapply and read.csv on Multiple Files (in R)

R - apply function on two files in folders with for loop or lapply and save results in one dataframe

Try this solution:

  1. Get all the folders using list.dirs.

  2. For each folder, read the "alpha" and "beta" files and return a three-column tibble with alpha, beta and alphabeta values.

  3. Bind all the data frames together, with an id column that records which folder each value comes from.

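all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)

result <- purrr::map_df(all_folders, function(x) {
  # list.files() returns names in alphabetical order, so this assumes the
  # "alpha" file sorts before the "beta" file in each folder.
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha/beta)
}, .id = "id") # all_folders is unnamed, so id holds each folder's position in it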
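The three steps above can also be sketched in base R only. This is a minimal, self-contained illustration rather than the answer's own method: the folder names (run1, run2), the file names (alpha.csv, beta.csv) and the helper read_pair are assumptions made up for the demo, and the data is written to a temporary directory first so the snippet can run anywhere.

```r
# Build a throwaway layout: Data/run1 and Data/run2, each holding an
# alpha.csv and a beta.csv with a `mean` column (all names hypothetical).
root <- file.path(tempdir(), "Data")
for (run in c("run1", "run2")) {
  dir.create(file.path(root, run), recursive = TRUE, showWarnings = FALSE)
  write.csv(data.frame(mean = c(2, 4)), file.path(root, run, "alpha.csv"), row.names = FALSE)
  write.csv(data.frame(mean = c(1, 8)), file.path(root, run, "beta.csv"), row.names = FALSE)
}

# Step 1: list the folders.
all_folders <- list.dirs(root, recursive = FALSE, full.names = TRUE)

# Step 2: read both files of one folder and return one data frame.
read_pair <- function(folder) {
  # Pick each file by name instead of by its position in a listing.
  alpha <- read.csv(file.path(folder, "alpha.csv"))$mean
  beta  <- read.csv(file.path(folder, "beta.csv"))$mean
  data.frame(id = basename(folder), alpha = alpha, beta = beta,
             alphabeta = alpha / beta)
}

# Step 3: bind the per-folder data frames; `id` is the folder name.
result <- do.call(rbind, lapply(all_folders, read_pair))
print(result)
```

Storing basename(folder) in the id column keeps the folder name itself in the result, which may be easier to read than a positional index.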

Skipping last N rows with lapply and then read.csv

Something like this should put you on the right track. It reads each file, drops its last 5 rows, and then binds the results together. I would also suggest avoiding variable names that can be confused with function names: c, for example, is a function in base R, and files is easy to mix up with file() and list.files(). Here, I use all_files instead of files.

all_files <- list.files(path = "./savedfiles", full.names = TRUE)

do.call(rbind, # assuming columns match 1:1; use dplyr::bind_rows() if not 1:1
  lapply(all_files, function(x) {
    # head() with a negative n drops that many rows from the end
    head(read.csv(x, header = TRUE, stringsAsFactors = FALSE), -5) # change as per needs
  })
)
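To see what head(df, -5) does in this pipeline, here is a small self-contained sketch. The two demo files (a.csv, b.csv) and their contents are invented for illustration and are written to a temporary folder.

```r
# Write two small CSVs (hypothetical data) into a temporary folder.
dir_path <- file.path(tempdir(), "savedfiles")
dir.create(dir_path, showWarnings = FALSE)
for (name in c("a.csv", "b.csv")) {
  write.csv(data.frame(x = 1:8), file.path(dir_path, name), row.names = FALSE)
}

all_files <- list.files(dir_path, pattern = "\\.csv$", full.names = TRUE)

# head(df, -5) keeps everything except the last 5 rows of each file.
trimmed <- lapply(all_files, function(f) head(read.csv(f), -5))
combined <- do.call(rbind, trimmed)

print(combined)  # rows 1 to 3 of each 8-row file
```

Note that a file with five rows or fewer contributes no rows at all, which may or may not be what you want.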

Using lapply variable in read.csv

With lapply, it generally makes more sense to apply a function to each list element and return a list, so that the results are stored together and can be named. Example (edit: use split so that the files belonging to one site are processed together):

files <- list.files(path = "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
# group the file paths by site letter: one list element per site
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
processFiles <- function(x) {
  # x holds all file paths for one site; fixed = TRUE matches the suffix literally
  all <- read.csv(x[grep("_all.csv", x, fixed = TRUE)])
  rsid <- read.csv(x[grep("_rsid.csv", x, fixed = TRUE)])
  tbl <- read.csv(x[grep("_tbl.csv", x, fixed = TRUE)])
  # do more stuff, generate df, return(df)
}
res <- lapply(files, processFiles)
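The grouping step is easier to follow on a plain character vector. The sketch below uses made-up file names that follow the site_<letter>_<kind>.csv scheme and touches no files on disk.

```r
# Hypothetical file names following the site_<letter>_<kind>.csv scheme.
files <- c("I:/Results/site_a_all.csv", "I:/Results/site_a_rsid.csv",
           "I:/Results/site_a_tbl.csv", "I:/Results/site_b_all.csv",
           "I:/Results/site_b_rsid.csv", "I:/Results/site_b_tbl.csv")

# Extract the site letter and use it as the grouping key.
site <- gsub(".*site_([abcd]).*", "\\1", files)
groups <- split(files, site)

print(names(groups))   # one list element per site
print(groups[["a"]])   # the three files belonging to site a
```

Because split returns a named list, the result of lapply(files, processFiles) is named by site as well.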

How to load multiple csv files into separate objects (dataframes) in R based on filename?

Solution for anyone curious...

files <- list.files(pattern = "\\.csv$")

for (i in seq_along(files)) {
  # paste0() joins without a separator, so no gsub() clean-up is needed
  file_name <- paste0("file00", i)
  ex_file_name <- paste0("exfile00", i)

  # both objects are overwritten on every iteration, so use them inside the loop
  file_object <- read.csv(file = paste0(file_name, ".csv"), fileEncoding = "UTF-8-BOM")
  exfile_object <- read.csv(file = paste0(ex_file_name, ".csv"), fileEncoding = "UTF-8-BOM")
}

Essentially, build the file name within the loop, then pass it to read.csv on each iteration.
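As a rough sketch of the same idea, the file names can be built in one vectorised call and the data frames kept in a named list, so nothing is overwritten between iterations. The demo files written to the temporary directory are hypothetical stand-ins for file001.csv, file002.csv and so on.

```r
# paste0() concatenates without a separator.
one_name <- paste0("file00", 3, ".csv")

# Create two hypothetical demo files in a temporary directory.
tmp <- tempdir()
for (k in 1:2) {
  write.csv(data.frame(v = k), file.path(tmp, paste0("file00", k, ".csv")),
            row.names = FALSE)
}

# Build all paths at once, read them, and name each element by its file.
paths <- file.path(tmp, paste0("file00", 1:2, ".csv"))
frames <- lapply(paths, read.csv)
names(frames) <- basename(paths)

print(one_name)
print(names(frames))
```

Each data frame is then available as, for example, frames[["file002.csv"]].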

How to import multiple .csv files at once?

Something like the following should result in each data frame as a separate element in a single list:

temp = list.files(pattern="\\.csv$") # pattern is a regular expression, not a glob
myfiles = lapply(temp, read.csv)

This assumes that you have those CSVs in a single directory--your current working directory--and that all of them have the lower-case extension .csv.

If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind,...), dplyr::bind_rows() or data.table::rbindlist().
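A minimal sketch of the do.call(rbind, ...) route, with one extra column recording the source file. The folder combine_demo, the files jan.csv and feb.csv, and their contents are invented for the demo.

```r
# Create two hypothetical CSVs with identical columns.
tmp <- file.path(tempdir(), "combine_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, score = c(10, 20)), file.path(tmp, "jan.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)), file.path(tmp, "feb.csv"), row.names = FALSE)

paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

myfiles <- lapply(paths, function(p) {
  df <- read.csv(p)
  df$source <- basename(p)  # remember which file each row came from
  df
})

# rbind requires the same columns in every data frame.
combined <- do.call(rbind, myfiles)
print(combined)
```

If the columns differ between files, dplyr::bind_rows() or data.table::rbindlist(fill = TRUE) are the more forgiving options mentioned above.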

If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:

temp = list.files(pattern="\\.csv$")
for (i in seq_along(temp)) assign(temp[i], read.csv(temp[i]))

Or, without assign, and to demonstrate (1) how the file name can be cleaned up and (2) how to use list2env, you can try the following:

temp = list.files(pattern="\\.csv$")
list2env(
  lapply(setNames(temp, make.names(gsub("\\.csv$", "", temp))),
         read.csv), envir = .GlobalEnv)

But again, it's often better to leave them in a single list.
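For the single-list approach, a small sketch of how the list elements can be given clean names derived from the file names. The demo folder and the files "costs.csv" and "sales 2020.csv" are hypothetical; tools::file_path_sans_ext is used here in place of the gsub call as one possible way to strip the extension.

```r
# Create two hypothetical CSVs, one with a space in its name.
tmp <- file.path(tempdir(), "named_list_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = 1), file.path(tmp, "costs.csv"), row.names = FALSE)
write.csv(data.frame(a = 2), file.path(tmp, "sales 2020.csv"), row.names = FALSE)

paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Drop the folder and extension, then make syntactically valid names.
clean <- make.names(tools::file_path_sans_ext(basename(paths)))

# lapply keeps the names of its input, so the list is named by file.
frames <- lapply(setNames(paths, clean), read.csv)
print(names(frames))
```

Elements can then be accessed as frames$costs or frames[["sales.2020"]] without creating any objects in the global environment.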

R: Errors and problems executing functions using lapply / map -- won't read input list or write output files

To get the second part to run, you need to use mapply, as in this example.

###import .tsv of PCR results, formatting, natural sort. Export cleaned file as .csv.
###import .tsv of Replicates file, split $Samples column in two and re-combine in Replicates.
###Export modified Replicates tibble to new .csv file.

###environment setup; change folder accordingly. Install tidyverse if needed.

setwd("C:/Users/asmit/Desktop/pratice_files")
#install.packages("tidyverse")

library(tidyverse)

###import .tsv of PCR results, formatting, natural sort. Export cleaned file as .csv.
singlet_files <- list.files(path = ".", pattern = "[^replicates]\\.tsv")

tibble_singlet <- function(x) { ###function to create tibble from singlet files
  cleanup_tibble <- as_tibble(read_tsv(x, col_names = TRUE, skip = 1))
}

singlet_cleanup <- function(x) { ##function to clean singlet files
  new_file <- str_replace(x, "(.*).tsv", "\\1_cleaned.csv")
  tibble_singlet(x) %>%
    select("Pos", "Name", "Cp", "Concentration") %>%
    .[str_order(.$Pos, numeric = TRUE),] %>%
    write_csv(file = new_file)
}

lapply(singlet_files, singlet_cleanup) ##run (singlet_cleanup) on files in singlet_files
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> [[1]]
#> # A tibble: 96 x 4
#> Pos Name Cp Concentration
#> <chr> <chr> <dbl> <dbl>
#> 1 A1 1E6 17.2 894000
#> 2 A2 1E6 17.2 877000
#> 3 A3 23 NA NA
#> 4 A4 23 NA NA
#> 5 A5 79 35.1 8.73
#> 6 A6 79 36.2 4.26
#> 7 A7 144 35.7 6.09
#> 8 A8 144 36.7 3.19
#> 9 A9 229 39.2 0.633
#> 10 A10 229 37.7 1.64
#> # ... with 86 more rows
#>
#> [[2]]
#> # A tibble: 96 x 4
#> Pos Name Cp Concentration
#> <chr> <chr> <dbl> <dbl>
#> 1 A1 1E6 19.1 769000
#> 2 A2 1E6 18.9 906000
#> 3 A3 319 33.5 103
#> 4 A4 319 33.8 86.3
#> 5 A5 370 35.8 23.4
#> 6 A6 370 40 1.79
#> 7 A7 415 35.6 27.2
#> 8 A8 415 36.8 13
#> 9 A9 486 34.5 55.3
#> 10 A10 486 36.0 21.1
#> # ... with 86 more rows
#>
#> [[3]]
#> # A tibble: 96 x 4
#> Pos Name Cp Concentration
#> <chr> <chr> <dbl> <dbl>
#> 1 A1 1E6 18.2 568000
#> 2 A2 1E6 17.0 1210000
#> 3 A3 23 35.7 12.3
#> 4 A4 23 35.9 10.9
#> 5 A5 67 35.6 13.3
#> 6 A6 67 35.5 14.5
#> 7 A7 129 38.3 2.6
#> 8 A8 129 NA NA
#> 9 A9 172 NA NA
#> 10 A10 172 37.3 4.69
#> # ... with 86 more rows
###import .tsv of Replicates file, split $Samples column in two and re-combine in Replicates.
singlet_cleaned <- list.files(path = ".", pattern = "[_cleaned]\\.csv")
matching_pair_files <- list.files(path = ".", pattern = "[replicates]\\.tsv")

cleaned_tibble <- function(y) { ##function to read cleaned .csv files as tibble
  Pos_tibble <- as_tibble(read_csv(y, col_names = TRUE))
}

match <- function(m){ ##function to make tibble of replicate file
  match_tibble <- as_tibble(read_tsv(m, col_names = TRUE, skip = 1))
}

merged <- function(m,y){ ##function to merge match tibble with specific column of cleaned_tibble tibble
  organ <- regmatches(m, regexpr("(Liver|Lung|Kidney|Spleen)", m))
  output_file <- str_replace(m, "(.*)_replicates.tsv", "\\1_final.csv")
  match(m) %>%
    mutate("R1" = gsub(x = .$Samples, pattern = "^(.*),.*", replacement = "\\1")) %>%
    mutate("R2" = gsub(x = .$Samples, pattern = ".*,\\s(.*)", replacement = "\\1")) %>%
    pivot_longer(cols = c("R1", "R2"), names_to ="Well Pairs", values_to = "Wells") %>%
    select("MeanCp", "STD Cp", "Mean conc", "STD conc", "Wells") %>%
    relocate("Wells", 1) %>%
    right_join((cleaned_tibble(y)), by = c("Wells"="Pos")) %>%
    .[str_order(.$Wells, numeric = TRUE),] %>%
    select("Name", "MeanCp", "STD Cp", "Mean conc", "STD conc") %>%
    distinct(Name, .keep_all = TRUE) %>%
    add_column(Organ = organ) %>%
    write_csv(file = output_file) ###Export modified Replicates tibble to new .csv file.
}

mapply(merged, matching_pair_files, singlet_cleaned, SIMPLIFY = FALSE)
#> Rows: 47 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (1): Samples
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 4
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Pos, Name
#> dbl (2): Cp, Concentration
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 46 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (1): Samples
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 48 Columns: 6
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Name, Organ
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Error: Join columns must be present in data.
#> x Problem with `Pos`.

I don't know what the final error message at the bottom is about, though; my files all contain the output I expect, so I'm not going to worry about it for the time being.
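One plausible explanation for that final error, going only by the output above: the last .csv read has the columns Name and Organ, which is the shape of the _final.csv files the script itself writes, and such a file has no Pos column to join on. The pattern "[_cleaned]\\.csv" is a bracket expression, so it matches any single character from the set _, c, l, e, a, n, d followed by .csv, and that includes the trailing "l" of "_final.csv". The sketch below demonstrates this on made-up file names; the names are assumptions, not the author's actual files.

```r
# Hypothetical file names of the kinds the script reads and writes.
files <- c("Liver.tsv", "Liver_replicates.tsv",
           "Liver_cleaned.csv", "Liver_final.csv")

# A bracket expression matches ONE character from the set, not the word,
# so "Liver_final.csv" is picked up too ("l" is in the set).
loose <- grepl("[_cleaned]\\.csv", files)
print(files[loose])

# Anchored literal suffixes select only the intended files.
cleaned    <- files[grepl("_cleaned\\.csv$", files)]
replicates <- files[grepl("_replicates\\.tsv$", files)]
singlets   <- files[grepl("\\.tsv$", files) & !grepl("_replicates\\.tsv$", files)]

print(cleaned)
print(replicates)
print(singlets)
```

If this is indeed the cause, the error would only appear once _final.csv files from an earlier run are present in the folder, which would fit the observation that the written files look correct.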

Created on 2021-09-22 by the reprex package (v2.0.1)
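As a stripped-down illustration of why mapply fits the second part: lapply walks over one vector, while mapply walks over several vectors in parallel and calls the function with one element from each. The file names and the helper describe_pair below are hypothetical stand-ins for the real merging function.

```r
# Two parallel vectors of hypothetical file names, already in matching order.
replicate_files <- c("Liver_replicates.tsv", "Lung_replicates.tsv")
cleaned_files   <- c("Liver_cleaned.csv", "Lung_cleaned.csv")

# A stand-in for the real work: report which two files were paired.
describe_pair <- function(m, y) paste(m, "<->", y)

# Element i of the first vector is paired with element i of the second.
pairs <- mapply(describe_pair, replicate_files, cleaned_files,
                SIMPLIFY = FALSE, USE.NAMES = FALSE)
print(pairs)
```

The pairing is purely positional, so this relies on both vectors having the same length and the same ordering.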


