Using Lapply and Read.CSV on Multiple Files (In R)

R - apply function on two files in folders with for loop or lapply and save results in one dataframe

Try this solution:

Get all the folders using list.dirs.
For each folder read the "alpha" and "beta" files and return a 3 column tibble back with alpha, beta and alphabeta values.
Bind all the data frames with an id column to know which folder each value comes from.

all_folders <- list.dirs('Data/', recursive = FALSE, full.names = TRUE)

result <- purrr::map_df(all_folders, function(x) {
  all_Files <- list.files(x, full.names = TRUE, pattern = 'alpha|beta')
  df1 <- read.csv(all_Files[1])
  df2 <- read.csv(all_Files[2])
  tibble::tibble(alpha = df1$mean, beta = df2$mean, alphabeta = alpha/beta)
}, .id = "id")
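
For comparison, the same steps can be sketched in base R with lapply and do.call(rbind, ...). This is only a sketch, not the answer's own code. It assumes each folder holds exactly one file matching "alpha" and one matching "beta", each with a mean column, as in the code above. The temporary root folder, the folder names run1 and run2, and the file names alpha.csv and beta.csv are made up so that the example runs on its own.

```r
# Build a throwaway folder structure (hypothetical names) so the sketch is runnable.
root <- tempfile("Data")
for (folder in c("run1", "run2")) {
  dir.create(file.path(root, folder), recursive = TRUE)
  write.csv(data.frame(mean = c(2, 4)), file.path(root, folder, "alpha.csv"), row.names = FALSE)
  write.csv(data.frame(mean = c(1, 2)), file.path(root, folder, "beta.csv"), row.names = FALSE)
}

all_folders <- list.dirs(root, recursive = FALSE, full.names = TRUE)

per_folder <- lapply(all_folders, function(x) {
  # Select each file by name rather than by its position in the listing.
  alpha <- read.csv(list.files(x, pattern = "alpha", full.names = TRUE))
  beta  <- read.csv(list.files(x, pattern = "beta", full.names = TRUE))
  data.frame(id = basename(x), alpha = alpha$mean, beta = beta$mean,
             alphabeta = alpha$mean / beta$mean)
})

# Stack the per-folder data frames; the id column records the source folder.
result <- do.call(rbind, per_folder)
```

Here the id column holds the folder name instead of the position index that .id = "id" produces for an unnamed input.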

Skipping last N rows with lapply and then read.csv

Something like this should put you on the right track. It reads each file, removes its last 5 rows, and then binds the results together. I would also suggest avoiding variable names that can conflict with function names: c is a function in base R, and files is easily confused with base functions such as file and list.files. Here, I am using all_files instead of files.

all_files <- list.files(path = "./savedfiles", full.names = TRUE)

do.call(rbind, # assuming columns match 1:1; use dplyr::bind_rows() if not 1:1
  lapply(all_files, function(x) {
    head(read.csv(x, header = TRUE, stringsAsFactors = FALSE), -5) # change as per needs
  })
)
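
To make the effect of the negative argument concrete: head(x, -5) keeps every row except the last five. The following is a small self-contained sketch of the approach above. The temporary folder, the files a.csv and b.csv, and the helper name read_without_tail are made up for illustration.

```r
# Hypothetical input: two small CSV files written to a temporary folder.
folder <- tempfile("savedfiles")
dir.create(folder)
write.csv(data.frame(x = 1:8),  file.path(folder, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:10), file.path(folder, "b.csv"), row.names = FALSE)

# Illustrative helper: read one file and drop its last `n_drop` rows.
read_without_tail <- function(path, n_drop = 5) {
  df <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
  head(df, -n_drop)  # a negative n keeps all rows except the last n_drop
}

all_files <- list.files(path = folder, full.names = TRUE)
combined <- do.call(rbind, lapply(all_files, read_without_tail))
```

Note that a file with five rows or fewer contributes no rows at all, which may or may not be what you want.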

Using lapply variable in read.csv

With lapply, it usually makes more sense to apply a function to each list element and return a list, in which the results are stored and can be named. Example (edit: use split to process each site's files together):

files <- list.files(path= "I:/Results/", pattern = "site_[abcd]_.*csv", full.names = TRUE)
files <- split(files, gsub(".*site_([abcd]).*", "\\1", files))
processFiles <- function(x){
    all <- read.csv(x[grep("_all.csv", x)])
    rsid <- read.csv(x[grep("_rsid.csv", x)])
    tbl <- read.csv(x[grep("_tbl.csv", x)])
    # do more stuff, generate df, return(df)
}
res <- lapply(files, processFiles)
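
The grouping step is the least obvious part, so here is a small sketch of what split does with the key extracted by gsub. The file names are made up and stand in for the result of list.files(); nothing is read from disk.

```r
# Made-up file names standing in for the result of list.files().
files <- c("I:/Results/site_a_all.csv", "I:/Results/site_a_rsid.csv",
           "I:/Results/site_b_all.csv", "I:/Results/site_b_rsid.csv")

# Extract the site letter from each path and use it as the grouping key.
site <- gsub(".*site_([abcd]).*", "\\1", files)
groups <- split(files, site)

names(groups)   # one list element per site
groups[["a"]]   # the paths belonging to site a
```

Each element of groups is a character vector of paths for one site, which is what processFiles receives as x.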

How to load multiple csv files into separate objects (dataframes) in R based on filename?

Solution for anyone curious...

files <- list.files(pattern = ".*csv")

for (i in seq_along(files)) {
  # build the base names for this iteration, e.g. "file001" and "exfile001"
  file_name <- paste0("file00", i)
  ex_file_name <- paste0("exfile00", i)

  # read both files; these two objects are overwritten on every iteration
  file_object <- read.csv(file = paste0(file_name, ".csv"), fileEncoding = "UTF-8-BOM")
  exfile_object <- read.csv(file = paste0(ex_file_name, ".csv"), fileEncoding = "UTF-8-BOM")
}

Essentially, build the filename within the loop, then pass it to read.csv on each iteration.
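
Because the loop assigns to the same two variables on every pass, only the last pair of files is left afterwards. One way to keep every file is to build the names up front and collect the results in a named list. The sketch below is an illustration under assumptions: the temporary folder and its files are made up, and sprintf with %03d pads to three digits, which matches the "file00" prefix only for single-digit indices.

```r
# Hypothetical files file001.csv, exfile001.csv, ... in a temporary folder.
folder <- tempfile("csvs")
dir.create(folder)
for (i in 1:2) {
  write.csv(data.frame(n = i),  file.path(folder, sprintf("file%03d.csv", i)),   row.names = FALSE)
  write.csv(data.frame(n = -i), file.path(folder, sprintf("exfile%03d.csv", i)), row.names = FALSE)
}

# Build each name from its index, read the file, and keep every result.
ids <- 1:2
file_names <- sprintf("file%03d", ids)
files_list <- lapply(file.path(folder, paste0(file_names, ".csv")), read.csv)
names(files_list) <- file_names

files_list[["file002"]]  # access one data frame by name
```

The same pattern applies to the exfile series by swapping the prefix.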

How to import multiple .csv files at once?

Something like the following should result in each data frame as a separate element in a single list:

temp = list.files(pattern="\\.csv$")
myfiles = lapply(temp, read.csv)

This assumes that you have those CSVs in a single directory--your current working directory--and that all of them have the lower-case extension .csv.

If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind,...), dplyr::bind_rows() or data.table::rbindlist().
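
As a rough base R sketch of that combining step, the list can be tagged with the file each data frame came from and then stacked with do.call(rbind, ...). The data frames and the names jan.csv and feb.csv below are made up and stand in for the list returned by lapply; rbind assumes all data frames share the same columns.

```r
# Hypothetical data frames standing in for the list returned by lapply(temp, read.csv).
myfiles <- list(
  "jan.csv" = data.frame(id = 1:2, value = c(10, 20)),
  "feb.csv" = data.frame(id = 3:4, value = c(30, 40))
)

# Tag each data frame with the file it came from, then stack them.
tagged <- Map(function(df, name) {
  data.frame(source = name, df, stringsAsFactors = FALSE)
}, myfiles, names(myfiles))

combined <- do.call(rbind, tagged)
rownames(combined) <- NULL  # drop the "jan.csv.1"-style row names that rbind creates
```

dplyr::bind_rows() and data.table::rbindlist() can add a similar source column through their .id and idcol arguments, respectively.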

If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:

temp = list.files(pattern="\\.csv$")
for (i in seq_along(temp)) assign(temp[i], read.csv(temp[i]))

Or, without assign, and to demonstrate (1) how the file name can be cleaned up and (2) how to use list2env, you can try the following:

temp = list.files(pattern="\\.csv$")
list2env(
  lapply(setNames(temp, make.names(gsub("\\.csv$", "", temp))), 
         read.csv), envir = .GlobalEnv)

But again, it's often better to leave them in a single list.
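
To illustrate why a single named list is usually enough, here is a small sketch that applies the same kind of name clean-up and then works on all data frames at once. The file names and the data frames are made up; in practice they would come from list.files() and read.csv.

```r
# Hypothetical file names; in practice these would come from list.files().
temp <- c("sales 2020.csv", "sales-2021.csv")

# Strip the extension, then turn what is left into syntactic R names.
clean <- make.names(gsub("\\.csv$", "", temp))

# Made-up data frames standing in for lapply(temp, read.csv).
data_list <- setNames(list(data.frame(x = 1:3), data.frame(x = 1:5)), clean)

row_counts <- sapply(data_list, nrow)  # one operation over every data frame
data_list[["sales.2020"]]              # a single data frame, looked up by name
```

Everything that separate global objects would allow is still possible, and loops over all files stay one-liners.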

R: Errors and problems executing functions using lapply / map -- won't read input list or write output files

To get the second part to run, you need to use mapply, as in this example.

###import .tsv of PCR results, formatting, natural sort. Export cleaned file as .csv.
###import .tsv of Replicates file, split $Samples column in two and re-combine in Replicates.
###Export modified Replicates tibble to new .csv file.

###environment setup; change folder accordingly. Install tidyverse if needed.

setwd("C:/Users/asmit/Desktop/pratice_files")
#install.packages("tidyverse")

library(tidyverse)

###import .tsv of PCR results, formatting, natural sort. Export cleaned file as .csv.
singlet_files <- list.files(path = ".", pattern = "[^replicates]\\.tsv")

tibble_singlet <- function(x) { ###function to create tibble from singlet files
  cleanup_tibble <- as_tibble(read_tsv(x, col_names = TRUE, skip = 1))
}

singlet_cleanup <- function(x) { ##function to clean singlet files
  new_file <- str_replace(x, "(.*).tsv", "\\1_cleaned.csv")
  tibble_singlet(x) %>%
    select("Pos", "Name", "Cp", "Concentration") %>%
    .[str_order(.$Pos, numeric = TRUE),] %>%
    write_csv(file = new_file)
}

lapply(singlet_files, singlet_cleanup) ##run (singlet_cleanup) on files in singlet_files
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 8
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (3): Pos, Name, Status
#> dbl (4): Color, Cp, Concentration, Standard
#> lgl (1): Include
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> [[1]]
#> # A tibble: 96 x 4
#>    Pos   Name     Cp Concentration
#>    <chr> <chr> <dbl>         <dbl>
#>  1 A1    1E6    17.2    894000    
#>  2 A2    1E6    17.2    877000    
#>  3 A3    23     NA          NA    
#>  4 A4    23     NA          NA    
#>  5 A5    79     35.1         8.73 
#>  6 A6    79     36.2         4.26 
#>  7 A7    144    35.7         6.09 
#>  8 A8    144    36.7         3.19 
#>  9 A9    229    39.2         0.633
#> 10 A10   229    37.7         1.64 
#> # ... with 86 more rows
#> 
#> [[2]]
#> # A tibble: 96 x 4
#>    Pos   Name     Cp Concentration
#>    <chr> <chr> <dbl>         <dbl>
#>  1 A1    1E6    19.1     769000   
#>  2 A2    1E6    18.9     906000   
#>  3 A3    319    33.5        103   
#>  4 A4    319    33.8         86.3 
#>  5 A5    370    35.8         23.4 
#>  6 A6    370    40            1.79
#>  7 A7    415    35.6         27.2 
#>  8 A8    415    36.8         13   
#>  9 A9    486    34.5         55.3 
#> 10 A10   486    36.0         21.1 
#> # ... with 86 more rows
#> 
#> [[3]]
#> # A tibble: 96 x 4
#>    Pos   Name     Cp Concentration
#>    <chr> <chr> <dbl>         <dbl>
#>  1 A1    1E6    18.2     568000   
#>  2 A2    1E6    17.0    1210000   
#>  3 A3    23     35.7         12.3 
#>  4 A4    23     35.9         10.9 
#>  5 A5    67     35.6         13.3 
#>  6 A6    67     35.5         14.5 
#>  7 A7    129    38.3          2.6 
#>  8 A8    129    NA           NA   
#>  9 A9    172    NA           NA   
#> 10 A10   172    37.3          4.69
#> # ... with 86 more rows
###import .tsv of Replicates file, split $Samples column in two and re-combine in Replicates.
singlet_cleaned <- list.files(path = ".", pattern = "[_cleaned]\\.csv")
matching_pair_files <- list.files(path = ".", pattern = "[replicates]\\.tsv")

cleaned_tibble <- function(y) { ##function to read cleaned .csv files as tibble
  Pos_tibble <- as_tibble(read_csv(y, col_names = TRUE)) 
}

match <- function(m){ ##function to make tibble of replicate file
  match_tibble <- as_tibble(read_tsv(m, col_names = TRUE, skip = 1))
}

merged <- function(m,y){ ##function to merge match tibble with specific column of cleaned_tibble tibble
  organ <- regmatches(m, regexpr("(Liver|Lung|Kidney|Spleen)", m))
  output_file <- str_replace(m, "(.*)_replicates.tsv", "\\1_final.csv")
  match(m) %>%
    mutate("R1" = gsub(x = .$Samples, pattern = "^(.*),.*", replacement = "\\1")) %>%
    mutate("R2" = gsub(x = .$Samples, pattern = ".*,\\s(.*)", replacement = "\\1")) %>%
    pivot_longer(cols = c("R1", "R2"), names_to ="Well Pairs", values_to = "Wells") %>%
    select("MeanCp", "STD Cp", "Mean conc", "STD conc", "Wells") %>%
    relocate("Wells", 1) %>%
    right_join((cleaned_tibble(y)), by = c("Wells"="Pos")) %>%
    .[str_order(.$Wells, numeric = TRUE),] %>%
    select("Name", "MeanCp", "STD Cp", "Mean conc", "STD conc") %>%
    distinct(Name, .keep_all = TRUE) %>%
    add_column(Organ = organ) %>%
    write_csv(file = output_file) ###Export modified Replicates tibble to new .csv file.
}

mapply(merged, matching_pair_files, singlet_cleaned, SIMPLIFY = FALSE)
#> Rows: 47 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (1): Samples
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 96 Columns: 4
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Pos, Name
#> dbl (2): Cp, Concentration
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 46 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: "\t"
#> chr (1): Samples
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 48 Columns: 6
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Name, Organ
#> dbl (4): MeanCp, STD Cp, Mean conc, STD conc
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Error: Join columns must be present in data.
#> x Problem with `Pos`.

I don't know what the final error message at the bottom is about, though. My files all have the output I expect, so I'm not going to worry about it for the time being.
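
One possible explanation for that final error, judging from the log: in a regular expression, square brackets define a character class, so "[_cleaned]\\.csv" matches any name whose last character before .csv is one of _, c, l, e, a, n or d. A *_final.csv file left over from an earlier run would therefore be listed as a cleaned file, and it has no Pos column to join on (the log shows a six-column file with Name and Organ being read as CSV). Whether that is what happened in the original run is an assumption. The sketch below uses made-up file names to show the difference between the bracketed patterns and literal, anchored ones.

```r
# Made-up file names of the kinds the script reads and writes.
csv_files <- c("Liver_cleaned.csv", "Liver_final.csv")
tsv_files <- c("Liver.tsv", "Liver_replicates.tsv")

# A bracketed pattern is a character class: only ONE character is tested.
class_match <- grepl("[_cleaned]\\.csv", csv_files)  # "d" and "l" are both in the class

# Literal, anchored patterns match the whole suffix instead.
cleaned    <- grep("_cleaned\\.csv$", csv_files, value = TRUE)
replicates <- grep("_replicates\\.tsv$", tsv_files, value = TRUE)
singlets   <- grep("_replicates\\.tsv$", tsv_files, value = TRUE, invert = TRUE)
```

With list.files, the literal patterns can be passed directly as pattern; the inverted match for the singlet files needs the extra grep step because list.files has no way to negate a pattern.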

Created on 2021-09-22 by the reprex package (v2.0.1)