Loop in R to Read Many Files

Loop in R to read many files

Sys.glob() is another possibility; its sole purpose is globbing, that is, wildcard expansion.

dataFiles <- lapply(Sys.glob("data*.csv"), read.csv)

That will read all the files of the form data[x].csv into the list dataFiles, where [x] is any string, including the empty string.

[Note that this is a different kind of pattern from the one in @Joshua's answer. There, list.files() takes a regular expression, whereas Sys.glob() uses standard wildcards. Which wildcards can be used is system dependent; details can be found on the help page ?Sys.glob.]
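To make the difference concrete, here is a small sketch that creates a few empty files in a temporary directory and selects them both ways. The file names and the "glob_demo" directory are made up for illustration; glob2rx() from the utils package translates a wildcard into the equivalent regular expression.

```r
# Hypothetical files, created in a temporary directory for illustration
tmp <- file.path(tempdir(), "glob_demo")
dir.create(tmp, showWarnings = FALSE)
demo_names <- c("data.csv", "data1.csv", "data_2020.csv", "other.csv")
invisible(file.create(file.path(tmp, demo_names)))

# Wildcard (glob) matching: "*" stands for any run of characters
glob_hits <- basename(Sys.glob(file.path(tmp, "data*.csv")))

# Regular-expression matching: list.files() expects a regex instead
regex_hits <- list.files(tmp, pattern = "^data.*\\.csv$")

# glob2rx() converts a wildcard into a regular expression
converted_hits <- list.files(tmp, pattern = glob2rx("data*.csv"))

print(glob_hits)
```

All three selections should pick up the same three "data" files and skip other.csv.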

How to read many files using a loop in R

setwd("WHATEVER-YOUR-WD-PATH-IS")

# List the CSV files in the working directory whose names start with "factory"
# (the pattern is a regular expression, so anchor it and escape the dot)
filenames <- list.files(pattern = "^factory.*\\.csv$")

# Strip the ".csv" extension and store the result in "names"
names <- tools::file_path_sans_ext(filenames)

# Read each file into a data frame named after the file
for (i in names) {
  filepath <- paste0(i, ".csv")
  # skip = 1 drops the header row, so columns get default names V1, V2, ...
  assign(i, read.csv(filepath, header = FALSE, skip = 1))
}
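Because list.files() treats its pattern as a regular expression, it can help to check a pattern against a few names before pointing it at a real directory. The sketch below uses made-up file names and shows how an anchored pattern and tools::file_path_sans_ext() behave; it is an illustration, not part of the original answer.

```r
# Hypothetical file names, used only to illustrate the matching
candidates <- c("factory_01.csv", "factory_02.csv",
                "old_factory.csv.bak", "notes.txt")

# An unanchored pattern matches anywhere in the name
loose <- grepl("factory.*csv", candidates)

# Anchoring with ^ and $ and escaping the dot keeps only real CSV files
strict <- grepl("^factory.*\\.csv$", candidates)

# Strip the extension regardless of how long the base name is
object_names <- tools::file_path_sans_ext(candidates[strict])
print(object_names)
```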

Loop function in r to read and save multiple data files

To convert this to a for loop, first get a list of the .txt files in your working directory:

# pattern is a regular expression, not a wildcard, so match the extension explicitly
myfiles <- list.files(pattern = "\\.txt$")

Then loop through each file, reading, joining with df1, and writing with minor modifications to your existing code:

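library(dplyr)  # full_join() and %>%
library(purrr)  # reduce()

for (file in myfiles) {
  # df1 is the data frame from your existing code
  df2 <- read.table(file, sep = "\t", stringsAsFactors = FALSE, header = TRUE)
  lst <- list(data.frame(df1), data.frame(df2))
  # Full join on ID, then replace the resulting NAs with 0
  df3 <- reduce(lst, full_join, by = "ID") %>% replace(., is.na(.), 0)
  data.table::fwrite(df3, file = paste0("output_", file), quote = FALSE, sep = "\t", row.names = FALSE)
}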
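To see what the join and the NA replacement do without any files or extra packages, here is a small base R sketch. It uses merge() with all = TRUE as a stand-in for full_join(), and the two data frames are made up for illustration.

```r
# Made-up tables standing in for df1 and one of the .txt files
df1 <- data.frame(ID = c(1, 2), a = c(10, 20))
df2 <- data.frame(ID = c(2, 3), b = c(5, 7))

# all = TRUE keeps IDs found in either table (a full outer join)
df3 <- merge(df1, df2, by = "ID", all = TRUE)

# Replace the NAs introduced by unmatched rows with 0
df3[is.na(df3)] <- 0
print(df3)
```

IDs that appear in only one table keep their row and get 0 in the columns that came from the other table.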

For loop to read multiple csv files in R from different directories

You are trying to use string interpolation, but R does not substitute variables inside a string literal, so "Fish[i]" is taken exactly as written.

Look at the output of this:

files <- c(21,22,29,30,34,65,66,69,70,74)

for(i in files) { # Loop over the numeric vector
print("F:/Fish[i]/Fish[i].csv")
}

Output:

[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"
[1] "F:/Fish[i]/Fish[i].csv"

Additionally, what is F? If it is a list, you will need to use double square brackets, and the path has to be built with paste0():

for(i in files) { # Loop over the numeric vector
F[[i]] <- read.csv(paste0("F:/Fish", i, "/Fish", i, ".csv"))
}
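As a quick check that the paths come out as intended, the sketch below builds them without reading anything. It uses a shortened copy of the files vector; sprintf() is shown as an alternative to paste0() and was not part of the original answer.

```r
files <- c(21, 22, 29)  # shortened version of the vector above

# paste0() is vectorised, so all paths can be built in one call
paths_paste <- paste0("F:/Fish", files, "/Fish", files, ".csv")

# sprintf() does the same with a format string; %d expects whole numbers
paths_sprintf <- sprintf("F:/Fish%d/Fish%d.csv", files, files)

print(paths_paste)
```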

Reading several large files in a loop

It is better to collect the loop output in a list, as shown here. Store the paths of the files in a vector (myvec below; you can change 1:3 to 1:n for a larger number of files). Each result from the loop then ends up as one element of List. Here is the code:

library(fst)
# Create an empty list to hold the results
List <- list()
# Vector of file paths
myvec <- paste0("C:/data", 1:3, ".fst")
# Loop over the files, reading rows 1 to 1000 of each
for (i in seq_along(myvec)) {
  List[[i]] <- read_fst(myvec[i], c(1:2), from = 1, to = 1000)
}
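The same list-based pattern can be tried without the fst package. The sketch below is a stand-in that writes three small CSV files with made-up contents to a temporary directory, reads them into a named list with lapply(), and stacks them; the file names and columns are assumptions for illustration only.

```r
# Write three small CSV files to a temporary directory (made-up data)
tmp <- file.path(tempdir(), "list_demo")
dir.create(tmp, showWarnings = FALSE)
paths <- file.path(tmp, paste0("data", 1:3, ".csv"))
for (k in seq_along(paths)) {
  write.csv(data.frame(id = 1:2, value = k * c(1, 2)), paths[k], row.names = FALSE)
}

# Read every file into one list and name each element after its file
results <- lapply(paths, read.csv)
names(results) <- basename(paths)

# Stack the pieces into a single data frame when they share columns
combined <- do.call(rbind, results)
print(names(results))
```

Naming the elements makes it easy to tell later which file each data frame came from, for example results[["data2.csv"]].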

Read multiple .tsv files in a loop in R

The function assign is our friend here:

years <- 2008:2012
variable_names <- paste0("T", years)

for(i in variable_names){
filename <- paste0('./', i, ".tsv")
dat <- read.table(file = filename, sep = '\t', header = TRUE)
assign(i, dat)
}

While I can't test this exact code without access to your files, here's what I did to test it:

years <- 2008:2012
variable_names <- paste0("T", years)

for(i in variable_names){
filename <- paste0('./', i, ".tsv")
dat <- filename
assign(i, dat)
}

which produces five new objects in my global environment, T2008 through T2012, with the expected values: "./T2008.tsv", "./T2009.tsv", etc.
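If the separate objects later need to be processed together, they can be gathered back into one named list. The sketch below reuses the stand-in values from the test above (file names rather than real data); mget() and the directly built list are suggestions added here, not part of the original answer.

```r
years <- 2008:2012
variable_names <- paste0("T", years)

# Same stand-in values as the test above: the file names themselves
for (i in variable_names) {
  assign(i, paste0("./", i, ".tsv"))
}

# mget() gathers the objects created by assign() back into a named list
collected <- mget(variable_names)

# Building the named list directly avoids creating loose objects at all
direct <- setNames(as.list(paste0("./", variable_names, ".tsv")), variable_names)
print(names(collected))
```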

Simplify for loop: read multiple files and remove specific data

Don't perform any complex data manipulation inside the loop.

In the loop: load and melt all the data (you won't be losing much memory, as you're removing only ~1/365 of the data).

Then, outside the loop: filter the resulting data.table object (remove day 60 in leap years) and modify the "day" column.

# Arguments
yearAll <- 1980:2017
yearLp <- seq(1980, 2016, 4)
# Libraries
library(data.table)
library(foreach)
# Load data
# It's possible to parallelize loop using %dopar%
result <- foreach(i = yearAll, .combine = rbind) %do% {
melt(fread(paste0("Precp_", i, ".csv")),
c("ID1", "ID2", "ID3", "ID4", "year"))
}
# Modify data
result <- result[!(year %in% yearLp & variable == "d_60")]
result[, day := as.numeric(sub("d_", "", variable))]
result[year %in% yearLp & day >= 61, day := day - 1]
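The filtering and renumbering steps can be checked on a tiny example. The sketch below restates the same logic with a plain base R data frame instead of data.table, on made-up rows covering days 59 to 61 for one leap year and one ordinary year.

```r
# Made-up long-format data: days 59 to 61 for a leap and a non-leap year
yearLp <- seq(1980, 2016, 4)
toy <- expand.grid(variable = paste0("d_", 59:61), year = c(1980, 1981),
                   stringsAsFactors = FALSE)

# Drop day 60 (29 February) in leap years only
toy <- toy[!(toy$year %in% yearLp & toy$variable == "d_60"), ]

# Turn "d_61" into 61, then shift later leap-year days down by one
toy$day <- as.numeric(sub("d_", "", toy$variable))
shift <- toy$year %in% yearLp & toy$day >= 61
toy$day[shift] <- toy$day[shift] - 1
print(toy)
```

After these steps the leap year has lost its day 60 and its former day 61 is renumbered to 60, while the ordinary year is unchanged.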

