Which Is the Best Method to Apply a Script Repetitively to N .CSV Files in R


I find that a for loop and a list work well enough for tasks like this. Once you have a working set of code, it's easy enough to move it from a loop into a function that can be applied with sapply() or similar, but that kind of vectorization is idiosyncratic anyway and probably not useful outside of private one-liners.

You probably want to avoid assigning to multiple objects with different names in the workspace (this is a FAQ that usually comes up as "how do I assign() . . .").

Please beware my untested code.

A vector of file names, and a list with a named element for each file.

files <- c("AA01.csv", "AA02.csv")
lst <- vector("list", length(files))
names(lst) <- files

Loop over each file.

library(timeSeries)
library(xts)  ## provides as.xts()

for (i in seq_along(files)) {
  ## read strings as character
  tmp <- read.csv(files[i], stringsAsFactors = FALSE)
  ## convert to 'timeDate'
  tmp$tfrm <- timeDate(paste(tmp$cdt, tmp$ctm), format = "%Y/%m/%d %H:%M:%S")
  ## create timeSeries object
  obj <- timeSeries(as.matrix(tmp$Value), tmp$tfrm)
  ## store object in the list, by name
  lst[[files[i]]] <- as.xts(obj)
}

## clean up (keep 'files'; it is still needed to index lst by name)
rm(tmp, obj)

Now all the read objects are in lst, but you'll want to test that the file is available, that it was read correctly, and you may want to modify the names to be more sensible than just the file name.
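A minimal sketch of those checks, using base R only. The helper name read_checked and the demo file names are hypothetical and not part of the original answer; the timeSeries conversion is left out so the example runs on its own.

```r
## Hypothetical helper: read one CSV, returning NULL (with a warning) on failure
read_checked <- function(path) {
  if (!file.exists(path)) {
    warning("file not found: ", path)
    return(NULL)
  }
  tryCatch(
    read.csv(path, stringsAsFactors = FALSE),
    error = function(e) {
      warning("could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

## Demo: one readable file plus one path that does not exist
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
good <- file.path(demo_dir, "AA01.csv")
write.csv(data.frame(Value = 1:3), good, row.names = FALSE)
paths <- c(good, file.path(demo_dir, "missing.csv"))

res <- suppressWarnings(lapply(paths, read_checked))
## name elements by file name without directory or extension
names(res) <- tools::file_path_sans_ext(basename(paths))
## drop the files that failed to read
res <- Filter(Negate(is.null), res)
```

Returning NULL for a failed file keeps the loop going, and the failures can be filtered out (or reported) afterwards instead of stopping the whole run.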

Print out the first object by name index from the list:

lst[[files[1]]]
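For comparison, here is a rough sketch of the "move the loop into a function" step mentioned at the start. The function name process_file and the demo data are assumptions for illustration; the per-file work is reduced to a plain read.csv() so the example does not depend on timeSeries.

```r
## Hypothetical per-file step: read a CSV and return its data.frame
process_file <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

## Demo files in a temporary directory (stand-ins for AA01.csv, AA02.csv)
demo_dir <- tempfile("loopdemo")
dir.create(demo_dir)
files <- file.path(demo_dir, c("AA01.csv", "AA02.csv"))
write.csv(data.frame(Value = c(1, 2, 3)), files[1], row.names = FALSE)
write.csv(data.frame(Value = c(10, 20)), files[2], row.names = FALSE)

## lapply() returns a list; name it by file so elements can be looked up by name
lst <- setNames(lapply(files, process_file), basename(files))

## sapply() simplifies a one-number-per-file result to a named vector
n_rows <- sapply(lst, nrow)
```

The result is still a single named list, so nothing has to be assigned to separately named objects in the workspace.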

R for loop for slide functions for each of multiple text files

You can use list.files to pull only the files that match that pattern and then extract the patient number for use in saving the results. Something like this should work, assuming the text files are in the working directory and you want to pull all .txt files in that directory:

library(DataCombine)

## 'pattern' is a regular expression, so escape the dot and anchor the end
subjectFiles <- list.files(pattern = "\\.txt$")
for (this_file in subjectFiles) {
  ID <- sub("\\.txt$", "", this_file)

  my.data <- read.table(this_file, header = TRUE, sep = "\t")

  # Shift X and Y by one row (slideBy = -1 creates columns "X-1" and "Y-1")
  LeadX <- slide(my.data, Var = "X", slideBy = -1)
  LeadXY <- slide(LeadX, Var = "Y", slideBy = -1)
  LeadXY <- na.omit(LeadXY) # Delete the row with missing shifted values

  # Difference scores for X and Y
  LeadXY$DiffX <- LeadXY$"X-1" - LeadXY$"X"
  LeadXY$DiffY <- LeadXY$"Y-1" - LeadXY$"Y"

  # Rate of change for X
  LeadXY <- slide(LeadXY, Var = "X", slideBy = 1) # Create column of X shifted the other way (X1)
  LeadXY$velocityX <- (LeadXY$"X1" - LeadXY$"X-1") / 2 # Calculate rate of change of X
  write.table(LeadXY, paste0("/Users/mstoehr/", ID, "LDscores.txt"), sep = "\t") # Save
}
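One detail worth illustrating: the pattern argument of list.files is a regular expression, not a shell wildcard, and an unescaped dot matches any character. A small sketch follows, with made-up file names in a temporary directory; the P01/P02 naming is an assumption for the demo.

```r
## Made-up files: two subject files and one that merely contains "txt"
demo_dir <- tempfile("txtdemo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("P01.txt", "P02.txt", "notes_txt.md")))

## escape the dot and anchor at the end so only real .txt files match
subjectFiles <- list.files(demo_dir, pattern = "\\.txt$")

## glob2rx() converts a shell-style wildcard into the equivalent regex
same <- list.files(demo_dir, pattern = glob2rx("*.txt"))

## strip the extension to get the ID used when saving results
IDs <- tools::file_path_sans_ext(subjectFiles)
```

Either form keeps a file such as notes_txt.md out of the loop, which a loosely written pattern might not.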

R: Loading xts series from multiple files into a single block

I often use a construct like this, which avoids explicit loop construction.

The strategy is to first read the files into a list of data.frames, and to then rbind together the elements of that list into a single data.frame. You can presumably adapt the same logic to your situation.

filenames <- c("a.csv", "b.csv", "c.csv")
l <- lapply(filenames, read.csv)
do.call("rbind", l)
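If you need to know which file each row came from after binding, one possible adaptation is to tag each data.frame before combining. This is only a sketch: the source column and the demo data are hypothetical, and it assumes every file has the same columns, which rbind requires for data.frames.

```r
## Demo files in a temporary directory (stand-ins for a.csv, b.csv)
demo_dir <- tempfile("binddemo")
dir.create(demo_dir)
filenames <- file.path(demo_dir, c("a.csv", "b.csv"))
write.csv(data.frame(x = 1:2), filenames[1], row.names = FALSE)
write.csv(data.frame(x = 3:5), filenames[2], row.names = FALSE)

l <- lapply(filenames, function(f) {
  d <- read.csv(f)
  d$source <- basename(f)  # hypothetical extra column recording the origin
  d
})
combined <- do.call("rbind", l)
```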

