Which is the best method to apply a script repetitively to n .csv files in R?
I find that a for loop and a list work well enough for tasks like this. Once you have a working set of code it's easy enough to move from a loop into a function which can be sapply()ed or similar, but that kind of vectorization is idiosyncratic anyway and probably not useful outside of private one-liners.
You probably want to avoid assigning to multiple objects with different names in the workspace (this is a FAQ which usually comes up as "how do I assign() . . .").
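To make the two points above concrete, here is a minimal sketch in base R of moving a loop body into a function and collecting the results in one named list rather than assign()ing to separately named objects. The helper read_one() and the temporary demo files are hypothetical, invented only for this illustration.

```r
# Hypothetical demo data: two small CSV files in a temporary directory
dir <- tempfile("csvdemo")
dir.create(dir)
files <- file.path(dir, c("AA01.csv", "AA02.csv"))
for (f in files) {
  write.csv(data.frame(Value = 1:3), f, row.names = FALSE)
}

# Hypothetical helper: whatever was in the loop body goes in here
read_one <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# One named list instead of many separately named workspace objects
lst <- lapply(files, read_one)
names(lst) <- basename(files)

# sapply() over the list gives a per-file summary
sapply(lst, nrow)
```

Keeping everything in one list means later steps can be written once and applied to every element, which is much harder when each file lives in its own variable.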
Please beware my untested code.
A vector of file names, and a list with a named element for each file.
files <- c("AA01.csv", "AA02.csv")
lst <- vector("list", length(files))
names(lst) <- files
Loop over each file.
library(timeSeries)
library(xts)  # for as.xts()
for (i in seq_along(files)) {
  ## read strings as character
  tmp <- read.csv(files[i], stringsAsFactors = FALSE)
  ## convert to 'timeDate'
  tmp$tfrm <- timeDate(paste(tmp$cdt, tmp$ctm), format = "%Y/%m/%d %H:%M:%S")
  ## create timeSeries object
  obj <- timeSeries(as.matrix(tmp$Value), tmp$tfrm)
  ## store object in the list, by name
  lst[[files[i]]] <- as.xts(obj)
}
## clean up (keep 'files', it is still needed to index the list below)
rm(tmp, obj)
Now all the objects that were read are in lst, but you'll want to test that each file is available and that it was read correctly, and you may want to modify the names to be more sensible than just the file name.
Print out the first object by name index from the list:
lst[[files[1]]]
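The checks mentioned above (file availability, a successful read, friendlier names) could be sketched roughly as follows, using only base R. The helper safe_read() and the demo file names are hypothetical; how failures should really be handled depends on your data.

```r
# Hypothetical helper: returns NULL (with a warning) instead of stopping
safe_read <- function(path) {
  if (!file.exists(path)) {
    warning("file not found: ", path)
    return(NULL)
  }
  tryCatch(
    read.csv(path, stringsAsFactors = FALSE),
    error = function(e) {
      warning("could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Demo: one real file and one missing file in a temporary directory
dir <- tempfile("checkdemo")
dir.create(dir)
good <- file.path(dir, "AA01.csv")
write.csv(data.frame(Value = 1:3), good, row.names = FALSE)
paths <- c(good, file.path(dir, "AA02.csv"))

# Warnings are suppressed here only to keep the demo quiet
res <- suppressWarnings(lapply(paths, safe_read))

# Name elements by file name without the extension
names(res) <- tools::file_path_sans_ext(basename(paths))

# Keep only the files that were read successfully
res <- Filter(Negate(is.null), res)
```

Returning NULL for a bad file lets the loop finish and leaves it to you to decide afterwards whether a missing file is an error.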
R for loop for slide functions for each of multiple text files
You can use list.files to pull only the files matching a pattern and then extract the patient number from each file name for use in saving the results. Something like this should work, assuming the text files are in the working directory and you want to pull all .txt files in that directory:
library(DataCombine)  # provides slide()

# 'pattern' is a regular expression, so escape the dot and anchor at the end
subjectFiles <- list.files(pattern = "\\.txt$")
for (this_file in subjectFiles) {
  ID <- sub("\\.txt$", "", this_file)
  my.data <- read.table(this_file, header = TRUE, sep = "\t")
  # Shifted scores for X and Y (slideBy = -1 adds columns "X-1" and "Y-1")
  LeadX <- slide(my.data, Var = "X", slideBy = -1)
  LeadXY <- slide(LeadX, Var = "Y", slideBy = -1)
  LeadXY <- na.omit(LeadXY)  # Drop the row whose shifted values are NA
  # Difference scores for X and Y
  LeadXY$DiffX <- LeadXY$"X-1" - LeadXY$"X"
  LeadXY$DiffY <- LeadXY$"Y-1" - LeadXY$"Y"
  # Rate of change for X
  LeadXY <- slide(LeadXY, Var = "X", slideBy = 1)  # Adds column "X1", shifted the other way
  LeadXY$velocityX <- (LeadXY$"X1" - LeadXY$"X-1") / 2  # Calculate rate of change of X
  # Save once, after all columns have been computed
  write.table(LeadXY, paste0("/Users/mstoehr/", ID, "LDscores.txt"), sep = "\t")
}
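One detail worth checking in isolation is the file matching: the pattern argument of list.files is a regular expression rather than a shell wildcard, so an unescaped dot matches any character. A small sketch, with made-up file names in a temporary directory, of two ways to match only names ending in .txt and then derive the IDs:

```r
# Hypothetical demo files, including one that merely contains "txt"
dir <- tempfile("txtdemo")
dir.create(dir)
invisible(file.create(file.path(dir, c("P01.txt", "P02.txt", "notes_txt.csv"))))

# Escaped dot and end anchor: only names that really end in ".txt"
subjectFiles <- list.files(dir, pattern = "\\.txt$")

# utils::glob2rx() converts a shell-style wildcard into a regular expression
same <- list.files(dir, pattern = utils::glob2rx("*.txt"))

# Strip the extension to get the ID for each file
IDs <- sub("\\.txt$", "", subjectFiles)
```

Either form works; glob2rx() is convenient if you would rather keep writing wildcards.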
R: Loading xts series from multiple files into a single block
I often use a construct like this, which avoids explicit loop construction.
The strategy is to first read the files into a list of data.frames, and then to rbind together the elements of that list into a single data.frame. You can presumably adapt the same logic to your situation.
filenames <- c("a.csv", "b.csv", "c.csv")
l <- lapply(filenames, read.csv)
do.call("rbind", l)
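For a version that can be run as-is, here is a small sketch of the same read-then-rbind strategy on temporary demo files. The file contents and the extra source column are assumptions added for illustration; the column is just one way to keep track of which file each row came from.

```r
# Hypothetical demo files with identical columns
dir <- tempfile("binddemo")
dir.create(dir)
filenames <- file.path(dir, c("a.csv", "b.csv", "c.csv"))
for (i in seq_along(filenames)) {
  write.csv(data.frame(id = i, value = c(10, 20) * i),
            filenames[i], row.names = FALSE)
}

# Read every file into a list of data.frames
l <- lapply(filenames, function(f) {
  d <- read.csv(f)
  d$source <- basename(f)  # remember which file each row came from
  d
})

# Stack the list into one data.frame; all elements must share the same columns
combined <- do.call("rbind", l)
```

Note that the result of do.call() has to be assigned to something, otherwise the combined data.frame is only printed and then lost.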