How to Turn the Filename into a Variable When Reading Multiple CSVs into R

How can I turn the filename into a variable when reading multiple CSVs into R?

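You can store the result of lapply in an object first, name its elements after the files, and then add the file name as a column to each data frame before binding them together: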

Lapply <- lapply(myFiles, read.csv, header = TRUE)
names(Lapply) <- myFiles
for (i in myFiles) {
  Lapply[[i]]$Source <- i
}
do.call(rbind, Lapply)
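
As a self-contained check of the approach above, the following sketch writes two small CSV files to a temporary directory and then tags each row with its source file. The directory, the file names a.csv and b.csv, and their contents are made up for illustration.

```r
# Hypothetical demo data: two tiny CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "source_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(demo_dir, "b.csv"), row.names = FALSE)

# full.names = TRUE so read.csv() can find the files from any working directory.
myFiles <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file and name the list elements after the files.
frames <- lapply(myFiles, read.csv, header = TRUE)
names(frames) <- basename(myFiles)

# Add the file name as a column, then stack the data frames.
for (nm in names(frames)) {
  frames[[nm]]$Source <- nm
}
combined <- do.call(rbind, frames)
rownames(combined) <- NULL
```

Using basename() for the list names keeps the Source column short even though the files are read via their full paths.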

How can I turn a part of the filename into a variable when reading multiple text files?

Update: Although the initial answer is correct, the same goal can be achieved in fewer steps by using sapply with simplify=FALSE instead of lapply because sapply automatically assigns the filenames to the elements in the list:

library(data.table)

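# pattern is a regular expression; full.names = TRUE returns paths that read.table can open
files <- list.files("pathname", pattern = "\\.TXT$", full.names = TRUE)
file.list <- sapply(files, read.table, simplify = FALSE)
masterfilesales <- rbindlist(file.list, idcol = "id")[, id := substr(basename(id), 1, 4)]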
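
The naming behaviour this relies on can be checked in base R without data.table. In the sketch below, the two file names and their contents are invented for the demonstration; the point is only that sapply(..., simplify = FALSE) returns a list named after the character vector it iterates over, whereas lapply does not.

```r
# Hypothetical demo files; names chosen so the first four characters differ.
demo_dir <- file.path(tempdir(), "sapply_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("1 2", "3 4"), file.path(demo_dir, "1001_sales.TXT"))
writeLines(c("5 6"), file.path(demo_dir, "1002_sales.TXT"))

files <- list.files(demo_dir, pattern = "\\.TXT$", full.names = TRUE)

with_sapply <- sapply(files, read.table, simplify = FALSE)
with_lapply <- lapply(files, read.table)

names(with_sapply)  # the file paths
names(with_lapply)  # NULL

# The first four characters of each file name, as used for the id column.
ids <- substr(basename(names(with_sapply)), 1, 4)
```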

Old answer: To achieve what you want, you can use a combination of the setattr function and the idcol parameter of the rbindlist function from the data.table package as follows:

library(data.table)

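# pattern is a regular expression; full.names = TRUE returns paths that read.table can open
files <- list.files("pathname", pattern = "\\.TXT$", full.names = TRUE)
file.list <- lapply(files, read.table)
setattr(file.list, "names", files)
masterfilesales <- rbindlist(file.list, idcol = "id")[, id := substr(basename(id), 1, 4)]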

Alternatively, you can set the filenames in base R with:

attr(file.list, "names") <- files

or:

names(file.list) <- files

and bind them together with bind_rows from the dplyr package (which also has an .id parameter to create an id column):

library(dplyr)
masterfilesales <- bind_rows(file.list, .id = "id") %>% mutate(id = substr(basename(id), 1, 4))

How to import multiple .csv files at once?

Something like the following should result in each data frame as a separate element in a single list:

temp <- list.files(pattern = "\\.csv$")
myfiles <- lapply(temp, read.csv)

This assumes that you have those CSVs in a single directory--your current working directory--and that all of them have the lower-case extension .csv.

If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind,...), dplyr::bind_rows() or data.table::rbindlist().

If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:

temp <- list.files(pattern = "\\.csv$")
for (i in seq_along(temp)) assign(temp[i], read.csv(temp[i]))

Or, without assign, and to demonstrate (1) how the file name can be cleaned up and (2) how to use list2env, you can try the following:

temp <- list.files(pattern = "\\.csv$")
list2env(
  lapply(setNames(temp, make.names(sub("\\.csv$", "", temp))),
         read.csv),
  envir = .GlobalEnv)

But again, it's often better to leave them in a single list.

How to load multiple csv files into separate objects (dataframes) in R based on filename?

Solution for anyone curious...

files <- list.files(pattern = "\\.csv$")

for (file in seq_along(files)) {
  # Build the names file001, file002, ... and exfile001, exfile002, ...
  file_name <- paste0("file00", file)
  ex_file_name <- paste0("exfile00", file)

  # Both objects are overwritten on every iteration, so use them inside the loop.
  file_object <- read.csv(file = paste0(file_name, ".csv"), fileEncoding = "UTF-8-BOM")
  exfile_object <- read.csv(file = paste0(ex_file_name, ".csv"), fileEncoding = "UTF-8-BOM")
}

Essentially, build the filename within the loop, then pass it to the read.csv function on each iteration.
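
A more compact way to build such names is sprintf, which also keeps the zero-padding consistent once the counter reaches two digits (pasting "file00" and 10 gives file0010). This is only a sketch, assuming the files follow a three-digit pattern such as file001.csv; the helper names build_name and read_numbered are made up for illustration.

```r
# Hypothetical helper: build "<prefix>NNN.csv" with a zero-padded counter.
build_name <- function(prefix, i) sprintf("%s%03d.csv", prefix, i)

build_name("file", 1)     # "file001.csv"
build_name("exfile", 12)  # "exfile012.csv"

# Hypothetical helper: collect the data frames in a named list
# instead of overwriting a single object on every iteration.
read_numbered <- function(prefix, n, dir = ".") {
  paths <- file.path(dir, build_name(prefix, seq_len(n)))
  setNames(lapply(paths, read.csv, fileEncoding = "UTF-8-BOM"), basename(paths))
}
```

Calling read_numbered("file", 3) would then return a list with elements file001.csv, file002.csv and file003.csv, provided those files exist in the working directory.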

Read multiple CSV files into separate data frames

Quick draft, untested:

  1. Use list.files() aka dir() to dynamically generate your list of files.

  2. This returns a vector; run along the vector in a for loop.

  3. Read the i-th file, then use assign() to place the content into a new variable file_i.

That should do the trick for you.
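
The three steps above might look like the following sketch. The temporary directory, the demo files, and the file_1, file_2, ... naming are assumptions made for illustration.

```r
# Hypothetical demo files so the sketch runs on its own.
demo_dir <- file.path(tempdir(), "assign_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(v = 1:3), file.path(demo_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(v = 4:5), file.path(demo_dir, "second.csv"), row.names = FALSE)

# Step 1: generate the vector of file paths.
paths <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Steps 2 and 3: loop over the vector and assign each data frame to file_<i>.
for (i in seq_along(paths)) {
  assign(paste0("file_", i), read.csv(paths[i]))
}
```

After the loop, file_1 holds the contents of first.csv and file_2 those of second.csv, because list.files returns the paths in alphabetical order.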

Importing multiple .csv files into R and adding a new column with file name

This should do it:

file_names <- dir("~/Desktop/data", full.names = TRUE)
df <- do.call(rbind, lapply(file_names, function(x) cbind(read.csv(x), name = strsplit(basename(x), '\\.')[[1]][1])))

Add filename column to table as multiple files are read and bound

I generally use the following approach, based on dplyr/tidyr:

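library(dplyr)
library(tidyr)
library(readr)
# files is assumed to be a character vector of file paths
data <- tibble(File = files) %>%
  extract(File, "Site", "([A-Z]{2}-[A-Za-z0-9]{3})", remove = FALSE) %>%
  mutate(Data = lapply(File, read_csv)) %>%
  unnest(Data) %>%
  select(-File)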

Load in multiple CSV files and add suffix to column names in R

Suppose we have the files generated reproducibly in the Note at the end.

We get the file names in fnames, then Map a function Read over them. Read reads each file, appends the file's base name to every column name except the first, and returns the modified data frame.

fnames <- Sys.glob("data*.csv")

Read <- function(f) {
  df <- read.csv(f)
  names(df)[-1] <- paste0(names(df)[-1], "_", sub("\\.csv$", "", basename(f)))
  df
}
L <- Map(Read, fnames)

str(L)

giving this named list:

List of 3
$ data1.csv:'data.frame': 2 obs. of 3 variables:
..$ subject_id: int [1:2] 1 2
..$ var1_data1: int [1:2] 55 55
..$ var2_data1: int [1:2] 57 57
$ data2.csv:'data.frame': 2 obs. of 3 variables:
..$ subject_id: int [1:2] 1 2
..$ var1_data2: int [1:2] 55 55
..$ var2_data2: int [1:2] 57 57
$ data3.csv:'data.frame': 2 obs. of 3 variables:
..$ subject_id: int [1:2] 1 2
..$ var1_data3: int [1:2] 55 55
..$ var2_data3: int [1:2] 57 57

Note

Lines <- "subject_id var1 var2
1 55 57
2 55 57"
data1 <- data2 <- data3 <- read.table(text = Lines, header = TRUE)
for(f in c("data1", "data2", "data3")) write.csv(get(f), paste0(f, ".csv"), row.names = FALSE, quote = FALSE)

