When importing CSV into R how to generate column with name of the CSV?
You have already done all the hard work. With a fairly small modification this should be straightforward.
The logic is:
- Create a small helper function that reads an individual csv and adds a column with the file name.
- Call this helper function in ldply()
The following should work:
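library(plyr)  # provides ldply()

read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- filename  # add the file name as a new column
  ret
}
# filenames is the character vector of CSV paths from the question
import.list <- ldply(filenames, read_csv_filename)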
Note that I have proposed another small improvement to your code: read.csv() returns a data.frame - this means you can use ldply() rather than llply().
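If plyr is not available, the same idea can be sketched in base R. This is only a minimal sketch, not the answer's original code: the temporary folder, the two sample files, and the names `tmp_dir`, `files` and `combined` are made up for illustration, and `basename()` is an optional change that stores the bare file name instead of the full path.

```r
# Hypothetical demo data: two small CSV files in a temporary folder.
tmp_dir <- file.path(tempdir(), "csv_source_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(tmp_dir, "b.csv"), row.names = FALSE)

# Same helper idea as above: read one file and record where it came from.
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- basename(filename)  # basename() drops the folder part
  ret
}

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# lapply() gives a list of data frames; do.call(rbind, ...) stacks them.
combined <- do.call(rbind, lapply(files, read_csv_filename))
```

`rbind()` requires all files to have the same columns, which is also what the combined result in the answer assumes.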
Extracting column name from CSV filename during import
This would create a long table with date as the 4th column:
library(tidyverse)
# pattern is a regular expression, so "\\.csv$" matches names ending in .csv
data <- list.files(path = "...", pattern = "\\.csv$", full.names = TRUE) %>%
  sapply(read_csv, simplify = FALSE) %>%
  imap_dfr(~ .x %>%
    mutate(date = sub('.*(\\d{4}-\\d{2}-\\d{2}).*', '\\1', basename(.y))))
sapply with simplify = FALSE creates a list whose names are the file names. Using imap_dfr we combine all the data into one data frame and create a new column, date, by extracting the date from each list name.
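The date extraction can be tried on its own, without reading any files. The sketch below uses two invented file names (`paths` is hypothetical) and the same regular expression as the answer.

```r
# Hypothetical file names; only the embedded date matters here.
paths <- c("data/sales_2021-03-15.csv", "data/2020-12-01_report.csv")

# '.*' on both sides consumes everything except the captured yyyy-mm-dd group.
dates <- sub(".*(\\d{4}-\\d{2}-\\d{2}).*", "\\1", basename(paths))

# Converting to Date makes the new column sortable and filterable.
dates <- as.Date(dates)
```

Note that `sub()` returns its input unchanged when the pattern does not match, so a file name without a date would end up in the column as-is.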
Read in CSV files and Add a Column with File name
Here's a (mostly) tidyverse alternative that avoids looping:
library(tidyverse)
csv_names <- list.files(path = "path/",        # set the path to your folder with csv files
                        pattern = "\\.csv$",   # select all csv files in the folder (regular expression)
                        full.names = TRUE)     # output full file names (with path)
# csv_names <- c("file_1_october.csv", "file_2_november.csv")
csv_names2 <- data.frame(month = csv_names,
                         id = as.character(seq_along(csv_names))) # id for joining
data <- csv_names %>%
  lapply(read_csv) %>%                 # read all the files at once
  bind_rows(.id = "id") %>%            # bind all tables into one object, and give id for each
  left_join(csv_names2, by = "id")     # join month column created earlier
This gives a single data object with data from all the CSVs together. In case you need them separately, you can omit the bind_rows() step, giving you a list of multiple tables ("tibbles"). These can then be split using list2env() or some split() function.
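As a rough illustration of the splitting step, here is a base R sketch. The `combined` table is invented and only mimics the shape produced by `bind_rows(.id = "id")`; `by_file` and `file_env` are hypothetical names.

```r
# Hypothetical combined table, shaped like the result of bind_rows(.id = "id").
combined <- data.frame(
  id    = c("1", "1", "2"),
  value = c(10, 20, 30)
)

# split() returns a named list with one data frame per id.
by_file <- split(combined, combined$id)

# list2env() turns each list element into a separate object. Giving the
# elements syntactic names and using a dedicated environment keeps them
# easy to reach and out of the global workspace.
names(by_file) <- paste0("file_", names(by_file))
file_env <- list2env(by_file, envir = new.env())
```

After this, `file_env$file_1` holds the rows that came from the first file.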
How to import csv file with column names as identifier, not file name in R
Not sure if there is an out-of-the-box solution in R.
Here is one way: read the column names of every file in the folder and return the first file whose header contains all the column names passed in.
return_correct_file <- function(path, col_names) {
  file_list <- list.files(path, full.names = TRUE)
  # index of the first file whose header contains every requested column
  file_index <- which(sapply(file_list, function(x)
    all(col_names %in% names(read.csv2(x, nrows = 0)))))[1]
  return(read.csv2(file_list[file_index]))
}
You can call this function as:
data <- return_correct_file(path = 'path/to/csv/files',
col_names = c("interesting1", "interesting2", "interesting3"))
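The function above fails with a confusing error when no file matches, because the index is then NA. A possible variant with an explicit check is sketched below. The demo folder, its two files, and the name `find_file_by_columns` are assumptions made for illustration. Like the answer, it uses read.csv2(), which expects semicolon-separated files with a decimal comma; for comma-separated files read.csv() would be the counterpart.

```r
# Hypothetical demo folder with two files that differ in their headers.
demo_dir <- file.path(tempdir(), "header_match_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv2(data.frame(a = 1, b = 2),
           file.path(demo_dir, "first.csv"), row.names = FALSE)
write.csv2(data.frame(interesting1 = 1, interesting2 = 2),
           file.path(demo_dir, "second.csv"), row.names = FALSE)

find_file_by_columns <- function(path, col_names) {
  file_list <- list.files(path, pattern = "\\.csv$", full.names = TRUE)
  # As in the answer, nrows = 0 is used because only the header is needed here.
  has_all <- vapply(file_list, function(f) {
    all(col_names %in% names(read.csv2(f, nrows = 0)))
  }, logical(1))
  # Fail early with a clear message instead of passing NA to read.csv2().
  if (!any(has_all)) stop("no file contains all requested columns")
  read.csv2(file_list[which(has_all)[1]])
}

result <- find_file_by_columns(demo_dir, c("interesting1", "interesting2"))
```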
Add Id column and populate with CSV name in R forloop
Updated
library(readr)
library(dplyr)
infolder <- "C:\\Users\\***\\r"
outfolder <- "C:\\Users\\***\\out"
setwd(infolder)
csvfiles <- dir(path = infolder, pattern = "\\.csv$")
for (i in csvfiles) {
  # Uncomment the next three lines to read each file, add its name, and write it out:
  #tmpfile <- read_csv(i)
  #tmpfile$filename <- i
  #write_csv(tmpfile, file.path(outfolder, i))
  print(file.path(outfolder, i))
}
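For reference, a complete version of this loop might look like the following sketch. It swaps in base R read.csv() and write.csv() so that it runs without extra packages, and it builds paths with file.path() instead of calling setwd(). The two folders under tempdir() and the sample file jan.csv are invented for the example.

```r
# Hypothetical folders under tempdir(); replace with real paths as needed.
infolder  <- file.path(tempdir(), "loop_demo_in")
outfolder <- file.path(tempdir(), "loop_demo_out")
dir.create(infolder, showWarnings = FALSE)
dir.create(outfolder, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(infolder, "jan.csv"), row.names = FALSE)

# dir() returns bare file names by default, so the same name can be
# reused when building the output path.
csvfiles <- dir(path = infolder, pattern = "\\.csv$")
for (f in csvfiles) {
  tmpfile <- read.csv(file.path(infolder, f))
  tmpfile$filename <- f  # every row gets the name of its source file
  write.csv(tmpfile, file.path(outfolder, f), row.names = FALSE)
}

# Read one result back to check the new column.
written <- read.csv(file.path(outfolder, "jan.csv"))
```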
How to import a CSV with a last empty column into R?
The real problem is that the empty column doesn't have a header. If they had only had the extra comma at the end of the header line this probably wouldn't be as messy. But you can also do a bit of column shuffling with fill=TRUE. For example:
dd <- read.table("~/../Downloads/jcr ecology 2020.csv", sep = ",",
                 skip = 2, fill = TRUE, header = TRUE, row.names = NULL)
names(dd)[-ncol(dd)] <- names(dd)[-1]  # shift column names one position to the left
dd <- dd[, -ncol(dd)]                  # drop the empty last column
This reads in the data but puts the row names in the data.frame and fills the last column with NA. Then you shift all the column names over to the left and drop the last column.
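The effect is easier to see on a tiny file. The sketch below writes an invented three-column CSV whose data rows end with an extra comma (`csv_path` is a hypothetical temporary file) and applies the same two steps; skip is left out because this sample has no lines to skip before the header.

```r
# Hypothetical file: three header fields, but each data row ends with an
# extra comma and therefore has four fields.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3,", "4,5,6,"), csv_path)

# row.names = NULL stops read.table() from using the first column as row
# names; it is kept as an ordinary column instead.
dd <- read.table(csv_path, sep = ",", fill = TRUE, header = TRUE,
                 row.names = NULL)

# Shift the header names one position to the left, then drop the empty column.
names(dd)[-ncol(dd)] <- names(dd)[-1]
dd <- dd[, -ncol(dd)]
```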
Get Column Name of CSV file
reading_data[0,] doesn't return the column names; it returns a data frame with no rows selected.
Check for example with mtcars
mtcars[1, ]
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
This is the 1st row of mtcars with column names.
Now if you do mtcars[0, ]
mtcars[0, ]
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
#<0 rows> (or 0-length row.names)
It returns the column names unchanged but with no rows selected, because there is no row at index 0.
If you want to apply some function to each column name separately, you can do
for (i in names(reading_data)) {
  print(i)
  # add the operation to be applied here
}
names(mtcars) or colnames(mtcars) would give you the column names directly.
names(mtcars)
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
colnames(mtcars)
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
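When the per-name operation returns a single value, the loop over column names can also be written without explicit iteration. This is just a sketch on mtcars; `name_lengths` and `upper_names` are hypothetical names, and nchar() and toupper() stand in for whatever operation is actually needed.

```r
# mtcars ships with base R, so no file is needed for this illustration.
col_names <- names(mtcars)

# vapply() calls the function once per name and checks the result type.
name_lengths <- vapply(col_names, nchar, integer(1))

# Many string functions are already vectorised and need no loop at all.
upper_names <- toupper(col_names)
```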