When Importing CSV into R How to Generate Column with Name of the CSV

When importing CSV into R how to generate column with name of the CSV?

You have already done all the hard work. With a fairly small modification this should be straightforward.

The logic is:

  1. Create a small helper function that reads an individual csv and adds a column with the file name.
  2. Call this helper function in llply()

The following should work:

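library(plyr)  # provides ldply()

# Read one CSV and record which file each row came from
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- filename
  ret
}

# filenames is the character vector of CSV paths from your existing code
import.list <- ldply(filenames, read_csv_filename)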

Note that I have proposed another small improvement to your code: read.csv() returns a data.frame, so you can use ldply() to get one combined data.frame rather than llply(), which returns a list.
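If plyr is not installed, the same two steps can be sketched in base R. This is only an illustration under stated assumptions: the temporary files and the names tmp_dir, demo_files and combined are made up for the demo, and do.call(rbind, lapply(...)) stands in for ldply().

```r
# Hypothetical demo data: two small CSV files in a temporary folder
tmp_dir <- file.path(tempdir(), "csv_source_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(tmp_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5), file.path(tmp_dir, "b.csv"), row.names = FALSE)

# Helper from the answer: read one CSV and tag every row with its file name
read_csv_filename <- function(filename) {
  ret <- read.csv(filename)
  ret$Source <- filename
  ret
}

demo_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Base R stand-in for ldply(): read each file, then row-bind the results
combined <- do.call(rbind, lapply(demo_files, read_csv_filename))
table(basename(combined$Source))
```

Wrapping the file name in basename() inside the helper would store just "a.csv" instead of the full path, if that is what you prefer in the Source column.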

Extracting column name from CSV filename during import

This would create a long table with date as the 4th column:

library(tidyverse)

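# list.files() takes a regular expression, so match the extension explicitly
data <- list.files(path = "...", pattern = "\\.csv$", full.names = TRUE) %>%
  sapply(read_csv, simplify = FALSE) %>%   # named list, one tibble per file
  imap_dfr(~ .x %>%
    mutate(date = sub('.*(\\d{4}-\\d{2}-\\d{2}).*', '\\1', basename(.y))))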

sapply with simplify = FALSE creates a list whose names are the file names. Using imap_dfr we combine all the data into one data frame and create a new column, date, by extracting the date from each list name.
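To see the two pieces in isolation, here is a small base R sketch. The file names in paths are hypothetical, and toupper() is only a placeholder for read_csv so that nothing has to exist on disk.

```r
# Hypothetical file names; only the embedded YYYY-MM-DD part is kept
paths <- c("data/sales_2021-03-15.csv", "data/sales_2021-04-01_final.csv")

# sapply(..., simplify = FALSE) returns a list named after its character input
named <- sapply(paths, toupper, simplify = FALSE)
names(named)

# The same sub() call as above: capture the date, discard everything else
dates <- sub(".*(\\d{4}-\\d{2}-\\d{2}).*", "\\1", basename(names(named)))
dates
```

Note that sub() returns the input unchanged when the pattern does not match, so a file without a date in its name would end up with its whole file name in the date column.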

Read in CSV files and Add a Column with File name

Here's a (mostly) tidyverse alternative that avoids looping:

library(tidyverse)

csv_names <- list.files(path = "path/",         # set the path to your folder with csv files
                        pattern = "\\.csv$",    # select all csv files in the folder (regex, not glob)
                        full.names = TRUE)      # output full file names (with path)
# csv_names <- c("file_1_october.csv", "file_2_november.csv")

csv_names2 <- data.frame(month = csv_names,
                         id = as.character(seq_along(csv_names))) # id for joining

data <- csv_names %>%
  lapply(read_csv) %>%              # read all the files at once
  bind_rows(.id = "id") %>%         # bind all tables into one object, and give id for each
  left_join(csv_names2, by = "id")  # join month column created earlier

This gives a single data object with data from all the CSVs together. In case you need them separately, you can omit the bind_rows() step, giving you a list of multiple tables ("tibbles"). A named list can then be turned into separate objects with list2env(), or the combined table can be broken up again by file with split().
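As a rough sketch of that last step, the following base R snippet splits a combined table by its file-name column and then assigns each piece to its own object. The data frame and the names combined, by_file and env are hypothetical, chosen only for the illustration.

```r
# Hypothetical combined table with a file-name column, as produced above
combined <- data.frame(
  month = c("october.csv", "october.csv", "november.csv"),
  value = c(10, 20, 30)
)

# split() returns a named list with one data frame per file name
by_file <- split(combined, combined$month)
names(by_file)

# list2env() turns each list element into an object in the target
# environment; a fresh environment is used here to avoid clutter
env <- new.env()
list2env(by_file, envir = env)
ls(env)
```

Passing envir = globalenv() instead would create the objects in your workspace, which is convenient interactively but easy to lose track of in scripts.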

How to import csv file with column names as identifier, not file name in R

Not sure if there is an out-of-the-box solution in R.

Here is one way: read only the column names of every file in the folder and return the first file whose header contains all of the column names you pass in.

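return_correct_file <- function(path, col_names) {
  file_list <- list.files(path, full.names = TRUE)
  # Read only the header (nrows = 0) of each file and check the requested columns
  has_cols <- sapply(file_list, function(x)
    all(col_names %in% names(read.csv2(x, nrows = 0))))
  file_index <- which(has_cols)[1]  # first matching file, NA if none
  if (is.na(file_index)) stop("No file contains all requested columns")
  read.csv2(file_list[file_index])
}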

You can call this function as:

data <- return_correct_file(path = 'path/to/csv/files',
                            col_names = c("interesting1", "interesting2", "interesting3"))
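The header check at the core of that function can be tried on its own. This is a minimal sketch with hypothetical demo files written to a temporary folder; they are semicolon-separated because that is what read.csv2() expects by default.

```r
# Hypothetical demo files; read.csv2() expects semicolon-separated fields
tmp_dir <- file.path(tempdir(), "header_demo")
dir.create(tmp_dir, showWarnings = FALSE)
writeLines(c("id;other", "1;2"), file.path(tmp_dir, "first.csv"))
writeLines(c("id;interesting1;interesting2", "1;2;3"),
           file.path(tmp_dir, "second.csv"))

wanted <- c("interesting1", "interesting2")
files <- list.files(tmp_dir, full.names = TRUE)

# For each file, read the header and test whether every wanted column is present
has_all <- sapply(files, function(f) {
  all(wanted %in% names(read.csv2(f, nrows = 0)))
})
basename(files[has_all])
```

If your files are comma-separated, read.csv() would be the matching reader in both the check and the final import.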

Add Id column and populate with CSV name in R forloop


library(readr)
library(dplyr)

infolder <- "C:\\Users\\***\\r"
outfolder <- "C:\\Users\\***\\out"

setwd(infolder)

csvfiles <- dir(path = infolder, pattern = "\\.csv$")

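for (i in csvfiles) {
  # Dry run: print each output path; uncomment the lines below to write the files
  # tmpfile <- read_csv(i)
  # tmpfile$filename <- i  # new column holding the source file name
  # write_csv(tmpfile, file.path(outfolder, i))
  print(file.path(outfolder, i))
}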
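For a version you can run end to end, here is a base R sketch of the same loop. It swaps read_csv()/write_csv() for read.csv()/write.csv(), and the folders and files under tempdir() are hypothetical stand-ins for your own paths.

```r
# Hypothetical folders under tempdir(); replace with your own paths
infolder  <- file.path(tempdir(), "loop_in")
outfolder <- file.path(tempdir(), "loop_out")
dir.create(infolder, showWarnings = FALSE)
dir.create(outfolder, showWarnings = FALSE)
write.csv(data.frame(x = 1:2), file.path(infolder, "jan.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(infolder, "feb.csv"), row.names = FALSE)

csvfiles <- dir(path = infolder, pattern = "\\.csv$")

for (i in csvfiles) {
  tmpfile <- read.csv(file.path(infolder, i))  # full path, so no setwd() needed
  tmpfile$filename <- i                        # new column holding the file name
  write.csv(tmpfile, file.path(outfolder, i), row.names = FALSE)
}

read.csv(file.path(outfolder, "jan.csv"))
```

Building the input path with file.path() keeps the loop independent of the working directory, which is why setwd() is left out here.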

How to import a CSV with a last empty column into R?

The real problem is that the empty column doesn't have a header. If the header line also had the extra comma at the end, this probably wouldn't be as messy. But you can also do a bit of column shuffling with fill=TRUE. For example:

dd <- read.table("~/../Downloads/jcr ecology 2020.csv", sep = ",",
                 skip = 2, fill = TRUE, header = TRUE, row.names = NULL)
names(dd)[-ncol(dd)] <- names(dd)[-1]  # shift column names one position left
dd <- dd[, -ncol(dd)]                  # drop the empty last column

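This reads in the data, but because the header is one field shorter than the data rows, the first data column ends up labelled as row names inside the data.frame and the last column is filled with NA. You then shift all the column names one position to the left and drop the last column.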
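The shuffle is easier to follow on a tiny example. The inline data below is hypothetical: a three-field header followed by rows that each end in a trailing comma, mimicking the situation described above.

```r
# Hypothetical inline data: the header has 3 fields, each data row has a
# trailing comma and therefore 4
csv_text <- "a,b,c\n1,2,3,\n4,5,6,\n"

dd <- read.table(text = csv_text, sep = ",", fill = TRUE,
                 header = TRUE, row.names = NULL)
names(dd)  # inspect the labels before fixing them

# Shift the column names one position left, then drop the empty last column
names(dd)[-ncol(dd)] <- names(dd)[-1]
dd <- dd[, -ncol(dd)]
dd
```

After the two fix-up lines the values sit under the headers they were written with, and the all-NA column is gone.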

Get Column Name of CSV file

reading_data[0,] doesn't return the column names; it returns a data frame with no rows selected.

Check for example with mtcars

mtcars[1, ]
#           mpg cyl disp  hp drat   wt  qsec vs am gear carb
# Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

This is the 1st row of mtcars, printed with its column names.

Now if you do mtcars[0, ]:

mtcars[0, ]
#  [1] mpg  cyl  disp hp   drat wt   qsec vs   am   gear carb
# <0 rows> (or 0-length row.names)

It prints the column names but selects no rows, because there is no row at index 0. The result is still a data frame, not a character vector of names.

If you want to apply some functions on each column name separately you can do

for (i in names(reading_data)) {
  print(i)
  # add the operation to be applied here
}
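Many operations on column names are vectorised, so the loop is often optional. A small sketch using the built-in mtcars data; new_names and name_lengths are hypothetical result names.

```r
# toupper() works on the whole vector of names at once
new_names <- toupper(names(mtcars))
head(new_names, 3)

# sapply() applies a function to each name and keeps the names as labels
name_lengths <- sapply(names(mtcars), nchar)
name_lengths[["disp"]]
```

The explicit for loop remains the better fit when each step has side effects, such as printing or writing files.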

names(mtcars) or colnames(mtcars) would give you the column names directly.

names(mtcars)
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
colnames(mtcars)
# [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"

