Importing multiple .csv files with variable column types into R
The call to lapply should take the form lapply(x, FUN, ...), where ... stands for the extra arguments passed on to FUN. You're filling in those arguments inside FUN instead. With readr loaded, it should be:
lapply(files, read_csv, col_types = cols(.default = "c"))
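To show how extra arguments flow through lapply's ..., here is a base-R sketch (no readr needed) that passes colClasses = "character" on to read.csv(), which plays the same role as col_types = cols(.default = "c"). The two temporary files and their columns are made up for illustration: id looks numeric in one file and like text in the other.

```r
# Hypothetical example files: "id" is numeric in one file, text in the other
dir <- tempfile("csvs")
dir.create(dir)
write.csv(data.frame(id = c(1, 2), value = c(10, 20)),
          file.path(dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = c("x1", "x2"), value = c(30, 40)),
          file.path(dir, "b.csv"), row.names = FALSE)

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Everything after FUN (here colClasses) is forwarded to read.csv() via "..."
myfiles <- lapply(files, read.csv, colClasses = "character")

str(myfiles)
```

Because colClasses is unnamed, read.csv() recycles it over every column, so both files come back with all-character columns.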
If you'd like a tidyverse solution:
library(readr)
library(purrr)
files %>%
  map_df(~read_csv(.x, col_types = cols(.default = "c")))
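This binds everything into a single data frame at the end. Because every column is read as character, files whose columns would otherwise be guessed as different types can still be bound together without a type conflict.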
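Once everything is bound as character, you can let R re-guess the column types a single time on the combined data. A base-R sketch of that follow-up step using type.convert() with as.is = TRUE (so strings stay strings instead of becoming factors); the two small data frames are hypothetical stand-ins for files read with every column as character.

```r
# Stand-ins for two files read with every column as character
df1 <- data.frame(id = c("1", "2"), value = c("10", "20"), stringsAsFactors = FALSE)
df2 <- data.frame(id = c("x1", "x2"), value = c("30", "40"), stringsAsFactors = FALSE)

# Binding all-character data frames cannot hit a type conflict
combined <- do.call(rbind, list(df1, df2))

# Re-guess each column's type once, on the combined data
combined <- type.convert(combined, as.is = TRUE)
str(combined)
```

Here id stays character because one file contains text, while value becomes numeric because every file agrees.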
How to import multiple .csv files at once?
Something like the following should give you a single list with each data frame as a separate element:
temp = list.files(pattern = "\\.csv$")
myfiles = lapply(temp, read.csv)
This assumes that your CSVs are all in a single directory (your current working directory) and that all of them have the lower-case extension .csv.
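If the files live somewhere other than the working directory, or some use an upper-case extension such as .CSV, the path, ignore.case and full.names arguments of list.files() cover both cases. A sketch; the folder and file names below are made up.

```r
# Hypothetical folder with mixed-case extensions and a non-CSV file
dir <- tempfile("data")
dir.create(dir)
invisible(file.create(file.path(dir, c("a.csv", "B.CSV", "notes.txt"))))

# Anchor the regex at the end of the name and ignore case;
# full.names = TRUE returns paths that work from any working directory
temp <- list.files(path = dir, pattern = "\\.csv$", ignore.case = TRUE,
                   full.names = TRUE)
basename(temp)
```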
If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind, ...), dplyr::bind_rows() or data.table::rbindlist().
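A base-R sketch of the do.call(rbind, ...) route, assuming all files share the same column names (rbind matches data frame columns by name). It also tags each row with its source file before stacking; the temporary files are hypothetical stand-ins for your CSVs.

```r
# Hypothetical input: two CSVs with the same columns in a temp directory
dir <- tempfile("csvs")
dir.create(dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(dir, "f1.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(dir, "f2.csv"), row.names = FALSE)

temp <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
myfiles <- lapply(temp, read.csv)

# Record which file each row came from before stacking
myfiles <- Map(function(df, f) data.frame(source = basename(f), df,
                                          stringsAsFactors = FALSE),
               myfiles, temp)

combined <- do.call(rbind, myfiles)
combined
```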
If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:
temp = list.files(pattern = "\\.csv$")
# Each object is named after its file (including ".csv"), so refer to it with backticks
for (i in seq_along(temp)) assign(temp[i], read.csv(temp[i]))
Or, to avoid assign while demonstrating (1) how the file names can be cleaned up and (2) how to use list2env, you can try the following:
temp = list.files(pattern = "\\.csv$")
list2env(
  lapply(setNames(temp, make.names(gsub("\\.csv$", "", temp))),
         read.csv),
  envir = .GlobalEnv)
But again, it's often better to leave them in a single list.
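To keep everything in one list but still refer to each data frame by a clean name, one option is to name the list elements with tools::file_path_sans_ext() applied to the file names. A sketch; the sales_2019/sales_2020 files are hypothetical.

```r
# Hypothetical CSV files in a temporary directory
dir <- tempfile("csvs")
dir.create(dir)
write.csv(data.frame(x = 1:3), file.path(dir, "sales_2019.csv"), row.names = FALSE)
write.csv(data.frame(x = 4:6), file.path(dir, "sales_2020.csv"), row.names = FALSE)

temp <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Name each element after its file, without directory or extension
myfiles <- lapply(temp, read.csv)
names(myfiles) <- tools::file_path_sans_ext(basename(temp))

# Access one data frame by name, or apply a function to all of them
myfiles$sales_2020
sapply(myfiles, nrow)
```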
How to import multiple .csv files from folder into R and select columns?
You may try this approach:
# Column names to keep
cols <- c('col1', 'col5', 'col6', ...)
# Or column numbers
# cols <- c(1, 5, 6, ...)
library(dplyr)
library(purrr)
all_files <- list.files('/csv/folder', pattern = '\\.csv$', full.names = TRUE)
# all_of() selects the columns stored in the external vector cols
result <- map_df(all_files,
                 ~.x %>% readr::read_csv() %>% select(all_of(cols)),
                 .id = 'filenum')
result
In result, I have also created an additional column called filenum, which indicates the number of the file each row came from.
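If you'd rather avoid the dplyr/purrr dependencies, the same idea can be sketched in base R: read each file, keep only the wanted columns by name, add filenum, then bind. The column names and files below are hypothetical.

```r
# Hypothetical files, each with more columns than we need
dir <- tempfile("csvs")
dir.create(dir)
for (i in 1:2) {
  df <- data.frame(col1 = i, col2 = "drop", col5 = i * 10, col6 = "keep")
  write.csv(df, file.path(dir, paste0("file", i, ".csv")), row.names = FALSE)
}

cols <- c("col1", "col5", "col6")  # hypothetical columns to keep
all_files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

result <- do.call(rbind, lapply(seq_along(all_files), function(i) {
  df <- read.csv(all_files[i])[cols]  # keep only the selected columns
  data.frame(filenum = i, df)         # record which file the rows came from
}))
result
```

Unlike the .id column from map_df, which is stored as character, filenum here is an integer.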
Importing multiple csv files and add year to each file
There are various ways to do this, but without changing much of your code, you can add an id variable named year in map_df, which will hold the index of each file name. So the first file (ACS_09_5YR_B19301_with_ann.csv) gets index 1, the second file (ACS_10_5YR_B19301_with_ann.csv) gets index 2, and so on. You can then add 2008 to this index to get year values from 2009 to 2017. This relies on list.files() returning the files sorted in year order (it sorts names alphabetically) with no year missing.
library(dplyr)  # for %>% and mutate()
list.files(path = "./ed_attainment/",
           pattern = "\\.csv$",
           full.names = TRUE) %>%
  purrr::map_df(~readr::read_csv(.x, col_types = readr::cols(.default = "c")),
                .id = "year") %>%
  dplyr::mutate(year = 2008 + as.integer(year))
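If a year might be missing or the files might not sort in year order, an alternative is to parse the year straight from the file name. A sketch, assuming every file follows the ACS_YY_... naming pattern shown in the question:

```r
# File names following the pattern from the question
files <- c("ACS_09_5YR_B19301_with_ann.csv", "ACS_10_5YR_B19301_with_ann.csv",
           "ACS_17_5YR_B19301_with_ann.csv")

# Pull the two digits after "ACS_" and turn them into a four-digit year
year <- 2000L + as.integer(sub("^ACS_([0-9]{2})_.*$", "\\1", basename(files)))
year
```

In the map_df pipeline you could then set names(files) <- year before mapping; when the input vector is named, .id = "year" stores those names instead of the positions, so the "+ 2008" step is no longer needed.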
Importing and pivoting multiple CSV files and embedding as variable in data table
We can make a toy example of the myfiles data structure like this:
df_maker <- function(x){
data.frame(wavenumber = 2^(6:10), absorbance = round(runif(5), 3))
}
set.seed(69)
myfiles <- lapply(1:3, df_maker)
So we have a list of two-column data frames containing matching values of wavenumber but different values of absorbance, as described in the question:
myfiles
#> [[1]]
#> wavenumber absorbance
#> 1 64 0.531
#> 2 128 0.769
#> 3 256 0.646
#> 4 512 0.865
#> 5 1024 0.369
#>
#> [[2]]
#> wavenumber absorbance
#> 1 64 0.869
#> 2 128 0.171
#> 3 256 0.788
#> 4 512 0.174
#> 5 1024 0.022
#>
#> [[3]]
#> wavenumber absorbance
#> 1 64 0.883
#> 2 128 0.357
#> 3 256 0.926
#> 4 512 0.260
#> 5 1024 0.183
The idea is that we want to transform this structure into a data frame where the columns are the wavenumbers, with one row for each file. We can do this by using lapply to pick out the absorbance vectors and rbind them together into a matrix. We then name the columns of the matrix according to the wavenumber column of the first file. Finally, we convert to a data frame, adding a file_number column so we can keep track of where each observation came from:
values <- do.call(rbind, lapply(myfiles, function(x) x$absorbance))
colnames(values) <- paste0("lambda_", myfiles[[1]]$wavenumber)
df <- data.frame(file_number = seq_len(nrow(values)), values)
So the final result looks like this:
df
#> file_number lambda_64 lambda_128 lambda_256 lambda_512 lambda_1024
#> 1 1 0.531 0.769 0.646 0.865 0.369
#> 2 2 0.869 0.171 0.788 0.174 0.022
#> 3 3 0.883 0.357 0.926 0.260 0.183
Created on 2020-07-05 by the reprex package (v0.3.0)
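This approach takes the column names from the first file only, so it silently assumes every file has the same wavenumbers in the same order. A small check you might run first; the helper name same_grid is hypothetical, and the toy myfiles list is rebuilt so the snippet runs on its own.

```r
# Rebuild the toy data so this snippet is self-contained
df_maker <- function(x) {
  data.frame(wavenumber = 2^(6:10), absorbance = round(runif(5), 3))
}
set.seed(69)
myfiles <- lapply(1:3, df_maker)

# Hypothetical helper: TRUE only if every file's wavenumbers match the first file's
same_grid <- function(files) {
  ref <- files[[1]]$wavenumber
  all(vapply(files, function(x) identical(x$wavenumber, ref), logical(1)))
}

same_grid(myfiles)  # TRUE: all toy files share 2^(6:10)

# FALSE: an extra, made-up file on a different grid
same_grid(c(myfiles, list(data.frame(wavenumber = 1:5, absorbance = 0))))
```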