Importing Multiple .CSV Files with Variable Column Types into R

lapply has the form lapply(X, FUN, ...), where ... holds the arguments passed on to FUN. You were putting those arguments inside FUN itself. The call should be lapply(files, read_csv, col_types = cols(.default = "c")).

If you like a tidyverse solution:

files %>%
  map_df(~read_csv(.x, col_types = cols(.default = "c")))

This binds the results into a single data frame at the end.
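As a minimal base-R sketch of how the extra arguments are forwarded: the temporary files and the column names id and value below are made up for illustration, and read.csv with colClasses = "character" stands in for read_csv with cols(.default = "c").

```r
# Write two small CSV files to a temporary directory (illustrative data only).
dir <- tempfile("csv_demo_")
dir.create(dir)
writeLines(c("id,value", "1,10", "2,20"), file.path(dir, "a.csv"))
writeLines(c("id,value", "3,n/a", "4,40"), file.path(dir, "b.csv"))

files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Arguments after FUN are forwarded to FUN for every element of `files`,
# so each call is effectively read.csv(file, colClasses = "character").
frames <- lapply(files, read.csv, colClasses = "character")

# Every column is character in every file, so the pieces bind without type clashes.
combined <- do.call(rbind, frames)

# Convert column types once, after binding.
combined <- type.convert(combined, as.is = TRUE)
str(combined)
```

Reading everything as character and converting afterwards means a stray text value in one file cannot change the type that column gets in the combined result.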

How to import multiple .csv files at once?

Something like the following should result in each data frame as a separate element in a single list:

# pattern is a regular expression, not a shell glob
temp = list.files(pattern = "\\.csv$")
myfiles = lapply(temp, read.csv)

This assumes that you have those CSVs in a single directory (your current working directory) and that all of them have the lower-case extension .csv.

If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind,...), dplyr::bind_rows() or data.table::rbindlist().
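A rough sketch of the do.call(rbind, ...) route, which also records which file each row came from. The file names, the source column, and the data are assumptions made up for this example.

```r
# Illustrative files in a temporary directory; names and contents are made up.
dir <- tempfile("csv_list_")
dir.create(dir)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4, y = c("c", "d")),
          file.path(dir, "second.csv"), row.names = FALSE)

paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
myfiles <- lapply(paths, read.csv)

# Tag each data frame with the file it came from before stacking them.
tagged <- Map(function(df, path) cbind(source = basename(path), df),
              myfiles, paths)

# rbind on data frames requires the same column names in every element.
all_rows <- do.call(rbind, tagged)
rownames(all_rows) <- NULL
all_rows
```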

If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:

temp = list.files(pattern = "\\.csv$")
for (i in seq_along(temp)) assign(temp[i], read.csv(temp[i]))

Or, without assign, to demonstrate (1) how the file name can be cleaned up and (2) how to use list2env, you can try the following:

temp = list.files(pattern = "\\.csv$")
list2env(
  lapply(setNames(temp, make.names(gsub("\\.csv$", "", temp))),
         read.csv), envir = .GlobalEnv)

But again, it's often better to leave them in a single list.
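One way to keep the single list while still addressing each data frame by name is to name the list elements after the files. This is only a sketch: the file names and the csv_list variable are hypothetical, and tools::file_path_sans_ext is used here in place of the gsub call.

```r
# Illustrative files in a temporary directory; names and contents are made up.
dir <- tempfile("csv_named_")
dir.create(dir)
write.csv(data.frame(n = 1:3), file.path(dir, "2019 sales.csv"), row.names = FALSE)
write.csv(data.frame(n = 4:6), file.path(dir, "returns.csv"), row.names = FALSE)

paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Strip directory and extension, then turn the stems into syntactically valid names.
stems <- make.names(tools::file_path_sans_ext(basename(paths)))

# lapply keeps the names of its input, so the result is a named list.
csv_list <- lapply(setNames(paths, stems), read.csv)

names(csv_list)
csv_list$returns
```

The whole collection can still be passed to lapply or do.call, and nothing is written into the global environment.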

How to import multiple .csv files from folder into R and select columns?

You may try this approach:

#column names
cols <- c('col1', 'col5', 'col6', ...)
#Or column numbers
#cols <- c(1, 5, 6, ...)

library(dplyr)
library(purrr)

all_files <- list.files('/csv/folder', pattern = '\\.csv$', full.names = TRUE)
# all_of() selects using an external vector of names or positions
result <- map_df(all_files,
                 ~.x %>% readr::read_csv() %>% select(all_of(cols)), .id = 'filenum')

result

result also contains an additional column, filenum, which gives the number of the file each row came from.
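If the tidyverse packages are not available, roughly the same result can be sketched in base R. The temporary files, the keep vector, and the column names are assumptions for illustration. Unlike the .id column from map_df, filenum is an integer here rather than a character string.

```r
# Illustrative files in a temporary directory; names and contents are made up.
dir <- tempfile("csv_cols_")
dir.create(dir)
write.csv(data.frame(col1 = 1:2, col2 = 3:4, col5 = c("a", "b")),
          file.path(dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(col1 = 5:6, col2 = 7:8, col5 = c("c", "d")),
          file.path(dir, "two.csv"), row.names = FALSE)

keep <- c("col1", "col5")  # hypothetical columns to keep
all_files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

pieces <- lapply(seq_along(all_files), function(i) {
  df <- read.csv(all_files[i])
  # drop = FALSE keeps a data frame even when a single column is selected.
  cbind(filenum = i, df[, keep, drop = FALSE])
})

result <- do.call(rbind, pieces)
result
```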

Importing multiple csv files and add year to each file

There are various ways to do this, but without changing much of your code you can add an id variable named year in map_df, which will hold the index of each file. The first file (ACS_09_5YR_B19301_with_ann.csv) gets index 1, the second file (ACS_10_5YR_B19301_with_ann.csv) gets index 2, and so on.

You can then add 2008 to this index to get year values from 2009 to 2017.

list.files(path = "./ed_attainment/",
           pattern = "\\.csv$",
           full.names = TRUE) %>%
  purrr::map_df(~readr::read_csv(., col_types = readr::cols(.default = "c")),
                .id = 'year') %>%
  dplyr::mutate(year = 2008 + as.integer(year))
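The index-based approach relies on no year being missing from the folder. A hedged alternative is to read the year from the file name itself. The sketch below assumes every file follows the ACS_YY_... pattern of the two names quoted above. The third name is a guess at that pattern, and the vector is typed by hand rather than read from disk.

```r
# Paths typed by hand for illustration; the third name is assumed to follow
# the same pattern as the two quoted in the answer.
paths <- c("./ed_attainment/ACS_09_5YR_B19301_with_ann.csv",
           "./ed_attainment/ACS_10_5YR_B19301_with_ann.csv",
           "./ed_attainment/ACS_17_5YR_B19301_with_ann.csv")

# Capture the two digits after "ACS_" and convert them to a four-digit year.
yy <- sub("^ACS_([0-9]{2})_.*$", "\\1", basename(paths))
year <- 2000L + as.integer(yy)
year
```

A vector built this way could be attached to each file's rows instead of relying on position in the file listing.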

Importing and pivoting multiple CSV files and embedding as variable in data table

We can make a toy example of the myfiles data structure like this:

df_maker <- function(x){
  data.frame(wavenumber = 2^(6:10), absorbance = round(runif(5), 3))
}

set.seed(69)
myfiles <- lapply(1:3, df_maker)

So we have a list of two-column data frames containing matching values of wavenumber but different values for absorbance as described in the question:

myfiles
#> [[1]]
#>   wavenumber absorbance
#> 1         64      0.531
#> 2        128      0.769
#> 3        256      0.646
#> 4        512      0.865
#> 5       1024      0.369
#>
#> [[2]]
#>   wavenumber absorbance
#> 1         64      0.869
#> 2        128      0.171
#> 3        256      0.788
#> 4        512      0.174
#> 5       1024      0.022
#>
#> [[3]]
#>   wavenumber absorbance
#> 1         64      0.883
#> 2        128      0.357
#> 3        256      0.926
#> 4        512      0.260
#> 5       1024      0.183

The idea is to transform this structure into a data frame whose columns are the wavenumbers, with one row for each file. We can do this by using lapply to pick out the absorbance vectors and rbind to bind them into a matrix. We then name the columns of the matrix according to the wavenumber column of the first file. Finally, we convert to a data frame, adding a file_number column so we can keep track of where each observation came from:

values <- do.call(rbind, lapply(myfiles, function(x) x$absorbance))
colnames(values) <- paste0("lambda_", myfiles[[1]]$wavenumber)
df <- data.frame(file_number = seq(nrow(values)), values)

So the final result looks like this:

df
#>   file_number lambda_64 lambda_128 lambda_256 lambda_512 lambda_1024
#> 1           1     0.531      0.769      0.646      0.865       0.369
#> 2           2     0.869      0.171      0.788      0.174       0.022
#> 3           3     0.883      0.357      0.926      0.260       0.183

Created on 2020-07-05 by the reprex package (v0.3.0)
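Naming the columns after the first file only makes sense if every file shares the same wavenumber grid. A small check along these lines could be run before binding. It is a sketch using freshly generated toy data with an arbitrary seed, and the names reference and same_grid are introduced here for illustration.

```r
set.seed(1)  # arbitrary seed for this sketch
myfiles <- lapply(1:3, function(i) {
  data.frame(wavenumber = 2^(6:10), absorbance = round(runif(5), 3))
})

# TRUE for each file whose wavenumber vector is identical to the first file's.
reference <- myfiles[[1]]$wavenumber
same_grid <- vapply(myfiles,
                    function(x) identical(x$wavenumber, reference),
                    logical(1))

# Stop early if any file is on a different grid.
stopifnot(all(same_grid))

values <- do.call(rbind, lapply(myfiles, function(x) x$absorbance))
colnames(values) <- paste0("lambda_", reference)
dim(values)
```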


