Using 'Fread' to Import CSV File from an Archive into 'R' Without Extracting to Disk

R: data.table fread on zip containing multiple files

Having unzip installed, this solution works on my computer :

fread(cmd = 'unzip -p df.zip df1.csv')
V1 mpg cyl disp hp drat wt qsec vs am gear carb
1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

Using R to download zipped data file, extract, and import data

Zip archives are actually more a 'filesystem' with content metadata etc. See help(unzip) for details. So to do what you sketch out above you need to

  1. Create a temp. file name (eg tempfile())
  2. Use download.file() to fetch the file into the temp. file
  3. Use unz() to extract the target file from temp. file
  4. Remove the temp file via unlink()

which in code (thanks for basic example, but this is simpler) looks like

temp <- tempfile()
download.file("http://www.newcl.org/data/zipfiles/a1.zip",temp)
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)

Compressed (.z) or gzipped (.gz) or bzip2ed (.bz2) files are just the file and those you can read directly from a connection. So get the data provider to use that instead :)

R Reading in a zip data file without unzipping it

If your zip file is called Sales.zip and contains only a file called Sales.dat, I think you can simply do the following (assuming the file is in your working directory):

data <- read.table(unz("Sales.zip", "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")

Looping through files using dynamic name variable in R

you can use the list.files command here:

first set your working directory, where all your files are stored there:

setwd("C:/Users/...")

then

file.names = list.files(pattern = "*.zip", recursive = F)

then your for loop will be:

for (i in 1:length(file.names)) {
#open the files

zipFile <- file.names[i]
dataFile <- sub(".zip", ".csv", zipFile)

Temp_Data <- read.table(unz(zipFile,
dataFile), sep = ",")
# your function for the opened file
Master_Data <- rbind(Master_Data, Temp_Data)

#write the file finaly
write_delim(x = Master_Data, path = paste(file.names[[i]]), delim = "\t",
col_names = T )}

How to import multiple .csv files at once?

Something like the following should result in each data frame as a separate element in a single list:

temp = list.files(pattern="*.csv")
myfiles = lapply(temp, read.delim)

This assumes that you have those CSVs in a single directory--your current working directory--and that all of them have the lower-case extension .csv.

If you then want to combine those data frames into a single data frame, see the solutions in other answers using things like do.call(rbind,...), dplyr::bind_rows() or data.table::rbindlist().

If you really want each data frame in a separate object, even though that's often inadvisable, you could do the following with assign:

temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i]))

Or, without assign, and to demonstrate (1) how the file name can be cleaned up and (2) show how to use list2env, you can try the following:

temp = list.files(pattern="*.csv")
list2env(
lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))),
read.csv), envir = .GlobalEnv)

But again, it's often better to leave them in a single list.



Related Topics



Leave a reply



Submit