Using R to Download Zipped Data File, Extract, and Import .csv

Using R to download zipped data file, extract, and import data

Zip archives are actually more of a 'filesystem', with content metadata and so on. See help(unzip) for details. So to do what you sketch out above you need to:

  1. Create a temp. file name (eg tempfile())
  2. Use download.file() to fetch the file into the temp. file
  3. Use unz() to extract the target file from temp. file
  4. Remove the temp file via unlink()

which in code (thanks for basic example, but this is simpler) looks like

temp <- tempfile()
download.file("...", temp, mode = "wb")
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)
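The four steps can also be wrapped in a small function so the temp file is cleaned up even if the read fails. This is only a sketch: read_zipped_table() is a made-up name, and the demo builds its own a1.zip locally and fetches it through a file:// URL, which assumes a Unix-like system with a command-line zip program that utils::zip() can call.

```r
# Hypothetical helper: download a zip archive to a temp file and read one
# member from it with read.table(). Extra arguments go to read.table().
read_zipped_table <- function(url, member, ...) {
  temp <- tempfile(fileext = ".zip")
  on.exit(unlink(temp), add = TRUE)          # step 4: always remove the temp file
  download.file(url, temp, mode = "wb", quiet = TRUE)
  read.table(unz(temp, member), ...)         # step 3: read straight from the archive
}

# Offline demo data (made up): a tiny a1.dat zipped into a1.zip.
demo_dir <- file.path(tempdir(), "zip_demo")
dir.create(demo_dir, showWarnings = FALSE)
dat_path <- file.path(demo_dir, "a1.dat")
write.table(data.frame(x = 1:3, y = c("a", "b", "c")), dat_path, row.names = FALSE)
zip_path <- file.path(demo_dir, "a1.zip")
unlink(zip_path)
zip(zip_path, dat_path, flags = "-jq")       # -j drops the directory part, -q is quiet

data <- read_zipped_table(paste0("file://", normalizePath(zip_path)),
                          "a1.dat", header = TRUE)
print(data)
```

With a real download you would pass the http(s) URL of the archive instead of the file:// one.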

Compressed (.z), gzipped (.gz), or bzip2ed (.bz2) files are just the single compressed file, so you can read those directly from a connection. So get the data provider to use that instead :)
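To illustrate the point about reading from a connection, here is a minimal sketch that writes a small gzipped CSV and reads it back through gzfile() with no separate extraction step. The file name and the data are made up for the example.

```r
# Write a small gzipped CSV so the sketch is self-contained.
gz_path <- file.path(tempdir(), "example.csv.gz")
con <- gzfile(gz_path, "w")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), con, row.names = FALSE)
close(con)

# Read straight from the compressed file via a gzfile() connection.
df <- read.csv(gzfile(gz_path))
print(df)
```

bzfile() plays the same role for .bz2 files.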

Using R to download zipped data file, extract, and import .csv

In order to get your data to download and uncompress, you need to set mode="wb"

download.file("...", temp, mode = "wb")
unzip(temp, "gbr_Country_en_csv_v2.csv")
dd <- read.table("gbr_Country_en_csv_v2.csv", sep = ",", skip = 2, header = TRUE)

It looks like the default is "w", which assumes a text file. If it were a plain csv file this would be fine. But since it's compressed, it's a binary file, hence the "wb". Without the "wb" part the download can be corrupted (this matters mostly on Windows), and then you can't open the zip at all.
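If you suspect a download was mangled, one quick check is to look at the first bytes of the file: zip archives begin with the two characters "PK". The sketch below is only a rough sanity test, not a full validation, and looks_like_zip() is a made-up helper name.

```r
# Hypothetical helper: TRUE if the file starts with the "PK" zip signature.
looks_like_zip <- function(path) {
  sig <- readBin(path, what = "raw", n = 2L)
  length(sig) == 2L && identical(sig, charToRaw("PK"))
}

# Demo with two throwaway files: one with a zip-style header, one plain text.
fake_zip <- tempfile(fileext = ".zip")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04)), fake_zip)
plain_txt <- tempfile(fileext = ".txt")
writeLines("a,b\n1,2", plain_txt)

zip_ok <- looks_like_zip(fake_zip)
txt_ok <- looks_like_zip(plain_txt)
print(c(zip_ok, txt_ok))
```

Calling looks_like_zip(temp) right after download.file() tells you whether it is worth trying unzip() at all.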

R Reading in a zip data file without unzipping it

If your zip file contains only a file called Sales.dat, I think you can simply do the following, passing the path to the zip file as the first argument to unz() (assuming the zip file is in your working directory):

data <- read.table(unz("", "Sales.dat"), nrows = 10, header = TRUE, quote = "\"", sep = ",")
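If you do not know the member name in advance, unzip(list = TRUE) reports the archive's contents without extracting anything. The sketch below builds on that; read_only_member() and sales_demo.zip are made-up names, and creating the demo archive assumes a command-line zip program is available to utils::zip().

```r
# Hypothetical helper: read the single file inside a zip without extracting it.
read_only_member <- function(zip_path, ...) {
  members <- unzip(zip_path, list = TRUE)$Name   # list contents, extract nothing
  stopifnot(length(members) == 1L)
  read.table(unz(zip_path, members), ...)
}

# Demo archive with made-up data, written to the session temp directory.
dat_path <- file.path(tempdir(), "Sales.dat")
write.table(data.frame(region = c("N", "S"), sales = c(10, 20)),
            dat_path, sep = ",", row.names = FALSE)
zip_path <- file.path(tempdir(), "sales_demo.zip")
unlink(zip_path)
zip(zip_path, dat_path, flags = "-jq")           # -j stores the bare file name

sales <- read_only_member(zip_path, nrows = 10, header = TRUE,
                          quote = "\"", sep = ",")
print(sales)
```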

Download a large zipped CSV file, unzip and read into R on Linux

It might be a file permission issue. To get around it, work in the directory you're already in, or in one you know you have access to.

download.file("...",
              # to a directory you can access, and name the file. No need to overcomplicate this.
              destfile = "/home/myName/myProjectname/npi.csv")

# use the decompress function if you need to (decompress_file() is a user-defined helper, not base R), though unzip might work
x <- decompress_file(directory = "/home/myName/myProjectname/",
                     file = "")

# remove the .zip file with file.remove() if you need the space back
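To rule out the permission problem before starting a large download, you can test whether R is allowed to write to the target directory. This is a minimal sketch: can_write() is a made-up helper, and tempdir() stands in for your project directory.

```r
# Hypothetical helper: TRUE if dir exists and the current user can write to it.
can_write <- function(dir) {
  dir.exists(dir) && unname(file.access(dir, mode = 2)) == 0
}

dest_dir <- tempdir()                      # stand-in for a directory you own
writable <- can_write(dest_dir)
missing_dir <- can_write(file.path(dest_dir, "no_such_subdir"))
print(c(writable, missing_dir))
```

If can_write() returns FALSE, pick another destfile location before calling download.file().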

Using R to download gzipped data file, extract, and import data

I like Ramnath's approach, but I would use temp files like so:

tmpdir <- tempdir()

url <- ''
file <- basename(url)
download.file(url, file)

untar(file, compressed = 'gzip', exdir = tmpdir)
list.files(tmpdir)

The list.files() should produce something like this:

[1] "TicDataDescr.txt" "dictionary.txt"   "ticdata2000.txt"  "ticeval2000.txt"  "tictgts2000.txt" 

which you could parse if you needed to automate this process for a lot of files.
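As a rough sketch of that automation, the code below extracts a .tar.gz into its own folder and reads every .txt file it finds into a named list. Everything here is made up for the demo (the archive, the file names, tab-separated contents); with real data you would start at the untar() call and choose the reader that matches the file format.

```r
# Build a small .tar.gz so the sketch runs offline.
src_dir <- file.path(tempdir(), "tar_demo_src")
dir.create(src_dir, showWarnings = FALSE)
write.table(data.frame(a = 1:3, b = 4:6), file.path(src_dir, "part1.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(a = 7:8, b = 9:10), file.path(src_dir, "part2.txt"),
            sep = "\t", row.names = FALSE, quote = FALSE)

archive <- file.path(tempdir(), "tar_demo.tar.gz")
old_wd <- setwd(src_dir)                   # archive the folder's contents by relative path
tar(archive, compression = "gzip", tar = "internal")
setwd(old_wd)

# Extract into a dedicated folder so list.files() sees only this archive.
out_dir <- file.path(tempdir(), "tar_demo_out")
dir.create(out_dir, showWarnings = FALSE)
untar(archive, exdir = out_dir)

# Read every extracted .txt file into a list named after the files.
txt_files <- list.files(out_dir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(txt_files, read.delim)
names(tables) <- basename(txt_files)
combined <- do.call(rbind, tables)
print(names(tables))
```

Extracting into a subfolder of tempdir() rather than tempdir() itself keeps unrelated temp files out of the listing.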

