Using R to download zipped data file, extract, and import data
A zip archive is less a single compressed file than a small 'filesystem' with content metadata etc.; see help(unzip)
for details. So to do what you sketch out above, you need to:
- Create a temporary file name (e.g. with tempfile())
- Use download.file() to fetch the archive into the temp file
- Use unz() to extract the target file from the temp file
- Remove the temp file via unlink()
which in code (thanks for the basic example, but this is simpler) looks like:
temp <- tempfile()
download.file("http://www.newcl.org/data/zipfiles/a1.zip",temp)
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)
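The same four steps can be wrapped in a small function so the temp file is removed even if the download or the read fails. This is only a sketch: read_zipped_table() and its arguments are hypothetical names, not part of base R, and mode = "wb" is added here on the assumption that the archive should be transferred as binary on every platform.

```r
# Hypothetical helper (not a base R function): download a zip archive
# and read one member of it with read.table().
read_zipped_table <- function(url, member, ...) {
  temp <- tempfile(fileext = ".zip")
  # Remove the temp file when the function exits, even on error
  on.exit(unlink(temp), add = TRUE)
  download.file(url, temp, mode = "wb", quiet = TRUE)
  # unz() opens a connection to a single file inside the archive;
  # read.table() opens and closes that connection itself
  read.table(unz(temp, member), ...)
}
```

With the URL from the answer above, the call would be read_zipped_table("http://www.newcl.org/data/zipfiles/a1.zip", "a1.dat"); any further arguments such as header or sep are passed through to read.table().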
Compressed (.z), gzipped (.gz), or bzip2ed (.bz2) files are just the file itself, and those you can read directly from a connection. So get the data provider to use that instead :)
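To illustrate the point about gzipped files, here is a minimal sketch that writes a small gzipped table to a temp file and reads it back twice: once through an explicit gzfile() connection and once by file name. The data frame and the variable names are made up for the example.

```r
# Write a small gzipped table to a temp file (illustrative data only)
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
write.table(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), con,
            row.names = FALSE)
close(con)

# Read through an explicit gzip connection ...
via_connection <- read.table(gzfile(gz_path), header = TRUE)
# ... or pass the file name; gzip compression is detected when reading
via_filename <- read.table(gz_path, header = TRUE)

unlink(gz_path)
```

No separate extraction step is needed in either case, which is why a single gzipped file is easier to work with than a zip archive.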
Using R to download zipped data file, extract, and import .csv
To get your data to download and uncompress, you need to set mode="wb":
download.file("...",temp, mode="wb")
unzip(temp, "gbr_Country_en_csv_v2.csv")
dd <- read.table("gbr_Country_en_csv_v2.csv", sep=",",skip=2, header=T)
It looks like the default is "w", which assumes a text file. If it were a plain csv file this would be fine, but since the file is compressed it is binary, hence the "wb". Without the "wb" part, you can't open the zip at all.
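A quick way to check whether a download came through intact is to ask unzip() for the archive listing and see whether that fails. The helper below is a sketch with a hypothetical name, can_open_zip(); it treats any error or warning from unzip() as "not a usable zip".

```r
# Hypothetical helper: TRUE if `path` can be opened and listed as a zip
# archive, FALSE if unzip() raises an error or a warning.
can_open_zip <- function(path) {
  tryCatch({
    unzip(path, list = TRUE)  # list = TRUE only reads the table of contents
    TRUE
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

# A plain text file is not a zip archive, so the check fails
not_a_zip <- tempfile(fileext = ".zip")
writeLines("a,b\n1,2", not_a_zip)
text_file_result <- can_open_zip(not_a_zip)
unlink(not_a_zip)
```

Calling can_open_zip(temp) right after download.file() gives an early signal that the file should be fetched again with mode="wb".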
R Reading in a zip data file without unzipping it
If your zip file is called Sales.zip and contains only a file called Sales.dat, I think you can simply do the following (assuming the file is in your working directory):
data <- read.table(unz("Sales.zip", "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")
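If the name of the file inside the archive is not known in advance, unzip() with list = TRUE returns the archive contents without extracting anything. The sketch below uses that to read an archive that holds exactly one file; read_single_member() is a hypothetical name chosen for this example.

```r
# Hypothetical helper: read a zip archive that contains exactly one file,
# without extracting it to disk and without hard-coding the member name.
read_single_member <- function(zipfile, ...) {
  # list = TRUE returns a data frame with columns Name, Length and Date
  members <- unzip(zipfile, list = TRUE)$Name
  if (length(members) != 1L) {
    stop("expected exactly one file in the archive, found ", length(members))
  }
  read.table(unz(zipfile, members), ...)
}
```

For the example above this would be read_single_member("Sales.zip", nrows=10, header=TRUE, quote="\"", sep=",").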
Download a large zipped CSV file, unzip and read into R on Linux
It might be a file permission issue. To get around it, work in a directory you're already in, or one you know you have access to.
# DOWNLOAD THE FILE
# to a directory you can access, and name the file. No need to overcomplicate this.
# The destination keeps the .zip extension so the later steps find it.
download.file("https://download.cms.gov/nppes/NPPES_Data_Dissemination_February_2022.zip",
              destfile = "/home/myName/myProjectname/npi.zip")
# use the decompress function if you need to, though unzip might work
# (decompress_file() is not a base R function; it is assumed to be defined elsewhere)
x <- decompress_file(directory = "/home/myName/myProjectname/",
                     file = "npi.zip")
# remove .zip file if you need the space back
file.remove("/home/myName/myProjectname/npi.zip")
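For comparison, here is a sketch of the same workflow using only base R. The function name fetch_and_extract() and its arguments are hypothetical. It raises the download timeout (R's default is 60 seconds, which a large archive can exceed) and lets the caller point unzip() at an external program through its unzip argument, which may help when R's internal unzip struggles with a very large file.

```r
# Hypothetical helper: download a zip archive into dest_dir, extract it
# there, delete the archive, and return the paths now in dest_dir.
fetch_and_extract <- function(url, dest_dir, timeout = 3600,
                              unzip_method = "internal") {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  zip_path <- file.path(dest_dir, basename(url))

  # Temporarily raise the download timeout, restoring it on exit
  old_opts <- options(timeout = timeout)
  on.exit(options(old_opts), add = TRUE)

  download.file(url, destfile = zip_path, mode = "wb")
  # unzip_method can be "internal" or the path to an external unzip program
  unzip(zip_path, exdir = dest_dir, unzip = unzip_method)
  file.remove(zip_path)  # reclaim the disk space

  list.files(dest_dir, full.names = TRUE)
}
```

A call along the lines of fetch_and_extract(url, "/home/myName/myProjectname", unzip_method = "unzip") would use the system unzip program, assuming one is installed and on the PATH.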
Using R to download gzipped data file, extract, and import data
I like Ramnath's approach, but I would use temp files like so:
tmpdir <- tempdir()
url <- 'http://archive.ics.uci.edu/ml/databases/tic/tic.tar.gz'
file <- basename(url)
download.file(url, file)
untar(file, compressed = 'gzip', exdir = tmpdir )
list.files(tmpdir)
The list.files() call should produce something like this:
[1] "TicDataDescr.txt" "dictionary.txt" "ticdata2000.txt" "ticeval2000.txt" "tictgts2000.txt"
which you could parse if you needed to automate this process for a lot of files.
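One way to automate that is sketched below: extract the tarball into its own temp directory, then read every file whose name matches a pattern. The helper name read_tarball_tables() and the default pattern are assumptions made for this example, and it presumes that every matching file can be parsed by read.table() with the same arguments.

```r
# Hypothetical helper: extract a .tar.gz into a fresh temp directory and
# read every file matching `pattern` into a named list of data frames.
read_tarball_tables <- function(tarfile, pattern = "\\.txt$", ...) {
  exdir <- tempfile("untar_")
  dir.create(exdir)
  # Remove the extracted files once they have been read
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)

  untar(tarfile, exdir = exdir)
  paths <- list.files(exdir, pattern = pattern,
                      recursive = TRUE, full.names = TRUE)

  tables <- lapply(paths, read.table, ...)
  names(tables) <- basename(paths)
  tables
}
```

Using a dedicated directory rather than tempdir() itself keeps list.files() from picking up unrelated temp files from the same R session.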