R Reading in a zip data file without unzipping it
If your zip file is called Sales.zip and contains only a file called Sales.dat, I think you can simply do the following (assuming the file is in your working directory):
data <- read.table(unz("Sales.zip", "Sales.dat"), nrows=10, header=T, quote="\"", sep=",")
R Reading in a zip data file without unzipping it (loss of information)
You can also always use fread() from data.table. You can pass an arbitrary shell command as the input in place of a file name to handle the unzip (recent data.table versions also accept it via the cmd argument), and it won't auto-coerce your timestamps by default either, so you shouldn't have the truncation issue. The vignette "Convenience features of fread" has some great examples.
(Bonus: it's significantly faster than readr, and absolutely blows it out of the water if you install the development v1.10.5 version off GitHub with multi-threading in fread.)
library(data.table)
myData <- fread("gunzip -c foo.txt.gz")
When reading in data from a zip-file in R, it corrupts the previous read-in data
The problem is the default behavior of the read_delim() function. To improve performance, the data is loaded lazily, meaning it is only read from disk when it is actually needed.
So the return value from "f_get_data" is really just a pointer to the data. In this case it is a pointer to your temporary file, which is overwritten on each call to the function.
To solve this, set lazy to FALSE in the read_delim() function call.
df <- read_delim(unzip(zip_file, files = data), delim = ",", lazy=FALSE) %>%
mutate(year = i + 2015)
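An alternative that avoids the overwritten file altogether, sketched here in base R: extract each archive into its own fresh directory, so no two calls ever share a path. The function name read_from_zip and its arguments are hypothetical, and read.csv() is swapped in for read_delim() because it always reads the whole file eagerly.

```r
# Hypothetical helper: extract one member of a zip archive into a
# directory that is unique to this call, then read it eagerly.
read_from_zip <- function(zip_file, inner_name, ...) {
  out_dir <- tempfile("unzipped_")                 # unique path per call
  on.exit(unlink(out_dir, recursive = TRUE), add = TRUE)
  # unzip() creates exdir if needed and returns the extracted file path(s)
  path <- unzip(zip_file, files = inner_name, exdir = out_dir)
  read.csv(path, ...)                              # data is fully in memory on return
}
```

Because the data frame is complete before the function returns, deleting the extracted file afterwards is safe.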
Reading a zip file in R without knowing the csv file name within it
Why don't you try using unzip to find the filename inside the ZIP archive:
zipdf <- unzip(zip_file, list = TRUE)
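# the following line assumes the archive contains only a single file
csv_file <- zipdf$Name[1]  # R indexes from 1, so [1] is the first entry
# csv_file is only a name inside the archive, so read it through unz()
your_df <- read.table(unz(zip_file, csv_file), skip = 10, nrows=10, header=TRUE, quote="\"", sep=",")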
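If the archive might hold more than one file, the same listing can be filtered by extension before reading. This is a minimal sketch; the helper name read_first_csv is hypothetical, and it assumes the file you want ends in .csv.

```r
# Hypothetical helper: read the first .csv member of a zip archive
# without knowing its name in advance.
read_first_csv <- function(zip_file, ...) {
  listing <- unzip(zip_file, list = TRUE)   # data frame with a Name column
  is_csv <- grepl("\\.csv$", listing$Name, ignore.case = TRUE)
  if (!any(is_csv)) {
    stop("no .csv file found in ", zip_file)
  }
  csv_name <- listing$Name[is_csv][1]       # first match (R is 1-indexed)
  # unz() opens a connection to that member; nothing is extracted to disk
  read.csv(unz(zip_file, csv_name), ...)
}
```

Extra arguments such as skip or nrows are passed straight through to read.csv().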
Using R to download zipped data file, extract, and import data
Zip archives are actually more a 'filesystem' with content metadata etc. See help(unzip) for details. So to do what you sketch out above you need to
- Create a temp. file name (e.g. tempfile())
- Use download.file() to fetch the file into the temp. file
- Use unz() to extract the target file from the temp. file
- Remove the temp file via unlink()
which in code (thanks for basic example, but this is simpler) looks like
temp <- tempfile()
download.file("http://www.newcl.org/data/zipfiles/a1.zip",temp)
data <- read.table(unz(temp, "a1.dat"))
unlink(temp)
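The same four steps can be wrapped in a function so the temp file is removed even when the download or the parse fails. This is only a sketch: the name read_zipped_url and its arguments are hypothetical, and mode = "wb" is added on the assumption that the code may also run on Windows, where binary files need it.

```r
# Hypothetical helper: download a zip archive, read one member, clean up.
read_zipped_url <- function(url, inner_name, ...) {
  temp <- tempfile(fileext = ".zip")
  on.exit(unlink(temp), add = TRUE)    # runs on success and on error
  download.file(url, temp, mode = "wb", quiet = TRUE)
  # read.table() opens and closes the unz() connection itself
  read.table(unz(temp, inner_name), ...)
}
```

With the URL from the example above, the call would be read_zipped_url("http://www.newcl.org/data/zipfiles/a1.zip", "a1.dat").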
Compressed (.z) or gzipped (.gz) or bzip2ed (.bz2) files are just the file and those you can read directly from a connection. So get the data provider to use that instead :)
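For the gzipped case, a self-contained round trip might look like this. The file and column names are made up for illustration; the script writes its own small .gz file first so that it can run anywhere.

```r
# Write a small gzipped CSV to a temporary file (illustrative data only)
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), con, row.names = FALSE)
close(con)

# read.csv() accepts the gzfile() connection directly;
# no separate decompression step is needed
sales <- read.csv(gzfile(path))
unlink(path)
```

bzfile() plays the same role for .bz2 files.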