How to programmatically extract / unzip a .7z (7-zip) file with R
If the 7z executable is on your path, you can simply use the system command. Note that 7z expects no space between -o and the output directory:
system('7z e -o<output_dir> <archive_name>')
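The same call can be wrapped in a small function that checks for the executable first. This is only a sketch: the names seven_zip_args and extract_7z are made up for illustration, and it assumes the binary is called 7z on your system (some installs name it 7za or 7zz instead).

```r
# Hypothetical helper: build the argument vector for `7z e`.
# There is deliberately no space between -o and the output directory.
seven_zip_args <- function(archive, out_dir) {
  c("e", paste0("-o", out_dir), archive)
}

# Hypothetical wrapper: run 7z only if it is found on the PATH.
extract_7z <- function(archive, out_dir = ".") {
  exe <- Sys.which("7z")
  if (!nzchar(exe)) stop("7z executable not found on the PATH")
  # shQuote() protects paths that contain spaces
  status <- system2(exe, shQuote(seven_zip_args(archive, out_dir)))
  if (status != 0) warning("7z exited with status ", status)
  invisible(status)
}
```

Calling extract_7z with an archive name and an output directory then behaves like the one-liner above, but stops with a readable error when 7z is missing.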
Opening a .7z file in R
The archive package can read the 7-Zip format.
You will need the devtools package to install it from GitHub:
devtools::install_github("jimhester/archive")
I'm unable to access your example file on the FTP server. Assuming that it is a multi-file archive of .txt files, you would access it like this:
a <- archive("AC2008.7z")
Assuming it contained a file named x.txt with columns delimited by white space, you might do something like:
library(readr)
x <- read_table(archive_read(a, "x.txt"))
How do I extract a 7-zip zip file without directories
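Try this instead. The e command extracts all files directly into the output directory, ignoring the folder structure stored in the archive (x would preserve it):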
7z e -oD:\Data\ODS_Source D:\Data\DATA_DROP\Source.zip
How do I unzip all files in a folder using 7-zip in batch?
This will unzip all zip files in the current folder (into the same folder), assuming you have installed 7-Zip into C:\Program Files\7-Zip.
If you have added your 7-Zip folder to the path, you can just enter 7z instead of the full path.
"C:\Program Files\7-Zip\7z.exe" e *.zip
Unzip password protected zip files in R
I found this question very useful but saw that no formal answers were posted, so here goes:
- First I installed 7z.
- Then I added "C:\Program Files\7-Zip" to my environment path.
- I tested that the 7z command was recognized from the command line.
- I opened R and typed in system("7z x secure.7z -pPASSWORD") with the appropriate PASSWORD.
I have multiple zipped files and I'd rather the password not appear in the source code or be stored in any text file, so I wrote the following script:
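# List every .7z archive in the working directory (pattern is a regular expression)
file_list <- list.files(path = ".", pattern = "\\.7z$", all.files = TRUE)
# Prompt for the password so it is never stored in the script
pw <- readline(prompt = "Enter the password: ")
for (file in file_list) {
  # shQuote() protects file names and passwords that contain spaces
  sys_command <- paste("7z x", shQuote(file), shQuote(paste0("-p", pw)))
  system(sys_command)
}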
which, when sourced, will prompt me to enter the password and then decompress the archives in a loop.
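One detail worth checking in a script like this: the pattern argument of list.files is a regular expression, not a glob, so an unescaped dot matches any character. The sketch below uses made-up file names to show the difference between a loose and an anchored pattern.

```r
# Invented file names, for illustration only
candidates <- c("a.7z", "x7z.txt", "b.7z.bak")

# Unescaped dot: "any character followed by 7z", anywhere in the name
loose <- grepl(".7z", candidates)

# Escaped dot and end-of-string anchor: names that end in ".7z"
strict <- grepl("\\.7z$", candidates)

print(candidates[loose])   # all three names match
print(candidates[strict])  # only "a.7z" matches
```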
Sys.glob() within unzip()
Sys.glob expands files that already exist. So the parameter to your unzip call will depend on what files are in your working directory.
Perhaps you want to do unzip with list=TRUE to return the list of files in the zip first, and then use some pattern matching to select the files you want.
See ?grep for info on matching strings with patterns. These patterns are "regular expressions" rather than "glob" expansions, but you should be able to work with that.
Here's a concrete example:
# what's in the zip?
files = unzip("c.zip", list=TRUE)$Name
files
[1] "l_spatial.dbf" "l_spatial.shp" "l_spatial.shx" "ls_polys_bin.dbf"
[5] "ls_polys_bin.shp" "ls_polys_bin.shx" "rast_jan90.tif"
# what files have "dbf" in them:
files[grepl("dbf",files)]
[1] "l_spatial.dbf" "ls_polys_bin.dbf"
# extract just those:
unzip("c.zip", files=files[grepl("dbf",files)])
The regular expression for your glob
"[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"
would be
"^[a-z]{3}-[0-9]{8}\\.xml$"
that's a match of start of string ("^"), 3 a-z (lower case only), a dash, eight digits, a dot (backslashes are needed, one because dot means "any one char" in regexps and another because R needs a backslash to escape a backslash), "xml", and the end of the string ("$").
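To check a pattern like this before handing it to unzip, it can be tried against a few sample names with grepl. The names below are invented for illustration.

```r
pattern <- "^[a-z]{3}-[0-9]{8}\\.xml$"

# Invented names: one valid, one with seven digits,
# one with a trailing suffix, one missing the dot
samples <- c("abc-12345678.xml", "abc-1234567.xml",
             "abc-12345678.xml.bak", "abc-12345678xml")

matched <- samples[grepl(pattern, samples)]
print(matched)  # only "abc-12345678.xml" remains
```

The same subsetting works on the Name column returned by unzip with list=TRUE.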