How to Programmatically Extract/Unzip a .7Z (7-Zip) File with R

If the 7z executable is on your PATH, you can simply call it with system(). Note that 7z does not allow a space between -o and the output directory:

system('7z e -o<output_dir> <archive_name>')
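A slightly more defensive variant wraps the same call in a small helper that checks for 7z first and quotes its arguments. This is only a sketch: the function names seven_zip_args and extract_7z are made up for illustration, and it assumes 7z is on the PATH.

```r
# Build the arguments for `7z e`. The output directory is glued directly
# onto -o, and each argument is quoted so paths with spaces survive the shell.
seven_zip_args <- function(archive, out_dir = ".") {
  c("e", shQuote(paste0("-o", out_dir)), shQuote(archive))
}

# Hypothetical wrapper: fails early if 7z is missing or reports an error.
extract_7z <- function(archive, out_dir = ".") {
  exe <- Sys.which("7z")
  if (!nzchar(exe)) stop("7z was not found on the PATH")
  status <- system2(exe, seven_zip_args(archive, out_dir))
  if (status != 0) stop("7z exited with status ", status)
  invisible(out_dir)
}

# Usage (file and directory names are placeholders):
# extract_7z("my_archive.7z", "unpacked")
```

system2() returns the exit status of the command, so a failed extraction stops the script instead of passing silently.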

Opening a .7z file in R

The archive package can open the 7-Zip format.

It is installed from GitHub, so you will need the devtools package first.

devtools::install_github("jimhester/archive")

I'm unable to access your example file on the FTP server. Assuming that it is a multi-file archive of .txt files, you would access it like this:

library(archive)
a <- archive("AC2008.7z")

Assuming it contained a file named x.txt with columns delimited by white space, you might do something like:

library(readr)
x <- read_table(archive_read(a, "x.txt"))
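If the goal is to unpack files to disk rather than read one into memory, the archive package also provides archive_extract(). A minimal sketch, assuming the package is installed; the wrapper name extract_with_archive and the output directory in the usage line are hypothetical:

```r
extract_with_archive <- function(path, out_dir = ".") {
  if (!file.exists(path)) stop("archive not found: ", path)
  if (!requireNamespace("archive", quietly = TRUE)) {
    stop("the 'archive' package is not installed")
  }
  # archive_extract() unpacks the contents of the archive into `dir`
  archive::archive_extract(path, dir = out_dir)
  invisible(out_dir)
}

# Usage, with the archive name from the example above:
# extract_with_archive("AC2008.7z", "AC2008")
```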

How do I extract a 7-Zip zip file without the directory structure

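Try this instead. The e command extracts all files straight into the output directory and ignores the folder structure stored in the archive (the x command would keep full paths):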

7z e -oD:\Data\ODS_Source D:\Data\DATA_DROP\Source.zip

How do I unzip all files in a folder using 7-zip in batch?

This will unzip all zip files in the current folder (into the same folder), assuming you have installed 7-Zip in C:\Program Files\7-Zip.

If you have added your 7-Zip folder to the PATH, you can just enter 7z instead of the full path.

"C:\Program Files\7-Zip\7z.exe" e *.zip
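For readers working in R rather than a batch file, roughly the same result can be had for plain .zip files with the built-in unzip() function. This swaps in R's own unzip code for 7-Zip, so it will not handle .7z archives. The helper name unzip_all is hypothetical; junkpaths = TRUE drops the directory names stored in each archive, similar to 7z's e command.

```r
# Extract every .zip file found in `dir` into that same directory.
unzip_all <- function(dir = ".") {
  zips <- list.files(dir, pattern = "\\.zip$", full.names = TRUE,
                     ignore.case = TRUE)
  for (z in zips) {
    # junkpaths = TRUE keeps only the base name of each stored file
    utils::unzip(z, exdir = dir, junkpaths = TRUE)
  }
  invisible(zips)  # the archives that were processed
}

# Usage:
# unzip_all(".")
```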

Unzip password protected zip files in R

I found this question very useful but saw that no formal answers were posted, so here goes:

  1. First I installed 7z.
  2. Then I added "C:\Program Files\7-Zip" to my environment path.
  3. I tested that the 7z command was recognized from the command line.
  4. I opened R and typed in system("7z x secure.7z -pPASSWORD") with the appropriate PASSWORD.

I have multiple zipped files, and I'd rather the password not appear in the source code or be stored in any text file, so I wrote the following script:

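# pattern is a regular expression, so escape the dot and anchor the end
file_list <- list.files(path = ".", pattern = "\\.7z$", all.files = TRUE)
pw <- readline(prompt = "Enter the password: ")
for (file in file_list) {
  # quote the file name and the password switch so spaces survive the shell
  sys_command <- paste0("7z x ", shQuote(file), " ", shQuote(paste0("-p", pw)))
  system(sys_command)
}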

When sourced, this prompts me for the password and then decompresses the archives one by one in a loop.
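One detail worth checking in a script like this: the pattern argument of list.files() is a regular expression, not a glob, so a bare ".7z" also matches names that merely contain those characters. A quick comparison, using made-up file names:

```r
candidates <- c("data.7z", "backup.7z", "notes7z.txt", "data.7z.bak")

# Unanchored, unescaped: "." matches any character, so all four names match
candidates[grepl(".7z", candidates)]

# Escaped dot plus end-of-string anchor: only data.7z and backup.7z remain
candidates[grepl("\\.7z$", candidates)]
```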

Sys.glob() within unzip()

Sys.glob expands files that already exist. So the parameter to your unzip call will depend on what files are in your working directory.

Perhaps you want to do unzip with list=TRUE to return the list of files in the zip first, and then use some pattern matching to select the files you want.

See ?grep for info on matching strings with patterns. These patterns are "regular expressions" rather than "glob" expansions, but you should be able to work with that.

Here's a concrete example:

# what's in the zip?
files = unzip("c.zip", list=TRUE)$Name
files
[1] "l_spatial.dbf" "l_spatial.shp" "l_spatial.shx" "ls_polys_bin.dbf"
[5] "ls_polys_bin.shp" "ls_polys_bin.shx" "rast_jan90.tif"

# what files have "dbf" in them:
files[grepl("dbf",files)]
[1] "l_spatial.dbf" "ls_polys_bin.dbf"

# extract just those:
unzip("c.zip", files=files[grepl("dbf",files)])
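If you would rather keep writing globs, base R's glob2rx() (in the utils package) converts a simple wildcard pattern, built from * and ?, into the equivalent regular expression. A small sketch, with the file names listed above hard-coded as a vector so it runs without c.zip:

```r
files <- c("l_spatial.dbf", "l_spatial.shp", "l_spatial.shx",
           "ls_polys_bin.dbf", "ls_polys_bin.shp", "ls_polys_bin.shx",
           "rast_jan90.tif")

# Translate the glob into a regular expression: "^.*\\.dbf$"
rx <- glob2rx("*.dbf")

# Selects l_spatial.dbf and ls_polys_bin.dbf
files[grepl(rx, files)]
```

The selected names can then be passed to the files argument of unzip() as in the example above.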

The regular expression for your glob

 "[a-z][a-z][a-z][-][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][.][x][m][l]"

would be

 "^[a-z]{3}-[0-9]{8}\\.xml$"

That is a match of the start of the string ("^"), three lowercase letters a-z, a dash, eight digits, a literal dot, "xml", and the end of the string ("$"). Two backslashes are needed for the dot: one because a bare dot means "any one character" in a regular expression, and another because R needs a backslash to escape a backslash inside a string literal.
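A quick way to convince yourself that the pattern behaves as described is to run it against a few names. The file names below are invented for the test:

```r
rx <- "^[a-z]{3}-[0-9]{8}\\.xml$"

candidates <- c("abc-20120101.xml",      # well-formed name
                "ab-20120101.xml",       # only two letters
                "abc-2012010.xml",       # only seven digits
                "abc-20120101xxml",      # no literal dot
                "abc-20120101.xml.bak")  # extra suffix after .xml
grepl(rx, candidates)
# [1]  TRUE FALSE FALSE FALSE FALSE

# Without the escaped dot, "." matches any character:
grepl("^[a-z]{3}-[0-9]{8}.xml$", "abc-20120101xxml")
# [1] TRUE
```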


