Unzip Specific Extension Only

Something along the lines of:

#!/bin/bash
cd ~/basedir/files || exit 1
for file in *.zip ; do
    # Output directory name: drop a leading "files." and the ".zip" suffix.
    newfile=$(echo "${file}" | sed -e 's/^files\.//' -e 's/\.zip$//')
    echo ":${newfile}:"
    mkdir tmp
    rm -rf "${newfile}"
    mkdir "${newfile}"
    cp "${file}" tmp
    cd tmp || exit 1
    unzip "${file}"
    # Copy the wanted file types into the output directory (flattened).
    find . -name '*.jpg' -exec cp {} "../${newfile}" ';'
    find . -name '*.gif' -exec cp {} "../${newfile}" ';'
    cd ..
    rm -rf tmp
done

This is tested and will handle spaces in filenames (both the zip files and the extracted files). You may have collisions if the zip file has the same file name in different directories (you can't avoid this if you're going to flatten the directory structure).
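For comparison, here is a rough Python sketch of the same flatten-and-filter idea that works around the collision problem by renaming. The function name extract_flat and the _1, _2 suffix scheme are made up for illustration; they are not part of the script above.

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, dest_dir, extensions=(".jpg", ".gif")):
    """Copy matching members into dest_dir without their directory structure.

    A name that already exists in dest_dir gets a numeric suffix
    (photo.jpg, photo_1.jpg, ...). That scheme is an assumption of this sketch.
    """
    extensions = tuple(ext.lower() for ext in extensions)
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Skip directory entries and files with other extensions.
            if info.is_dir() or not info.filename.lower().endswith(extensions):
                continue
            base = os.path.basename(info.filename)
            stem, ext = os.path.splitext(base)
            target = os.path.join(dest_dir, base)
            counter = 1
            while os.path.exists(target):
                target = os.path.join(dest_dir, f"{stem}_{counter}{ext}")
                counter += 1
            # Stream the member to the flattened target path.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Calling extract_flat("photos.zip", "photos") would then leave every .jpg and .gif in one directory, with duplicates kept side by side instead of overwritten.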

Extract only files with specific extension via bash

You can specify a pattern:

for zipfiles in /downloads/*.zip; do unzip "$zipfiles" '*.docx'; done

Tested to work with UnZip 6.00.

You can also specify the -x option to exclude.
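If you would rather not depend on the unzip binary, the same include/exclude idea can be sketched in Python with the standard zipfile and fnmatch modules. The helper names select_members and extract_matching are hypothetical, and fnmatch wildcards only roughly mirror unzip's pattern rules, so treat this as an approximation.

```python
import fnmatch
import zipfile


def select_members(names, include=("*",), exclude=()):
    """Return names matching any include pattern and no exclude pattern."""
    return [
        name for name in names
        if any(fnmatch.fnmatchcase(name, pat) for pat in include)
        and not any(fnmatch.fnmatchcase(name, pat) for pat in exclude)
    ]


def extract_matching(zip_path, dest_dir, include=("*",), exclude=()):
    """Extract the selected members, keeping their paths inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        chosen = select_members(zf.namelist(), include, exclude)
        zf.extractall(dest_dir, members=chosen)
    return chosen
```

For example, extract_matching(path, "out", include=("*.docx",), exclude=("drafts/*",)) pulls every .docx except those under drafts/.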

Unzip a file with a particular extension (not .zip)

The problem is that a .ZIP file is much more than simply deflated data. It also contains directory structures, checksums, file metadata, and so on.

You need to use a class that understands this structure. Unless the file uses more advanced features, such as encryption or archives spanning multiple files, the .NET ZipArchive class will probably do the trick.

Here's a simple program that extracts the contents of a text file from the zip archive. You must adapt it to your needs:

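using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Open the archive read-only; the path and entry name are examples.
using (var file = File.OpenRead(@"D:\Temp\Temp.zip"))
using (var archive = new ZipArchive(file, ZipArchiveMode.Read))
{
    // GetEntry returns null if the archive has no entry with this name.
    var entry = archive.GetEntry("ttt/README.md");
    if (entry == null)
        throw new FileNotFoundException("Entry not found in archive.");
    using (var entryStream = entry.Open())
    using (var memory = new MemoryStream())
    {
        entryStream.CopyTo(memory);
        Console.WriteLine(Encoding.UTF8.GetString(memory.ToArray()));
    }
}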
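The same point (the file's extension doesn't matter, its internal structure does) can be shown in Python as well. This is only a sketch: read_text_entry is a hypothetical helper, and it assumes the entry is UTF-8 text unless told otherwise.

```python
import zipfile


def read_text_entry(archive_path, entry_name, encoding="utf-8"):
    """Return one entry's text from a ZIP-format file, whatever its extension."""
    # is_zipfile inspects the file's contents, not its name.
    if not zipfile.is_zipfile(archive_path):
        raise ValueError(f"{archive_path} is not a ZIP archive")
    with zipfile.ZipFile(archive_path) as zf:
        # ZipFile.read raises KeyError if the entry does not exist.
        return zf.read(entry_name).decode(encoding)
```

A ZIP-format file named, say, data.pkg can be read this way just like one named data.zip.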

Extracting all the files of a selected extension from a zipped file

You should be able to do something like this:

import zipfile

def main():
    archive = 'archive.zip'
    directory = './'
    extensions = ('.txt', '.pdf')
    # The context manager closes the archive even if extraction fails.
    with zipfile.ZipFile(archive, 'r') as zip_file:
        for name in zip_file.namelist():
            # str.endswith accepts a tuple of suffixes.
            if name.endswith(extensions):
                zip_file.extract(name, directory)

if __name__ == '__main__':
    main()

bash - Unzip specific type of files from multiple zip

You need to protect your wildcard pattern with single quotes, otherwise the shell will expand it (this is called globbing):

for file in *.zip; do
    unzip "${file}" '*.txt'
done

Without the quotes, *.txt doesn't expand to anything during the first iteration of the loop, as there are no txt files in the working directory yet, so the command works as expected. After the first iteration, it expands to all the txt files you just extracted from the first zip file, so the second and subsequent iterations actually look something like this after globbing:

unzip ${file} file1.txt
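To make the expansion rule concrete, here is a small Python imitation of what bash does with an unquoted glob word by default. The function bash_expand is purely illustrative; it ignores nullglob, failglob and locale-specific sort order.

```python
import glob
import os


def bash_expand(word, cwd):
    """Mimic bash's default handling of an unquoted glob word in cwd.

    Returns the sorted matching file names, or the word itself, unchanged,
    when nothing matches.
    """
    pattern = os.path.join(glob.escape(cwd), word)
    matches = sorted(os.path.basename(path) for path in glob.glob(pattern))
    return matches if matches else [word]
```

In an empty directory the word comes back untouched, so unzip receives the pattern; once txt files exist, the same word turns into their names, and unzip never sees the pattern at all.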

Extract specific file extensions from multiple 7-zip files

This solution is based on bash, grep and awk; it works on Cygwin and on Ubuntu.

Since you have the requirement to search for (X) [!].ext files first and, if there are no such files, to look for (X).ext files, I don't think it is possible to write a single expression that handles this logic.

The solution should have some if/else conditional logic to test the list of files inside the archive and decide which files to extract.

Here is the initial structure inside the zip/rar archive I tested my script on (I made a script to prepare this structure):

folder
├── 7z_1.7z
│   ├── (E).txt
│   ├── (J) [!].txt
│   ├── (J).txt
│   ├── (U) [!].txt
│   └── (U).txt
├── 7z_2.7z
│   ├── (J) [b1].txt
│   ├── (J) [b2].txt
│   ├── (J) [o1].txt
│   └── (J).txt
├── 7z_3.7z
│   ├── (E) [!].txt
│   ├── (J).txt
│   └── (U).txt
└── 7z 4.7z
    └── test.txt

The output is this:

output
├── 7z_1.7z # This is a folder, not an archive
│   ├── (J) [!].txt # Here we extracted only files with [!]
│   └── (U) [!].txt
├── 7z_2.7z
│   └── (J).txt # Here there are no [!] files, so we extracted (J)
├── 7z_3.7z
│   └── (E) [!].txt # We had here both [!] and (J), extracted only file with [!]
└── 7z 4.7z
    └── test.txt # We had only one file here, extracted it

And this is the script to do the extraction:

#!/bin/bash

# Remove the output (if it's left from previous runs).
rm -rf output
mkdir -p output

# Unzip the zip archive.
unzip data.zip -d output
# For rar use
# unrar x data.rar output
# OR
# 7z x -ooutput data.rar

for archive in output/folder/*.7z
do
    # See https://stackoverflow.com/questions/7148604
    # Get the list of file names, remove the extra output of "7z l"
    list=$(7z l "$archive" | awk '
        /----/ {p = !p; next}
        $NF == "Name" {pos = index($0,"Name")}
        p {print substr($0,pos)}
    ')
    # Get the list of files with [!] (-F matches the literal string).
    extract_list=$(echo "$list" | grep -F '[!]')
    if [[ -z $extract_list ]]; then
        # If we don't have files with [!], then look for ([A-Z]) pattern
        # to get files with single letter in brackets.
        extract_list=$(echo "$list" | grep '([A-Z])\.')
    fi
    if [[ -z $extract_list ]]; then
        # If we only have one file - extract it.
        # $list is a newline-separated string, so count its lines.
        if [[ -n $list && $(wc -l <<< "$list") -eq 1 ]]; then
            extract_list=$list
        fi
    fi
    if [[ -n $extract_list ]]; then
        # If we have files to extract, then do the extraction.
        # Output path is output/7zip_archive_name/
        out_path=output/$(basename "$archive")
        mkdir -p "$out_path"
        echo "$extract_list" | xargs -I {} 7z x -o"$out_path" "$archive" {}
    fi
done

The basic idea here is to go over the 7zip archives and get the list of files in each of them using the 7z l command (list of files).

The output of the command is quite verbose, so we use awk to clean it up and get the list of file names.

After that we filter this list using grep to get either a list of [!] files or a list of (X) files. Then we just pass this list to 7zip to extract the files we need.
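The selection rule on its own, separated from the 7z calls, can be sketched as a small Python function. The name choose_files is hypothetical, and the input is assumed to be the already cleaned-up list of file names for one archive.

```python
import re


def choose_files(names):
    """Pick the files to extract from one archive's listing.

    Priority: names containing "[!]"; otherwise names like "(X).ext" with a
    single capital letter in parentheses; otherwise the only file, if the
    archive holds exactly one.
    """
    verified = [name for name in names if "[!]" in name]
    if verified:
        return verified
    plain = [name for name in names if re.search(r"\([A-Z]\)\.", name)]
    if plain:
        return plain
    return list(names) if len(names) == 1 else []
```

Fed the four listings from the test structure above, it picks the same files that appear in the output tree.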

Extracting files with specific extensions from a lot of ZIP archives using Python

Simply pass the ZipFile object to the extractor as a parameter. You shouldn't try to parse the file path out of the string representation of the list; that is most likely what causes the problem. Try something like:

import zipfile
import os
import fnmatch

def archive1():
    rootPath = r'E:\Test\2017'
    pattern = '*.zip'
    # Walk the tree and hand every zip archive found to the extractor.
    for root, dirs, files in os.walk(rootPath):
        for filename in fnmatch.filter(files, pattern):
            with zipfile.ZipFile(os.path.join(root, filename)) as zf:
                extractor(zf)

def extractor(zip_file):
    new_dr = r'E:\Test'
    extensions = ('.txt', '.pdf')
    # Extract only the members whose names end with a wanted extension.
    for name in zip_file.namelist():
        if name.endswith(extensions):
            zip_file.extract(name, new_dr)

if __name__ == '__main__':
    archive1()

How to traverse subfolders in a zip file and unzip files with specific extension?

You can do that with a recursive procedure that calls itself for folder items and extracts file items if they have a specific extension:

Set fso = CreateObject("Scripting.FileSystemObject")
Set app = CreateObject("Shell.Application")

Sub ExtractByExtension(fldr, ext, dst)
    For Each f In fldr.Items
        If f.Type = "File folder" Then
            ExtractByExtension f.GetFolder, ext, dst
        ElseIf LCase(fso.GetExtensionName(f.Name)) = LCase(ext) Then
            app.NameSpace(dst).CopyHere f.Path
        End If
    Next
End Sub

ExtractByExtension app.NameSpace("C:\path\to\your.zip"), "txt", "C:\output"

