unzip specific extension only
Something along the lines of:
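#!/bin/bash
cd ~/basedir/files || exit 1
for file in *.zip ; do
    # Output directory name: the zip file name without its .zip suffix.
    newfile=$(echo "${file}" | sed -e 's/^files.//' -e 's/\.zip$//')
    echo ":${newfile}:"
    mkdir tmp
    rm -rf "${newfile}"
    mkdir "${newfile}"
    # Copy and unzip the file the loop actually found.
    cp "${file}" tmp
    cd tmp || exit 1
    unzip "${file}"
    # Flatten: copy every matching file into the output directory.
    find . -name '*.jpg' -exec cp {} "../${newfile}" ';'
    find . -name '*.gif' -exec cp {} "../${newfile}" ';'
    cd ..
    rm -rf tmp
done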
This is tested and will handle spaces in filenames (both the zip files and the extracted files). You may have collisions if the zip file has the same file name in different directories (you can't avoid this if you're going to flatten the directory structure).
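If overwriting on collision is not acceptable, one option is to rename the later file instead. The following is a minimal Python sketch of that idea, not part of the script above; the function name `extract_flat` and the `_1`, `_2` suffix scheme are assumptions made for illustration.

```python
import os
import zipfile


def extract_flat(archive, dest, extensions=(".jpg", ".gif")):
    """Extract matching members into dest without subdirectories.

    Returns the list of paths written. On a name collision a numeric
    suffix is added instead of overwriting the earlier file.
    """
    extensions = tuple(extensions)  # str.endswith needs a tuple
    os.makedirs(dest, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(extensions):
                continue
            # Drop the directory part of the member name (flattening).
            base = os.path.basename(info.filename)
            stem, ext = os.path.splitext(base)
            target = os.path.join(dest, base)
            counter = 1
            while os.path.exists(target):
                # Hypothetical naming scheme: pic.jpg, pic_1.jpg, pic_2.jpg ...
                target = os.path.join(dest, f"{stem}_{counter}{ext}")
                counter += 1
            with zf.open(info) as src, open(target, "wb") as out:
                out.write(src.read())
            written.append(target)
    return written
```

Calling `extract_flat("photos.zip", "photos")` on an archive that holds `one/pic.jpg` and `two/pic.jpg` would leave both files in `photos`, the second under a suffixed name.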
Extract only files with specific extension via bash
You can specify a pattern:
for zipfiles in /downloads/*.zip; do unzip "$zipfiles" '*.docx'; done
Tested to work with UnZip 6.00.
You can also specify the -x option to exclude.
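The same include/exclude idea can be sketched in Python with `zipfile` and `fnmatch`. This is only a rough analogue of the unzip patterns, not a reimplementation of unzip's matching rules; the helper names `select_members` and `extract_matching` are made up for this example.

```python
import fnmatch
import zipfile


def select_members(names, include=("*",), exclude=()):
    """Return names matching any include pattern and no exclude pattern."""
    selected = []
    for name in names:
        if not any(fnmatch.fnmatchcase(name, pat) for pat in include):
            continue
        if any(fnmatch.fnmatchcase(name, pat) for pat in exclude):
            continue
        selected.append(name)
    return selected


def extract_matching(archive, dest, include=("*",), exclude=()):
    """Extract only the selected members of archive into dest."""
    with zipfile.ZipFile(archive) as zf:
        members = select_members(zf.namelist(), include, exclude)
        zf.extractall(dest, members=members)
    return members
```

For instance, `extract_matching("report.zip", "out", include=("*.docx",), exclude=("drafts/*",))` would extract the .docx members that are not under `drafts/`. Note that in `fnmatch` the `*` wildcard also matches the `/` separator.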
Unzip a file with a particular extension (not .zip)
The problem is that a .ZIP file is much more than simply deflated data. It also contains directory structures, checksums, file metadata, and so on.
You need to use a class that knows about this structure. Unless the file is using some of the more advanced stuff, such as encryption and spanning archives, the .NET ZipArchive class probably does the trick.
Here's a simple program that extracts the contents of a text file from the zip archive. You must adapt it to your needs:
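using System;
using System.IO;
using System.IO.Compression;
using System.Text;

using (var file = File.Open(@"D:\Temp\Temp.zip", FileMode.Open))
using (var archive = new ZipArchive(file))
{
    // GetEntry returns null if the archive has no entry with this name.
    var entry = archive.GetEntry("ttt/README.md");
    using (var entryStream = entry.Open())
    using (var memory = new MemoryStream())
    {
        // Decompress the entry into memory and print it as UTF-8 text.
        entryStream.CopyTo(memory);
        Console.WriteLine(Encoding.UTF8.GetString(memory.ToArray()));
    }
}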
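To illustrate the point about structure, here is a small Python sketch that recognizes a ZIP archive by its content rather than by its file extension and lists some of the per-entry metadata mentioned above. The function names `describe_archive` and `read_text_entry` are assumptions for this example only.

```python
import zipfile


def describe_archive(path):
    """Return (name, size, compressed size, CRC-32) for each entry.

    Returns None if the file is not a ZIP archive. The check looks at
    the file content, so the extension does not matter.
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as zf:
        return [
            (info.filename, info.file_size, info.compress_size, info.CRC)
            for info in zf.infolist()
        ]


def read_text_entry(path, member, encoding="utf-8"):
    """Decompress one entry and return it as text."""
    with zipfile.ZipFile(path) as zf:
        return zf.read(member).decode(encoding)
```

With an archive saved under some other extension, say `Temp.dat`, `read_text_entry("Temp.dat", "ttt/README.md")` would behave the same as with a `.zip` name.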
Extracting all the files of a selected extension from a zipped file
You should be able to do something like this:
import zipfile

def main():
    archive = 'archive.zip'
    directory = './'
    extensions = ('.txt', '.pdf')
    # The with block closes the archive even if extraction fails.
    with zipfile.ZipFile(archive, 'r') as zip_file:
        for file in zip_file.namelist():
            if file.endswith(extensions):
                zip_file.extract(file, directory)

if __name__ == '__main__':
    main()
bash - Unzip specific type of files from multiple zip
You need to protect your wildcard pattern with single quotes, otherwise the shell will expand it (this is called globbing):
for file in *.zip; do
    unzip "${file}" '*.txt'
done
Without the quotes, *.txt doesn't expand to anything during the first iteration of the loop, as there are no txt files in the working directory, so the command works as expected. After the first iteration, it expands to all the txt files you just extracted from the first zip file, so the second and subsequent iterations actually look something like this after globbing:
unzip ${file} file1.txt
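The effect can be reproduced outside the shell. The Python sketch below mimics bash's default handling of an unquoted pattern (replaced by the matching names if there are any, passed through unchanged otherwise); the function name `shell_expand` is made up for this illustration and it ignores shell options such as nullglob.

```python
import glob
import os


def shell_expand(pattern, cwd):
    """Mimic bash's default treatment of an unquoted glob pattern.

    If the pattern matches files in cwd, it is replaced by their names;
    if nothing matches, the pattern itself is passed on literally.
    """
    matches = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(glob.escape(cwd), pattern))
    )
    return matches if matches else [pattern]
```

In an empty directory `shell_expand("*.txt", cwd)` gives back the literal pattern, which unzip then interprets itself. Once extracted .txt files exist there, the same call returns their names, and those are what unzip would receive.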
Extract specific file extensions from multiple 7-zip files
This solution is based on bash, grep and awk; it works on Cygwin and on Ubuntu.
Since you have the requirement to search for (X) [!].ext files first and, if there are no such files, then look for (X).ext files, I don't think it is possible to write a single expression to handle this logic.
The solution should have some if/else conditional logic to test the list of files inside the archive and decide which files to extract.
Here is the initial structure inside the zip/rar archive I tested my script on (I made a script to prepare this structure):
folder
├── 7z_1.7z
│   ├── (E).txt
│   ├── (J) [!].txt
│   ├── (J).txt
│   ├── (U) [!].txt
│   └── (U).txt
├── 7z_2.7z
│   ├── (J) [b1].txt
│   ├── (J) [b2].txt
│   ├── (J) [o1].txt
│   └── (J).txt
├── 7z_3.7z
│   ├── (E) [!].txt
│   ├── (J).txt
│   └── (U).txt
└── 7z 4.7z
    └── test.txt
The output is this:
output
├── 7z_1.7z              # This is a folder, not an archive
│   ├── (J) [!].txt      # Here we extracted only files with [!]
│   └── (U) [!].txt
├── 7z_2.7z
│   └── (J).txt          # Here there are no [!] files, so we extracted (J)
├── 7z_3.7z
│   └── (E) [!].txt      # We had here both [!] and (J), extracted only file with [!]
└── 7z 4.7z
    └── test.txt         # We had only one file here, extracted it
And this is the script to do the extraction:
#!/bin/bash
# Remove the output (if it's left from previous runs).
rm -rf output
mkdir -p output
# Unzip the zip archive.
unzip data.zip -d output
# For rar use
# unrar x data.rar output
# OR
# 7z x -ooutput data.rar
for archive in output/folder/*.7z
do
    # See https://stackoverflow.com/questions/7148604
    # Get the list of file names, remove the extra output of "7z l"
    list=$(7z l "$archive" | awk '
        /----/ {p = ++p % 2; next}
        $NF == "Name" {pos = index($0,"Name")}
        p {print substr($0,pos)}
    ')
    # Get the list of files with [!] (-F matches the string literally).
    extract_list=$(echo "$list" | grep -F '[!]')
    if [[ -z $extract_list ]]; then
        # If we don't have files with [!], then look for ([A-Z]) pattern
        # to get files with single letter in brackets.
        extract_list=$(echo "$list" | grep "([A-Z])\.")
    fi
    if [[ -z $extract_list ]]; then
        # If we only have one file - extract it.
        # $list is a plain string, so count its non-empty lines.
        if [[ $(grep -c . <<< "$list") -eq 1 ]]; then
            extract_list=$list
        fi
    fi
    if [[ -n $extract_list ]]; then
        # If we have files to extract, then do the extraction.
        # Output path is output/7zip_archive_name/
        out_path=output/$(basename "$archive")
        mkdir -p "$out_path"
        echo "$extract_list" | xargs -I {} 7z x -o"$out_path" "$archive" {}
    fi
done
The basic idea here is to go over the 7zip archives and get the list of files for each of them using the 7z l command (list of files).
The output of the command is quite verbose, so we use awk to clean it up and get the list of file names.
After that we filter this list using grep to get either a list of [!] files or a list of (X) files.
Then we just pass this list to 7zip to extract the files we need.
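The selection step on its own, separated from the 7z calls, can be written as a small function. The Python sketch below restates the fallback order described above ([!] files first, then (X).ext files, then a lone file); the name `choose_entries` is an assumption for this example, and it takes a list of file names that has already been obtained from the archive.

```python
import re


def choose_entries(names):
    """Pick which archive entries to extract, in order of preference."""
    # 1. Prefer files marked with [!].
    flagged = [n for n in names if "[!]" in n]
    if flagged:
        return flagged
    # 2. Otherwise take files with a single letter in parentheses
    #    directly before the extension, such as "(J).txt".
    plain = [n for n in names if re.search(r"\([A-Z]\)\.", n)]
    if plain:
        return plain
    # 3. Otherwise, if the archive holds exactly one file, take it.
    if len(names) == 1:
        return list(names)
    return []
```

Applied to the contents of 7z_2.7z from the test structure, `choose_entries(["(J) [b1].txt", "(J) [b2].txt", "(J) [o1].txt", "(J).txt"])` selects only `(J).txt`.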
Extracting files with specific extensions from a lot of ZIP archives using Python
Simply pass the ZipFile object to the extractor as a parameter. You shouldn't try to parse the file path out of the string representation of the list; that is most likely what causes the problem. Try something like:
import zipfile
import os
import fnmatch

def archive1():
    rootPath = r'E:\Test\2017'
    pattern = '*.zip'
    # Walk the tree and open every zip file found.
    for root, dirs, files in os.walk(rootPath):
        for filename in fnmatch.filter(files, pattern):
            with zipfile.ZipFile(os.path.join(root, filename)) as zf:
                extractor(zf)

def extractor(zip_file):
    new_dr = r'E:\Test'
    extensions = ('.txt', '.pdf')
    # Extract only the members with a wanted extension.
    for file in zip_file.namelist():
        if file.endswith(extensions):
            zip_file.extract(file, new_dr)

if __name__ == '__main__':
    archive1()
How to traverse subfolders in a zip file and unzip files with specific extension?
You can do that with a recursive procedure that calls itself for folder items and extracts file items if they have a specific extension:
Set fso = CreateObject("Scripting.FileSystemObject")
Set app = CreateObject("Shell.Application")

Sub ExtractByExtension(fldr, ext, dst)
    For Each f In fldr.Items
        If f.Type = "File folder" Then
            ExtractByExtension f.GetFolder, ext, dst
        ElseIf LCase(fso.GetExtensionName(f.Name)) = LCase(ext) Then
            app.NameSpace(dst).CopyHere f.Path
        End If
    Next
End Sub

ExtractByExtension app.NameSpace("C:\path\to\your.zip"), "txt", "C:\output"
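For comparison, a Python sketch of the same task needs no explicit recursion, because `zipfile` lists members at every depth with their full path inside the archive. This is an alternative technique, not a translation of the VBScript above: it keeps the subfolder layout under the destination. The function name `extract_by_extension` is hypothetical.

```python
import os
import zipfile


def extract_by_extension(archive, ext, dest):
    """Extract every member with extension ext, at any depth.

    The comparison is case-insensitive and the subfolder layout of the
    archive is recreated under dest. Returns the extracted member names.
    """
    wanted = "." + ext.lower().lstrip(".")
    with zipfile.ZipFile(archive) as zf:
        members = [
            info.filename
            for info in zf.infolist()
            if not info.is_dir()
            and os.path.splitext(info.filename)[1].lower() == wanted
        ]
        zf.extractall(dest, members=members)
    return members
```

A call such as `extract_by_extension(r"C:\path\to\your.zip", "txt", r"C:\output")` corresponds to the VBScript invocation, apart from the preserved subfolders.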