How to read data from a zip file without having to unzip the entire file
DotNetZip is your friend here.
As easy as:
using (ZipFile zip = ZipFile.Read(ExistingZipFile))
{
    ZipEntry e = zip["MyReport.doc"];
    e.Extract(OutputStream);
}
(you can also extract to a file or other destinations).
Reading the zip file's table of contents is as easy as:
using (ZipFile zip = ZipFile.Read(ExistingZipFile))
{
    // Print the column headings only once, before the first entry.
    bool header = true;
    foreach (ZipEntry e in zip)
    {
        if (header)
        {
            System.Console.WriteLine("Zipfile: {0}", zip.Name);
            if ((zip.Comment != null) && (zip.Comment != ""))
                System.Console.WriteLine("Comment: {0}", zip.Comment);
            System.Console.WriteLine("\n{1,-22} {2,8} {3,5} {4,8} {5,3} {0}",
                "Filename", "Modified", "Size", "Ratio", "Packed", "pw?");
            System.Console.WriteLine(new System.String('-', 72));
            header = false;
        }
        System.Console.WriteLine("{1,-22} {2,8} {3,5:F0}% {4,8} {5,3} {0}",
            e.FileName,
            e.LastModified.ToString("yyyy-MM-dd HH:mm:ss"),
            e.UncompressedSize,
            e.CompressionRatio,
            e.CompressedSize,
            (e.UsesEncryption) ? "Y" : "N");
    }
}
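For readers not on .NET, the same two operations (pulling out a single entry and listing the table of contents) can be sketched with Python's standard zipfile module. This is only an illustration, not part of DotNetZip: the archive is built in memory so the snippet is self-contained, and the entry name notes/readme.txt is made up for the demo.

```python
import io
import zipfile

# Build a small archive in memory so the sketch is self-contained;
# in practice you would pass the path of an existing zip file instead.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("MyReport.doc", "report body " * 50)
    zf.writestr("notes/readme.txt", "hello")  # hypothetical second entry

with zipfile.ZipFile(buffer) as zf:
    # Only this one member is decompressed; the other entries are not touched.
    report = zf.read("MyReport.doc")

    # The listing comes from the archive's central directory,
    # so no file data is decompressed to produce it.
    listing = [(info.filename, info.file_size, info.compress_size)
               for info in zf.infolist()]

print(len(report))
for name, size, packed in listing:
    print(f"{name:<22} {size:>8} {packed:>8}")
```

zf.open("MyReport.doc") returns a file-like object instead of bytes, which is the better choice when the entry is too large to hold in memory.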
Edited To Note: DotNetZip used to live at CodePlex. CodePlex has since been shut down, although the old archive is still available there. It looks like the code has migrated to GitHub:
- https://github.com/DinoChiesa/DotNetZip. Looks to be the original author's repo.
- https://github.com/haf/DotNetZip.Semverd. This looks to be the currently maintained version. It's also packaged up and available via NuGet at https://www.nuget.org/packages/DotNetZip/
Reading contents of zip file without extracting
You are actually reading exactly what is in the file.
The \r\n sequence is the newline in Windows. The question
Difference between \n and \r? goes into a bit more detail, but what it comes down to is that Windows uses \r\n as its newline.
The b' prefix you are seeing is related to Python and how it reads the file: the data comes back as bytes, not as text. The question What does the 'b' character do in front of a string literal? does a good job of explaining why exactly that is happening, but the documentation quoted is:
Bytes literals are always prefixed with 'b' or 'B'; they produce an
instance of the bytes type instead of the str type. They may only
contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.
EDIT: I actually found a very similar answer you can pull from for reading without the extra characters: py3k: How do you read a file inside a zip file as text, not bytes?. The basic idea was you could use this:
items_file = io.TextIOWrapper(items_file, encoding='your-encoding', newline='')
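A minimal, self-contained sketch of that idea follows. The member name items.txt and the UTF-8 encoding are assumptions for the demo. Note one difference from the line above: newline='' tells TextIOWrapper not to translate line endings, so any \r\n stays in the text; the sketch uses the default setting instead, which translates \r\n to \n on read.

```python
import io
import zipfile

# Demo archive built in memory; "items.txt" and UTF-8 are assumptions.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("items.txt", "first\r\nsecond\r\n")

with zipfile.ZipFile(buffer) as zf:
    # Reading the member directly gives bytes, Windows line endings included.
    raw = zf.read("items.txt")

    # Wrapping the binary member stream decodes it to str.
    # With the default newline=None, "\r\n" is translated to "\n" on read.
    with io.TextIOWrapper(zf.open("items.txt"), encoding="utf-8") as text_file:
        lines = [line.rstrip("\n") for line in text_file]

print(raw)
print(lines)
```

The first print shows the bytes literal with its b prefix and the \r\n sequences; the second shows ordinary strings with the line endings removed.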
Compare ZIP file with dir with shell command
You can install a command-line tool called unzip and run
$ unzip -l yourzipfile.zip
The files contained in yourzipfile.zip will be listed.
========
To verify the files automatically, you can follow these steps.
If the files compressed into yourzipfile.zip are in dir1, you can first unzip yourzipfile.zip into dir2, then compare the files in dir1 and dir2 by running
$ diff --brief -r dir1/ dir2/
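If a second extracted copy is not wanted, the comparison can also be done against the metadata stored in the archive. The sketch below is an alternative to the diff approach, not a replacement for it: it uses Python's zipfile module and the hypothetical helper compare_zip_to_dir, and it checks each archive member's stored size and CRC-32 against the file on disk. It only checks in one direction (files in dir1 that are absent from the archive are not reported), and a CRC-32 match is a strong hint of equality, not a proof.

```python
import os
import tempfile
import zipfile
import zlib


def compare_zip_to_dir(zip_path, dir_path):
    """Return (missing, different): members absent from dir_path, or with other content."""
    missing, different = [], []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            target = os.path.join(dir_path, *info.filename.split("/"))
            if not os.path.isfile(target):
                missing.append(info.filename)
                continue
            with open(target, "rb") as fh:
                crc = zlib.crc32(fh.read())
            # Compare with the size and CRC-32 recorded in the archive;
            # the member itself is never decompressed.
            if crc != info.CRC or os.path.getsize(target) != info.file_size:
                different.append(info.filename)
    return missing, different


# Demo with throwaway files; all names here are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    dir1 = os.path.join(tmp, "dir1")
    os.makedirs(os.path.join(dir1, "sub"))
    for rel, text in (("a.txt", "alpha"), ("sub/b.txt", "beta")):
        with open(os.path.join(dir1, *rel.split("/")), "w") as fh:
            fh.write(text)

    zip_path = os.path.join(tmp, "yourzipfile.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(os.path.join(dir1, "a.txt"), "a.txt")
        zf.write(os.path.join(dir1, "sub", "b.txt"), "sub/b.txt")
        zf.writestr("c.txt", "only in the archive")

    # Change one file after zipping so the comparison has something to find.
    with open(os.path.join(dir1, "a.txt"), "w") as fh:
        fh.write("changed")

    result = compare_zip_to_dir(zip_path, dir1)

print(result)
```

In the demo, c.txt is reported as missing from the directory and a.txt as different, while sub/b.txt matches and is not reported.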
Why does Zipping the same content twice gives two files with different SHA1?
According to Wikipedia (http://en.wikipedia.org/wiki/Zip_(file_format)), it seems that zip files have header fields for
"File last modification time" and "File last modification date", so a zip file checked into git will appear to git to have changed whenever the zip is rebuilt later from the same content. It also seems that there is no flag to tell zip not to set those headers.
I am resorting to just using tar; it seems to produce the same bytes for the same input when run multiple times.
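The effect of those timestamp fields can be demonstrated with a short Python sketch using the standard zipfile module. This is an illustration under stated assumptions, not a description of how the zip command-line tool behaves: build_zip and sha1 are hypothetical helpers, the file names are made up, and the timestamp is passed in explicitly so that it is the only thing that varies between runs.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format can store


def build_zip(files, date_time):
    """Zip a {name: bytes} mapping in memory, stamping every entry with date_time."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(files):  # fixed order, so entry order cannot vary
            info = zipfile.ZipInfo(name, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, files[name])
    return buffer.getvalue()


def sha1(data):
    return hashlib.sha1(data).hexdigest()


files = {"a.txt": b"alpha", "b.txt": b"beta"}

same_1 = sha1(build_zip(files, FIXED_TIME))
same_2 = sha1(build_zip(files, FIXED_TIME))
other = sha1(build_zip(files, (2020, 5, 17, 12, 0, 0)))

print(same_1 == same_2)  # identical content and identical timestamps
print(same_1 == other)   # identical content, different timestamps
```

With the timestamp pinned, two builds of the same content hash identically; changing only the timestamp changes the hash, which is consistent with the explanation above.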