How to Compare Two Tarballs' Contents

tarsum is almost what you need. Take its output for each tarball, run it through sort so the ordering is identical for both, and then compare the two results with diff. That should get a basic implementation going, and it would be easy enough to pull those steps into the main program by modifying the Python code to do the whole job.
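The hash, sort, and compare idea can be sketched without tarsum, using only Python's standard library. This is not tarsum's code and does not reproduce its output format; the function names tar_manifest and compare_tarballs are made up for illustration, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import tarfile


def tar_manifest(path):
    """Return {member name: SHA-256 hex digest} for regular files in a tarball."""
    manifest = {}
    # Mode "r:*" lets tarfile detect gzip/bzip2/xz compression automatically.
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and devices
            digest = hashlib.sha256()
            fileobj = tar.extractfile(member)
            for chunk in iter(lambda: fileobj.read(65536), b""):
                digest.update(chunk)
            manifest[member.name] = digest.hexdigest()
    return manifest


def compare_tarballs(path_a, path_b):
    """Return report lines, ordered by member name, describing the differences."""
    a, b = tar_manifest(path_a), tar_manifest(path_b)
    report = []
    for name in sorted(a.keys() | b.keys()):
        if name not in b:
            report.append(f"only in {path_a}: {name}")
        elif name not in a:
            report.append(f"only in {path_b}: {name}")
        elif a[name] != b[name]:
            report.append(f"content differs: {name}")
    return report
```

An empty list means both tarballs hold the same regular files with the same contents, regardless of member order, timestamps, or compression. Metadata such as permissions and ownership is deliberately ignored here.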

How to compare two tar files in terms of packaging, signing, and obfuscation mechanisms?

When a jar is signed, two additional files are included in the jar's META-INF directory. See the documentation for jarsigner. You can learn whether a jar is signed, and what file was used to sign it, by checking the content of the META-INF directory.

When a jar is sealed, there will be additional content in the META-INF/MANIFEST.MF file. See the tutorial on package sealing. You can check how a jar is sealed by checking the content of that file.

Regarding obfuscation, I have no idea.

You could write a script which does something like this:

extract the tar
find all jars
for each jar
    # signing info
    list META-INF/*.SF and META-INF/*.DSA (or *.RSA / *.EC, depending on the key type)
    # sealing info
    search META-INF/MANIFEST.MF for line pairs matching "Name: xxx<newline>Sealed: true"

Write your output to a file.
Compare the output of your script for the two different tar files.
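A minimal Python sketch of such a script might look like the following. It assumes the jars sit directly inside the tar (not nested in other archives), reads them in memory instead of extracting to disk, and ignores manifest continuation lines. The names jar_report and tar_jar_report are hypothetical.

```python
import io
import tarfile
import zipfile


def jar_report(jar_bytes):
    """Return (signature files, sealed names) for one jar given as bytes."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        names = jar.namelist()
        # Signing info: signature (.SF) and signature block (.DSA/.RSA/.EC) files.
        signatures = sorted(
            n for n in names
            if n.upper().startswith("META-INF/")
            and n.upper().endswith((".SF", ".DSA", ".RSA", ".EC"))
        )
        sealed = []
        if "META-INF/MANIFEST.MF" in names:
            text = jar.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
            # Manifest sections are separated by blank lines.
            for section in text.replace("\r\n", "\n").split("\n\n"):
                headers = dict(
                    line.split(": ", 1)
                    for line in section.split("\n")
                    if ": " in line
                )
                if headers.get("Sealed", "").strip().lower() == "true":
                    # A "Sealed" header without a "Name" sits in the main section.
                    sealed.append(headers.get("Name", "(main section)"))
    return signatures, sorted(sealed)


def tar_jar_report(tar_path):
    """Return sorted report lines, one per .jar found inside the tarball."""
    lines = []
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".jar"):
                signatures, sealed = jar_report(tar.extractfile(member).read())
                lines.append(
                    f"{member.name} signed-by={signatures} sealed={sealed}"
                )
    return sorted(lines)
```

Running tar_jar_report on each tar file and diffing the two lists shows which jars gained or lost signature files or sealed packages.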

(But really, you probably need to spend more time to understand your build files.)

How does git know if a tarball has changed?

Git knows if a tar file has changed the same way it detects whether any other file has changed: it compares the contents of the file. This may be as naïve as comparing the files byte by byte, or it may compute a hash of each file first and then compare the hashes. Since Git internally stores all known files under their hash, the hashes can be compared instead of doing the expensive byte-by-byte comparison.
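To make the hashing concrete, here is a small sketch of how Git derives the object ID of a file's contents (a "blob") in a repository that uses the default SHA-1 object format: it hashes a short header followed by the raw bytes. The function name git_blob_hash is made up for this example; repositories created with the SHA-256 object format use a different hash function.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git prefixes the content with "blob <size in bytes>" and a NUL byte.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

Two tarballs with the same bytes get the same ID, and a single changed byte gives a different one, which is all Git needs to notice the change. The result should match what git hash-object prints for the same file.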

To make use of this functionality, you could simply use Git itself to compare any two files on your file system:

git diff --no-index file1.tgz file2.tgz

Or, if you don't have Git available, you could use the plain diff command instead.

Another option would be to manually compute checksums of the two files and compare the checksums instead. If the checksums differ, the files are guaranteed to differ. If the checksums are identical, it is very likely that the file contents are also identical, but hash collisions are still possible, so to be certain you would then have to compare the files byte by byte.

A simple way to compute and compare checksums of two files would be the following:

test "$(sha1sum <file1)" = "$(sha1sum <file2)"

Note the input redirection: because sha1sum reads from standard input, the file name does not appear in its output, so the comparison works even if the files have different names.

You can of course use any other hashing tool, such as sha256sum.
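If you would rather do the same check from Python, a rough equivalent is sketched below: compare digests first, and only fall back to a full byte-by-byte comparison when they match. The helper names file_digest and same_content are invented for this example, and SHA-256 is just one reasonable default.

```python
import filecmp
import hashlib


def file_digest(path, algorithm="sha256"):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True only if both files contain exactly the same bytes."""
    if file_digest(path_a) != file_digest(path_b):
        return False  # different digests: the contents definitely differ
    # Equal digests: confirm byte by byte to rule out a collision.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For example, same_content("file1.tgz", "file2.tgz") returns True only when the two archives are byte-for-byte identical. It says nothing about whether two different archives unpack to the same files.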

Why do the md5 hashes of two tarballs of the same file differ?

tar czf outfile infiles is equivalent to

tar cf - infiles | gzip > outfile

The files differ because gzip puts its input filename and modification time into the compressed file. When the input is a pipe, it uses an empty string as the filename and the current time as the modification time.

But it also has a --no-name option, which tells it not to put the name and timestamp into the file. So if you write the expanded command explicitly, instead of using the -z option to tar, you can make use of this option.

tar cf - testfile | gzip --no-name > a.tar.gz
tar cf - testfile | gzip --no-name > b.tar.gz

I tested this on OS X 10.6.8 and it works.
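The same effect can be had from Python, as a sketch: gzip.GzipFile accepts an mtime argument, and passing an empty filename keeps the name out of the header. The function name write_reproducible_targz is hypothetical. The tar headers still record each input file's own modification time and ownership, so the output only repeats exactly when those stay the same.

```python
import gzip
import tarfile


def write_reproducible_targz(paths, out_path):
    """Write a .tar.gz whose gzip header carries no file name or timestamp."""
    with open(out_path, "wb") as raw:
        # filename="" and mtime=0 play the role of gzip --no-name.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
            # Plain "w" mode: tarfile writes an uncompressed tar stream into gz.
            with tarfile.open(fileobj=gz, mode="w") as tar:
                for path in paths:
                    tar.add(path)
```

Calling it twice on the same unchanged input, for example write_reproducible_targz(["testfile"], "a.tar.gz") and then again with "b.tar.gz", should give two files with identical checksums.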

Compare npm pack tarball with what's on NPM

You can compare two Zip-balls with the open-source tool comp_zip available here.

How to overwrite a file in a tarball

There is no way to do this: the first file has already been written by the time you ask to write the second one, and the stream position has advanced past it. Remember that tar files are accessed sequentially.

You should deduplicate the entries before you start writing.
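One way to deduplicate first, sketched in Python under the assumption that every entry fits in memory as a (name, bytes) pair: collect the entries in a dict keyed by member name so that a later entry replaces an earlier one, then write each name once. The function name write_deduplicated_tar is made up for this example.

```python
import io
import tarfile


def write_deduplicated_tar(entries, out_path):
    """Write (name, bytes) pairs to a tar file, keeping the last entry per name."""
    latest = {}
    for name, data in entries:
        latest[name] = data  # a later duplicate overwrites the earlier content
    with tarfile.open(out_path, "w") as tar:
        for name, data in latest.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

Because nothing is written until all entries are known, the sequential nature of the tar stream is no longer a problem: each name appears in the archive exactly once, with its final content.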


