PDF compare on linux command line
Done in two lines with (the almighty) ImageMagick and pdftk:
compare -verbose -debug coder "$PDF_1" "$PDF_2" -compose src "$OUT_FILE.tmp.pdf"
pdftk "$OUT_FILE.tmp.pdf" background "$PDF_1" output "$OUT_FILE"
The options -verbose and -debug coder are optional; they only add diagnostic output.
- compare creates a PDF with the diff as red pixels.
- pdftk merges the diff PDF with PDF_1 as its background.
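A slightly more defensive version of these two lines, wrapped in a bash function, could look like the sketch below. It assumes ImageMagick's compare (with Ghostscript as its PDF delegate) and pdftk are on the PATH; the function name pdf_visual_diff and the mktemp temporary file are illustrative choices, not part of the original answer. File names are quoted so paths with spaces work, and compare's exit status is handled explicitly: ImageMagick documents 0 for similar images, 1 for dissimilar ones and 2 for errors, so a status of 1 is the expected outcome here rather than a failure.

```shell
#!/usr/bin/env bash
# Sketch only: assumes ImageMagick `compare` (with Ghostscript as PDF
# delegate) and `pdftk` are installed. pdf_visual_diff is a hypothetical name.

pdf_visual_diff() {
    if [ "$#" -ne 3 ]; then
        echo "usage: pdf_visual_diff PDF_1 PDF_2 OUT_FILE" >&2
        return 2
    fi
    local pdf1=$1 pdf2=$2 out=$3 tmp status

    # Temporary file for the red "diff only" PDF.
    tmp=$(mktemp) || return 2

    # The PDF: prefix forces PDF output although the temp name has no
    # .pdf suffix. compare exits 0 (similar), 1 (different) or 2 (error),
    # so only statuses >= 2 are treated as failures.
    compare "$pdf1" "$pdf2" -compose src "PDF:$tmp"
    status=$?
    if [ "$status" -ge 2 ]; then
        rm -f "$tmp"
        return "$status"
    fi

    # pdftk's background operation puts the first page of $pdf1 behind
    # every page of the diff PDF.
    pdftk "$tmp" background "$pdf1" output "$out"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example: pdf_visual_diff old.pdf new.pdf diff.pdf
```

Like the original two lines, this targets 1-page PDFs, since pdftk's background operation only uses the first page of PDF_1.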
How to compare two pdf files through command line
How about i-net PDFC? It does a full content comparison: text, images, lines, header/footer detection and so on. You can use it either on the command line or with a GUI (2.0, currently in public beta).
The command-line tool can already compare folders of PDFs against each other (or, the extreme way, use the API ;))
Disclaimer: Yep, I work for the company that made this, so feedback is highly appreciated.
Saving the output from DiffPDF / ComparePDF command line - Comparing folders of PDFs
You could have a look at these answers to similar questions:
- PDF compare on linux command line
- How to compare two pdf files through command line
- How to unit test a Python function that draws PDF graphics?
However, I have no idea whether any of these would perform faster than your automated Acrobat Pro comparison... Let me know if you find out, will you?
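For the "comparing folders" part of the question, a plain shell loop around the ImageMagick tooling from the first answer is another candidate to benchmark against Acrobat. The sketch below is an assumption-laden starting point, not one of the linked answers: compare_pdf_dirs is a hypothetical name, files are paired by identical names, only the first page of each PDF is compared (the [0] suffix selects page 0), and compare's -metric AE option reports the number of differing pixels on standard error, while null: discards the difference image.

```shell
#!/usr/bin/env bash
# Sketch: compare same-named PDFs in two folders (first page only).
# compare_pdf_dirs is a hypothetical helper; ImageMagick + Ghostscript are
# only needed for PDFs that exist in both folders.

compare_pdf_dirs() {
    local dir_a=$1 dir_b=$2 a name b ae
    for a in "$dir_a"/*.pdf; do
        [ -e "$a" ] || continue          # glob matched nothing
        name=$(basename "$a")
        b=$dir_b/$name
        if [ ! -e "$b" ]; then
            echo "$name: missing in $dir_b"
            continue
        fi
        # -metric AE prints the count of differing pixels on stderr;
        # null: throws the difference image away.
        ae=$(compare -metric AE "$a[0]" "$b[0]" null: 2>&1)
        echo "$name: $ae"
    done
}

# Example: compare_pdf_dirs old_pdfs new_pdfs
```

Running compare_pdf_dirs old_pdfs new_pdfs prints one line per PDF in old_pdfs: either the pixel count reported by compare or a note that the file is missing from new_pdfs.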
Shortcut:
For simplicity, let's assume the input files to be compared are similar enough and each is only 1 page long. (For multi-page input, expand the basic idea of this answer...)
The two essential commands that any such comparison boils down to are these:
compare.exe ^
%input1% ^
%input2% ^
-compose src ^
%output%.tmp.pdf
and
pdftk.exe ^
%output%.tmp.pdf ^
background %input1% ^
output %output%.pdf
- The first command generates a PDF with all differing pixels colored red. (A default resolution of 72 dpi is used here. For a more fine-grained view of pixel differences, add -density 200 (meaning 200 dpi) or higher in front of the input file names, but your processing time will increase accordingly, as will the disk space needed for the output...)
- The second command merges the resulting PDF with a background taken from %input1%.
Optionally, you may add -verbose -debug coder after the compare command for a better idea of what's going on.
compare.exe is a command-line tool from the great, great ImageMagick family of utilities (available for Linux, Windows, Unix and Mac OS X). However, it requires a Ghostscript installation to use as a 'delegate' in order to process PDF input. pdftk.exe is also a command-line utility, available for the same platforms. Both are Free Software.
After the first command, you'll have an output file with red pixels only where differences were found on the page.
After the second command, you'll have an output with all red 'diff' pixels in the context of the first input PDF.
Example output:
Here are screenshots of two 1-page PDF files with differences in their content:
Here are screenshots of the output produced by the two commands above:
- The left one shows the intermediate result (after the first command), with only the difference pixels displayed in red (identical pixels being white).
- The screenshot on the right shows the red difference pixels, but this time with input PDF file number 1 as a (gray) background (after the second command).
(PDF input files courtesy of Mark Summerfield, author of the beautiful DiffPDF tool.)
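One way to expand this answer to multi-page input, sketched here in bash rather than Windows batch, is to run compare page by page (ImageMagick's [N] suffix selects page N, counting from 0), join the per-page diffs with pdftk cat, and then use pdftk's multibackground operation, which puts page N of the background PDF behind page N of the input (plain background only uses the first background page). Assumptions: both PDFs have the same number of pages, the count is read from the NumberOfPages line of pdftk dump_data, and the function names page_count_from_dump and pdf_visual_diff_pages are hypothetical. The optional fourth argument is the -density value, defaulting to 72 like the commands above.

```shell
#!/usr/bin/env bash
# Sketch for multi-page PDFs; assumes ImageMagick (+ Ghostscript) and pdftk,
# and that both inputs have the same page count. Names are hypothetical.

# Extract the page count from `pdftk FILE dump_data` output on stdin.
page_count_from_dump() {
    awk -F': ' '/^NumberOfPages:/ { print $2; exit }'
}

pdf_visual_diff_pages() {
    local in1=$1 in2=$2 out=$3 density=${4:-72} pages i tmpdir rc
    pages=$(pdftk "$in1" dump_data | page_count_from_dump)
    [ -n "$pages" ] || return 2
    tmpdir=$(mktemp -d) || return 2

    # One red-pixel diff PDF per page; zero-padded names keep the glob
    # below in page order. -density must come before the inputs.
    i=0
    while [ "$i" -lt "$pages" ]; do
        compare -density "$density" "$in1[$i]" "$in2[$i]" -compose src \
            "$tmpdir/$(printf '%04d' "$i").pdf"
        rc=$?
        if [ "$rc" -ge 2 ]; then rm -rf "$tmpdir"; return "$rc"; fi
        i=$((i + 1))
    done

    # Join the per-page diffs, then put page N of in1 behind page N.
    pdftk "$tmpdir"/[0-9]*.pdf cat output "$tmpdir/all.pdf" &&
        pdftk "$tmpdir/all.pdf" multibackground "$in1" output "$out"
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}

# Example: pdf_visual_diff_pages input1.pdf input2.pdf diff.pdf 200
```

A higher density such as 200 gives the finer view described above, with the corresponding cost in processing time and disk space.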
Tool to compare large numbers of PDF files?
Because no such tool was available, we have written one. You can download the i-net PDF content comparer and use it. I hope it helps others with the same problem. If you have problems with it or feedback for us, you can contact our support.
Merge / convert multiple PDF files into one PDF
Sorry, I managed to find the answer myself using Google and a bit of luck :)
For those interested:
I installed pdftk (PDF toolkit) on our Debian server, and the following command gave the desired output:
pdftk file1.pdf file2.pdf cat output output.pdf
OR
gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...
This in turn can be piped directly into pdf2ps.
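Both commands can be wrapped in a small helper that uses pdftk when it is installed and falls back to Ghostscript otherwise. This is a sketch: merge_pdfs is a hypothetical name, the inputs are merged in the order given, and -sPAPERSIZE=letter is left out on the assumption that the input PDFs already define their own page sizes.

```shell
#!/usr/bin/env bash
# Sketch: merge PDFs in the given order. merge_pdfs is a hypothetical name.
# Usage: merge_pdfs OUTPUT.pdf INPUT1.pdf [INPUT2.pdf ...]

merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUTPUT.pdf INPUT.pdf..." >&2
        return 2
    fi
    local out=$1
    shift

    if command -v pdftk >/dev/null 2>&1; then
        # Same as: pdftk file1.pdf file2.pdf cat output output.pdf
        pdftk "$@" cat output "$out"
    elif command -v gs >/dev/null 2>&1; then
        # Ghostscript's pdfwrite device writes all inputs into one PDF.
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
    else
        echo "merge_pdfs: neither pdftk nor gs found" >&2
        return 127
    fi
}

# Example: merge_pdfs merged.pdf chapter1.pdf chapter2.pdf chapter3.pdf
```

If you pass a glob such as ./*.pdf, write the output to another directory; otherwise a second run would include the previous merged file among its inputs.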
How to compare binary files to check if they are the same?
The standard Unix diff will show whether or not the files are the same:
[me@host ~]$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ
If there is no output from the command, it means that the files have no differences.
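In scripts, the exit status is usually more convenient than the message. diff exits with 0 when the files are identical, 1 when they differ and 2 on trouble; cmp -s follows the same convention (0, 1, greater than 1) but prints nothing and stops at the first differing byte. A minimal sketch, where files_identical is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# files_identical is a hypothetical helper name.
# cmp -s prints nothing; exit 0 = identical, 1 = different,
# >1 = trouble (for example, a missing file).
files_identical() {
    cmp -s "$1" "$2"
}

# Example:
#   if files_identical 1.bin 2.bin; then echo same; else echo different; fi
```

A missing or unreadable file also makes files_identical fail, so check for a status greater than 1 if you need to tell "different" apart from "error".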