PDF Compare on Linux Command Line

PDF compare on linux command line

Done in 2 lines with (the almighty) ImageMagick and pdftk:

compare -verbose -debug coder "$PDF_1" "$PDF_2" -compose src "$OUT_FILE.tmp"
pdftk "$OUT_FILE.tmp" background "$PDF_1" output "$OUT_FILE"

The -verbose and -debug coder options are optional; they only make compare report more about what it is doing.

  • compare creates a PDF with the diff as red pixels.
  • pdftk merges the diff-pdf with background PDF_1
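
The two lines can be wrapped in a small shell function so that the temporary file is cleaned up afterwards. This is only a sketch under a few assumptions: the function name pdf_visual_diff and its argument order are made up for illustration, both compare (ImageMagick, with Ghostscript available for PDF input) and pdftk are expected on the PATH, and the inputs are treated as single-page PDFs.

```shell
#!/bin/sh
# Hypothetical helper (name and argument order are assumptions):
#   pdf_visual_diff FIRST.pdf SECOND.pdf OUT.pdf
pdf_visual_diff() {
    if [ "$#" -ne 3 ]; then
        echo "usage: pdf_visual_diff FIRST.pdf SECOND.pdf OUT.pdf" >&2
        return 2
    fi
    for f in "$1" "$2"; do
        if [ ! -f "$f" ]; then
            echo "pdf_visual_diff: no such file: $f" >&2
            return 2
        fi
    done
    for tool in compare pdftk; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "pdf_visual_diff: $tool not found in PATH" >&2
            return 2
        fi
    done

    tmp="$3.tmp.pdf"

    # Step 1: write a PDF in which differing pixels are red.
    # The exit status is not checked here; the presence of the
    # temporary file decides whether step 2 can run.
    compare "$1" "$2" -compose src "$tmp"
    if [ ! -s "$tmp" ]; then
        echo "pdf_visual_diff: compare produced no output" >&2
        rm -f "$tmp"
        return 1
    fi

    # Step 2: put the first input behind the red diff pixels.
    pdftk "$tmp" background "$1" output "$3"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

It would be called as pdf_visual_diff old.pdf new.pdf diff.pdf, where the three file names are placeholders.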

How to compare two pdf files through command line

How about i-net PDFC? It does a full content comparison: text, images, lines, header/footer detection and so on. You can use it either on the command line or with a GUI (2.0, currently in public beta).

The command-line tool already has the option to compare folders with PDFs against each other (or the extreme way: use the API ;))

Disclaimer: Yep, I work for the company that made this, so feedback is highly appreciated.

Saving the output from DiffPDF / ComparePDF command line. - Comparing folders of PDF's

You could have a look at these answers to similar questions:

  • PDF compare on linux command line
  • How to compare two pdf files through command line
  • How to unit test a Python function that draws PDF graphics?

However, I have no idea whether any of these would perform faster than your automated Acrobat Pro comparison does... Let me know if you find out, will you?

Shortcut:

For simplicity, let's assume the input files to be compared are similar enough and are only 1 page each. (For multi-page input, expand on the base idea of this answer...)

The two most essential commands any such comparison boils down to are these:

compare.exe ^
%input1% ^
%input2% ^
-compose src ^
%output%.tmp.pdf

and

pdftk.exe ^
%output%.tmp.pdf ^
background %input1% ^
output %output%.pdf
  • The first command generates a PDF with all differing pixels colored in red. (The default resolution of 72 dpi is used here. For a more fine-grained view of the pixel differences, add -density 200 (that is, 200 dpi) or higher before the input file names, but processing time and the disk space needed for the output will increase accordingly.)
  • The second command tries to merge the resulting PDF with a background taken from %input1%.

Optionally, you may add -verbose -debug coder after the compare command for a better idea about what's going on.

compare.exe is a command-line tool from the great, great ImageMagick family of utilities (available for Linux, Windows, Unix and Mac OS X). But it requires a Ghostscript installation to use as a 'delegate' in order to be able to process PDF input. pdftk.exe is also a command-line utility, available for the same platforms. Both are Free Software.

After the first command, you'll have an output file which has only red pixels where there are differences found on the page.

After the second command, you'll have an output with all red 'diff' pixels in the context of the first input PDF.

Example output:

Here are screenshots of two 1-page PDF files with differences in their content:

Example PDF file 1
Example PDF file 2


Here are screenshots of the output produced by the two commands above:

  • The left one shows the intermediate result (after first command), with only the difference pixels displaying as red (identical pixels being white).
  • The screenshot on the right shows the red difference pixels, but this time with the input PDF file number 1 as a (gray) background (after second command).

Red difference pixels only; identical pixels are white
Red difference pixels with PDF file 1 as background context


(PDF input files courtesy of Mark Summerfield, author of the beautiful DiffPDF tool.)
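
Since the question was about folders of PDFs, the same two commands could be applied to every file name that exists in both folders. The following is a sketch, not a tested batch tool: the function names list_common_pdfs and diff_pdf_dirs and the DENSITY variable are invented for this example, the files are assumed to be single-page PDFs, and it is written for a POSIX shell rather than cmd.exe.

```shell
#!/bin/sh
# list_common_pdfs DIR_A DIR_B
# Print the names of the *.pdf files that exist in both directories.
list_common_pdfs() {
    for path in "$1"/*.pdf; do
        [ -f "$path" ] || continue      # glob matched nothing
        name=${path##*/}                # strip the directory part
        if [ -f "$2/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
    return 0
}

# diff_pdf_dirs DIR_A DIR_B OUT_DIR
# For each common file, write OUT_DIR/NAME with red diff pixels
# drawn over the version from DIR_A.
diff_pdf_dirs() {
    mkdir -p "$3" || return 2
    list_common_pdfs "$1" "$2" | while IFS= read -r name; do
        tmp="$3/$name.tmp.pdf"
        # -density goes before the inputs; 72 dpi is the default
        # mentioned above, DENSITY is an assumed override.
        compare -density "${DENSITY:-72}" \
            "$1/$name" "$2/$name" -compose src "$tmp" </dev/null
        if [ -s "$tmp" ]; then
            pdftk "$tmp" background "$1/$name" output "$3/$name" </dev/null
        fi
        rm -f "$tmp"
    done
}
```

For example, DENSITY=200 diff_pdf_dirs old new diffs would process every PDF found under both old and new, where the three directory names are placeholders. Files that exist in only one folder are skipped silently, which may or may not be what you want.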

Tool to compare large numbers of PDF files?

Because no such tool was available, we have written one. You can download the i-net PDF content comparer and use it. I hope that helps others with the same problem. If you have problems with it, or feedback for us, you can contact our support.


Merge / convert multiple PDF files into one PDF

I'm sorry, I managed to find the answer myself using Google and a bit of luck : )

For those interested:

I installed pdftk (PDF Toolkit) on our Debian server, and the following command produced the desired output:

pdftk file1.pdf file2.pdf cat output output.pdf

OR

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...

The resulting output.pdf can in turn be passed to pdf2ps.
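
When the list of inputs comes from a script, it may help to check it before calling pdftk. A minimal sketch, assuming pdftk is on the PATH; the function name merge_pdfs and its output-first argument order are hypothetical:

```shell
#!/bin/sh
# merge_pdfs OUT.pdf IN1.pdf [IN2.pdf ...]
# Concatenate the inputs, in the order given, into OUT.pdf.
merge_pdfs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_pdfs OUT.pdf IN1.pdf [IN2.pdf ...]" >&2
        return 2
    fi
    out=$1
    shift
    # Refuse to start if any input is missing.
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "merge_pdfs: no such file: $f" >&2
            return 2
        fi
    done
    # Same pdftk invocation as above, with a variable-length input list.
    pdftk "$@" cat output "$out"
}
```

Called as merge_pdfs output.pdf file1.pdf file2.pdf, it does the same as the pdftk command above. With a glob such as merge_pdfs output.pdf part-*.pdf (a placeholder pattern), the shell expands the names in sorted order, and that is the order of the pages in the result.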

How to compare binary files to check if they are the same?

The standard Unix diff will show whether the files are the same or not:

[me@host ~]$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ

If there is no output from the command, it means that the files have no differences.
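
In a script, the exit status is usually more convenient than the printed message: diff and cmp both exit with 0 when the files are identical and 1 when they differ. A minimal sketch using cmp -s (silent mode); the function names same_file and describe_files are made up for this example:

```shell
#!/bin/sh
# same_file A B
# Succeeds (exit status 0) only if both files are byte-for-byte identical.
same_file() {
    cmp -s "$1" "$2"
}

# describe_files A B
# Print a one-line verdict. A missing or unreadable file makes cmp
# fail as well, so that case is reported together with "different".
describe_files() {
    if same_file "$1" "$2"; then
        echo "identical"
    else
        echo "different or unreadable"
    fi
}
```

For the files above, describe_files 1.bin 2.bin would print "different or unreadable".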


