PDF Compression with iTextSharp

Does iTextSharp Handle PDF Compression?

Yes, iText and iTextSharp support compression.

  • From PDF 1.0 (1993) to PDF 1.1 (1994), PDF syntax stored in content streams wasn't compressed.
  • From PDF 1.2 (1996) on, PDF syntax stored in content streams could be compressed. The standard filter is /FlateDecode, which uses the same deflate algorithm as ZIP. You can choose a compression level from 0 (no compression) to 9 (best compression); -1 selects whatever your library considers the default level.
  • From PDF 1.5 (2003) on, the indirect objects can be stored in a compressed object stream. Additionally, the cross-reference table can be compressed and stored in a stream. Before PDF 1.5, this wasn't possible (viewers that only support PDF 1.4 and earlier can't open "fully compressed" PDFs).
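To get a feel for what the Flate compression levels mentioned above do, here is a minimal sketch that uses only the JDK's java.util.zip.Deflater, not iText. The class name FlateLevels, its helper methods, and the sample content stream are made up for illustration; a real page content stream will compress differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

final class FlateLevels {
    /** Deflates data at the given level (-1 = default, 0 = none, 9 = best) and returns the compressed size in bytes. */
    static int deflatedSize(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the byte count matters here
        }
        deflater.end();
        return total;
    }

    /** Builds a made-up, repetitive content stream: one text-showing operator per line. */
    static byte[] sampleContentStream(int lines) {
        StringBuilder sb = new StringBuilder("BT /F1 12 Tf 36 806 Td\n");
        for (int i = 0; i < lines; i++) {
            sb.append("0 -14 Td (Hello World, line ").append(i).append(") Tj\n");
        }
        sb.append("ET\n");
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] content = sampleContentStream(500);
        System.out.println("uncompressed: " + content.length + " bytes");
        for (int level : new int[] {Deflater.DEFAULT_COMPRESSION, 0, 1, 9}) {
            System.out.println("level " + level + ": " + deflatedSize(content, level) + " bytes");
        }
    }
}
```

On repetitive input like this, the jump from level 0 to any real compression level is typically large, while the difference between the default level and level 9 tends to be small.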

iText supports all of the above and Chris' answer already fully answers your question. Since uncompressed content streams were only the norm up to PDF 1.1, which dates from a really long time ago (1994), the content streams in your PDF are almost certainly compressed already. I wouldn't worry about changing their compression level, so you can safely forget about:

reader.SetPageContent(1, reader.GetPageContent(1), PdfStream.BEST_COMPRESSION);

Using this line won't reduce the file size much.

Using "full compression" (which will cause the cross-reference table to be compressed) should have an effect on the file size for PDFs with many indirect objects. A minimal "Hello World" file could increase in file size when you use "full compression".

All of the above won't help you much, because good PDF creators already compress whatever can be compressed. Bad PDF creators, however (or people using good PDF creators incorrectly), can produce files that contain redundant objects. For instance: there are people who don't know how to add a logo as an image to each page in a PDF using iTextSharp, so they add the image as many times as there are pages. PDF compression won't help you in this case, but if you pass such a "bad" PDF through iTextSharp's PdfSmartCopy, then PdfSmartCopy will detect the redundant objects and reorganize the file. Objects that are repeated over and over again (for instance: every page refers to a different object with the same image bytes) are then reused (for instance: every page refers to the same object with the image bytes).
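The idea of reusing identical objects can be sketched in plain Java. This is not iText's code and makes no claim about how PdfSmartCopy is implemented internally; StreamDeduplicator and its methods are hypothetical names, and the sketch only assumes that identical stream bytes can be recognized by their digest.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StreamDeduplicator {
    private final Map<String, Integer> objectNumberByDigest = new HashMap<>();
    private final List<byte[]> storedStreams = new ArrayList<>();

    /** Returns the object number for these bytes, storing them only if they were not seen before. */
    int register(byte[] streamBytes) {
        String key = digest(streamBytes);
        Integer existing = objectNumberByDigest.get(key);
        if (existing != null) {
            return existing; // same bytes: point at the object we already stored
        }
        storedStreams.add(streamBytes.clone());
        int objectNumber = storedStreams.size(); // 1-based, like PDF object numbers
        objectNumberByDigest.put(key, objectNumber);
        return objectNumber;
    }

    int storedObjectCount() {
        return storedStreams.size();
    }

    private static String digest(byte[] bytes) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(sha256.digest(bytes));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        StreamDeduplicator dedup = new StreamDeduplicator();
        byte[] logo = {(byte) 0xFF, (byte) 0xD8, 1, 2, 3}; // stand-in for image bytes
        for (int page = 1; page <= 3; page++) {
            // Each page "adds" its own copy of the logo, as in the bad PDF described above.
            System.out.println("page " + page + " -> object " + dedup.register(logo.clone()));
        }
        System.out.println("objects stored: " + dedup.storedObjectCount());
    }
}
```

In this sketch every page ends up referring to the same object number, so the image bytes are stored only once no matter how many pages use them.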

Depending on the version of iTextSharp you're using, reader.RemoveUnusedObjects(); will also help you (recent versions remove unused objects by default).

Can we compress PDF file size using iText?

You can try to set a compression level when using iText:

Document document = ...
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
writer.setCompressionLevel(9);

Level 9 is slowest, but gives you the best compression available in iText.


Please note that the compression effect largely depends on the PDF content. If your PDF file contains large binary streams, such as images, the compression will have little to no effect on your document. Also, iText will never compress the XMP metadata stream, regardless of the configuration options.
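The dependence on content is easy to check with the JDK alone. The sketch below is an illustration rather than anything iText does: it deflates a repetitive text-like stream and a block of pseudo-random bytes, which is used here as a rough stand-in for already compressed image data. CompressibilityDemo and its helpers are made-up names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

final class CompressibilityDemo {
    /** Returns compressed size divided by original size, using the best Flate level. */
    static double compressionRatio(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[8192];
        long compressed = 0;
        while (!deflater.finished()) {
            compressed += deflater.deflate(buffer);
        }
        deflater.end();
        return (double) compressed / data.length;
    }

    /** Repetitive PDF-like operators, similar in spirit to a page content stream. */
    static byte[] textLikeStream(int size) {
        byte[] pattern = "1 0 0 1 72 720 cm BT /F1 12 Tf (Hello World) Tj ET\n"
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[size];
        for (int i = 0; i < size; i++) {
            out[i] = pattern[i % pattern.length];
        }
        return out;
    }

    /** Pseudo-random bytes as a stand-in (an assumption) for JPEG or other compressed image data. */
    static byte[] imageLikeStream(int size) {
        byte[] out = new byte[size];
        new Random(42).nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.printf("text-like ratio:  %.3f%n", compressionRatio(textLikeStream(100_000)));
        System.out.printf("image-like ratio: %.3f%n", compressionRatio(imageLikeStream(100_000)));
    }
}
```

A ratio close to 1 for the image-like data means the compressor found nothing left to remove, which is why a higher compression level does little for image-heavy PDFs.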

Best practice to compress generated PDF using iText7

If you are just adding scans to a PDF document, it makes sense for the size of the resulting document to go up if you're using high-resolution images.

Keep in mind that iText is a PDF library, not an image-manipulation library.

You could of course use plain Java to compress the images yourself:

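import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Writes the image as a JPEG; quality goes from 0.0 (smallest file) to 1.0 (best quality).
public static void writeJPG(BufferedImage bufferedImage, OutputStream outputStream, float quality) throws IOException
{
    Iterator<ImageWriter> iterator = ImageIO.getImageWritersByFormatName("jpg");
    ImageWriter imageWriter = iterator.next();
    ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
    imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
    imageWriteParam.setCompressionQuality(quality);
    ImageOutputStream imageOutputStream = new MemoryCacheImageOutputStream(outputStream);
    imageWriter.setOutput(imageOutputStream);
    IIOImage iioimage = new IIOImage(bufferedImage, null, null);
    imageWriter.write(null, iioimage, imageWriteParam);
    imageOutputStream.flush();
    imageWriter.dispose(); // release the resources held by the writer
}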
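To see how much the quality setting matters before putting an image into a PDF, a self-contained variant of the method above can be run on a synthetic image. This is a sketch under assumptions: JpegQualityDemo, encodeJpeg and syntheticScan are hypothetical names, and the noisy gradient only stands in for a real scan, so the actual sizes you get will differ.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

final class JpegQualityDemo {
    /** Encodes the image as JPEG at the given quality (0.0 to 1.0) and returns the bytes. */
    static byte[] encodeJpeg(BufferedImage image, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the image stream flushes pending data into the byte array.
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    /** Creates a grayscale gradient with noise, a made-up stand-in for a scanned page. */
    static BufferedImage syntheticScan(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random noise = new Random(1);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int gray = (x * 255 / width) / 2 + noise.nextInt(128); // stays within 0..254
                image.setRGB(x, y, (gray << 16) | (gray << 8) | gray);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage scan = syntheticScan(400, 300);
        for (float quality : new float[] {0.3f, 0.6f, 0.9f}) {
            System.out.println("quality " + quality + ": " + encodeJpeg(scan, quality).length + " bytes");
        }
    }
}
```

The resulting byte arrays could then be handed to whatever image API your iText version offers; the trade-off is visible quality, so it is worth checking the output at the lowest setting you plan to use.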

But really, putting scanned images into a PDF makes life so much more difficult. Imagine the people who have to handle that document after you. They open it, see text, try to select it, and nothing happens.

Additionally, you might change the WriterProperties when creating your PdfWriter instance:

PdfWriter writer = new PdfWriter(dest,
    new WriterProperties().setFullCompressionMode(true));

Full compression mode will compress certain objects into an object stream, and it will also compress the cross-reference table of the PDF. Since most of the objects in your document will be images (which are already compressed), compressing objects won't have much effect, but if you have a large number of pages, compressing the cross-reference table may result in smaller PDF files.


