How to Merge Multiple PDF Files (Generated in Run Time)

How to merge multiple pdf files (generated in run time)?

If you want to merge source documents using iText(Sharp), there are two basic situations:

  1. You really want to merge the documents, acquiring the pages in their original format and transferring as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the Pdf*Copy* family of classes.

  2. You actually want to integrate pages from the source documents into a new document but want the new document to govern the general format and don't care for the interactive features (annotations...) in the original documents (or even want to get rid of them). In this case you should use a solution based on the PdfWriter class.

You can find details in chapter 6 (especially section 6.4) of iText in Action — 2nd Edition. The Java sample code can be accessed here and the C#'ified versions here.

A simple sample using PdfCopy is Concatenate.java / Concatenate.cs. The central piece of code is:

byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
    using (Document document = new Document())
    {
        using (PdfCopy copy = new PdfCopy(document, ms))
        {
            document.Open();

            for (int i = 0; i < pdf.Count; ++i)
            {
                PdfReader reader = new PdfReader(pdf[i]);
                // loop over the pages in that document
                int n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
            }
        }
    }
    mergedPdf = ms.ToArray();
}

Here pdf can either be defined as a List<byte[]> immediately containing the source documents (appropriate for your use case of merging intermediate in-memory documents) or as a List<String> containing the names of source document files (appropriate if you merge documents from disk).

An overview at the end of the referenced chapter summarizes the usage of the classes mentioned:

  • PdfCopy: Copies pages from one or more existing PDF documents. Major downsides: PdfCopy doesn’t detect redundant content, and it fails when concatenating forms.

  • PdfCopyFields: Puts the fields of the different forms into one form. Can be used to avoid the problems encountered with form fields when concatenating forms using PdfCopy. Memory use can be an issue.

  • PdfSmartCopy: Copies pages from one or more existing PDF documents. PdfSmartCopy is able to detect redundant content, but it needs more memory and CPU than PdfCopy.

  • PdfWriter: Generates PDF documents from scratch. Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.

Merge / convert multiple PDF files into one PDF

I'm sorry, I managed to find the answer myself using Google and a bit of luck :)

For those interested:

I installed pdftk (the PDF toolkit) on our Debian server, and the following command produced the desired output:

pdftk file1.pdf file2.pdf cat output output.pdf

Or, with Ghostscript:

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...

This in turn can be piped directly into pdf2ps.
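If the merge has to be triggered from a script rather than typed by hand, the pdftk call above can be wrapped in a few lines of Python. This is only a sketch: the function names pdftk_cat_command and merge_with_pdftk are made up for illustration, and it assumes pdftk is installed and on the PATH. The argument list mirrors the command shown above and nothing more.

```python
import shutil
import subprocess


def pdftk_cat_command(inputs, output):
    """Build the argument list for: pdftk <inputs...> cat output <output>."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["pdftk", *inputs, "cat", "output", output]


def merge_with_pdftk(inputs, output):
    """Run pdftk; raises CalledProcessError if pdftk exits with an error."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk was not found on PATH")
    # Passing a list (no shell) avoids quoting problems with odd file names.
    subprocess.run(pdftk_cat_command(inputs, output), check=True)
```

Calling merge_with_pdftk(["file1.pdf", "file2.pdf"], "output.pdf") should then be equivalent to the pdftk command line above.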

Merge PDF files

Use pyPdf or its successor PyPDF2:

A Pure-Python library built as a PDF toolkit. It is capable of:

  • splitting documents page by page,
  • merging documents page by page,

(and much more)

Here's a sample program that works with both versions.

#!/usr/bin/env python
import sys
try:
    from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
    from pyPdf import PdfFileReader, PdfFileWriter

def pdf_cat(input_files, output_stream):
    input_streams = []
    try:
        # First open all the files, then produce the output file, and
        # finally close the input files. This is necessary because
        # the data isn't read from the input files until the write
        # operation. Thanks to
        # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages()):
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()
        output_stream.close()

if __name__ == '__main__':
    if sys.platform == "win32":
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    # On Python 3 binary data has to go to sys.stdout.buffer;
    # on Python 2 sys.stdout itself accepts bytes.
    pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))
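The comment in pdf_cat makes one point that is easy to miss: the input streams must stay open until the output has been written, and only then be closed. A possible way to express that lifetime with the standard library alone is contextlib.ExitStack. The sketch below is not part of pyPdf or PyPDF2, and with_open_inputs and consume are hypothetical names. It opens every input, hands the open streams to a callback that does the actual work, and closes them all afterwards, even if the callback raises.

```python
from contextlib import ExitStack


def with_open_inputs(paths, consume):
    """Open every path in binary mode, call consume(streams), then close them all."""
    with ExitStack() as stack:
        # Each stream is registered with the stack as soon as it is opened,
        # so earlier files are still closed if a later open() fails.
        streams = [stack.enter_context(open(p, "rb")) for p in paths]
        # The streams are still open here, which is what a lazy reader needs.
        return consume(streams)
```

In the program above, consume would be the part that builds the writer from the streams and calls writer.write(output_stream).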

Combine two (or more) PDF's

I had to solve a similar problem, and what I ended up doing was creating a small pdfmerge utility that uses the PDFsharp project, which is essentially MIT licensed.

The code is dead simple. I needed a command-line utility, so I have more code dedicated to parsing the arguments than to merging the PDFs:

using (PdfDocument one = PdfReader.Open("file1.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument two = PdfReader.Open("file2.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outPdf = new PdfDocument())
{
    CopyPages(one, outPdf);
    CopyPages(two, outPdf);

    outPdf.Save("file1and2.pdf");
}

void CopyPages(PdfDocument from, PdfDocument to)
{
    for (int i = 0; i < from.PageCount; i++)
    {
        to.AddPage(from.Pages[i]);
    }
}

Merging multiple PDFs using iTextSharp in c#.net

I found the answer:

Instead of the 2nd Method, add more files to the first array of input files.

public static void CombineMultiplePDFs(string[] fileNames, string outFile)
{
    // step 1: creation of a document-object
    Document document = new Document();
    // create newFileStream object which will be disposed at the end
    using (FileStream newFileStream = new FileStream(outFile, FileMode.Create))
    {
        // step 2: we create a writer that listens to the document
        PdfCopy writer = new PdfCopy(document, newFileStream);

        // step 3: we open the document
        document.Open();

        foreach (string fileName in fileNames)
        {
            // we create a reader for a certain document
            PdfReader reader = new PdfReader(fileName);
            reader.ConsolidateNamedDestinations();

            // step 4: we add content
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                PdfImportedPage page = writer.GetImportedPage(reader, i);
                writer.AddPage(page);
            }

            PRAcroForm form = reader.AcroForm;
            if (form != null)
            {
                writer.CopyAcroForm(reader);
            }

            reader.Close();
        }

        // step 5: we close the document and writer
        writer.Close();
        document.Close();
    } // disposes the newFileStream object
}

Is there a faster way to merge two files rather than page by page?

I do not have enough 'reputation' to comment. But since I was going to post an answer I made it long.

Normally when people want to 'merge' documents they mean 'combining' them, or as you point out, concatenating or appending one PDF at the end of the other (or somewhere in between). But based on the code you present, it seems you meant overlaying one PDF over another, right? In other words, you want page 1 from both pdf1 and pdf2 to be combined into a single page as part of a new PDF.

If so, you could use this (modified from an example used to illustrate watermarking). It still overlays one page at a time, but pdfrw is reputed to be much faster than PyPDF2 and is supposed to work well with reportlab. I haven't compared the speeds, so I'm not sure whether this will actually be faster than what you already have.

from pdfrw import PdfReader, PdfWriter, PageMerge

p1 = PdfReader("file1")
p2 = PdfReader("file2")

# overlay each page of p2 onto the page of p1 at the same position
for page in range(len(p1.pages)):
    merger = PageMerge(p1.pages[page])
    merger.add(p2.pages[page]).render()

writer = PdfWriter()
writer.write("output.pdf", p1)

How to merge two PDF files into one in Java?

Why not use the PDFMergerUtility of PDFBox?

PDFMergerUtility ut = new PDFMergerUtility();
ut.addSource(...);
ut.addSource(...);
ut.addSource(...);
ut.setDestinationFileName(...);
ut.mergeDocuments();

