How to merge multiple pdf files (generated in run time)?
If you want to merge source documents using iText(Sharp), there are two basic situations:
You really want to merge the documents, acquiring the pages in their original format, transfering as much of their content and their interactive annotations as possible. In this case you should use a solution based on a member of the
Pdf*Copy*
family of classes.You actually want to integrate pages from the source documents into a new document but want the new document to govern the general format and don't care for the interactive features (annotations...) in the original documents (or even want to get rid of them). In this case you should use a solution based on the
PdfWriter
class.
You can find details in chapter 6 (especially section 6.4) of iText in Action — 2nd Edition. The Java sample code can be accessed here and the C#'ified versions here.
A simple sample using PdfCopy
is Concatenate.java / Concatenate.cs. The central piece of code is:
byte[] mergedPdf = null;
using (MemoryStream ms = new MemoryStream())
{
using (Document document = new Document())
{
using (PdfCopy copy = new PdfCopy(document, ms))
{
document.Open();
for (int i = 0; i < pdf.Count; ++i)
{
PdfReader reader = new PdfReader(pdf[i]);
// loop over the pages in that document
int n = reader.NumberOfPages;
for (int page = 0; page < n; )
{
copy.AddPage(copy.GetImportedPage(reader, ++page));
}
}
}
}
mergedPdf = ms.ToArray();
}
Here pdf
can either be defined as a List<byte[]>
immediately containing the source documents (appropriate for your use case of merging intermediate in-memory documents) or as a List<String>
containing the names of source document files (appropriate if you merge documents from disk).
An overview at the end of the referenced chapter summarizes the usage of the classes mentioned:
PdfCopy
: Copies pages from one or more existing PDF documents. Major downsides:PdfCopy
doesn’t detect redundant content, and it fails when concatenating forms.PdfCopyFields
: Puts the fields of the different forms into one form. Can be used to avoid the problems encountered with form fields when concatenating forms usingPdfCopy
. Memory use can be an issue.PdfSmartCopy
: Copies pages from one or more existing PDF documents.PdfSmartCopy
is able to detect redundant content, but it needs more memory and CPU thanPdfCopy
.PdfWriter
: Generates PDF documents from scratch. Can import pages from other PDF documents. The major downside is that all interactive features of the imported page (annotations, bookmarks, fields, and so forth) are lost in the process.
Merge / convert multiple PDF files into one PDF
I'm sorry, I managed to find the answer myself using google and a bit of luck : )
For those interested;
I installed the pdftk (pdf toolkit) on our debian server, and using the following command I achieved desired output:
pdftk file1.pdf file2.pdf cat output output.pdf
OR
gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...
This in turn can be piped directly into pdf2ps.
Merge PDF files
Use Pypdf or its successor PyPDF2:
A Pure-Python library built as a PDF toolkit. It is capable of:
- splitting documents page by page,
- merging documents page by page,
(and much more)
Here's a sample program that works with both versions.
#!/usr/bin/env python
import sys
try:
from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
from pyPdf import PdfFileReader, PdfFileWriter
def pdf_cat(input_files, output_stream):
input_streams = []
try:
# First open all the files, then produce the output file, and
# finally close the input files. This is necessary because
# the data isn't read from the input files until the write
# operation. Thanks to
# https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
for input_file in input_files:
input_streams.append(open(input_file, 'rb'))
writer = PdfFileWriter()
for reader in map(PdfFileReader, input_streams):
for n in range(reader.getNumPages()):
writer.addPage(reader.getPage(n))
writer.write(output_stream)
finally:
for f in input_streams:
f.close()
output_stream.close()
if __name__ == '__main__':
if sys.platform == "win32":
import os, msvcrt
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
pdf_cat(sys.argv[1:], sys.stdout)
Combine two (or more) PDF's
I had to solve a similar problem and what I ended up doing was creating a small pdfmerge utility that uses the PDFSharp project which is essentially MIT licensed.
The code is dead simple, I needed a cmdline utility so I have more code dedicated to parsing the arguments than I do for the PDF merging:
using (PdfDocument one = PdfReader.Open("file1.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument two = PdfReader.Open("file2.pdf", PdfDocumentOpenMode.Import))
using (PdfDocument outPdf = new PdfDocument())
{
CopyPages(one, outPdf);
CopyPages(two, outPdf);
outPdf.Save("file1and2.pdf");
}
void CopyPages(PdfDocument from, PdfDocument to)
{
for (int i = 0; i < from.PageCount; i++)
{
to.AddPage(from.Pages[i]);
}
}
Merging multiple PDFs using iTextSharp in c#.net
I found the answer:
Instead of the 2nd Method, add more files to the first array of input files.
public static void CombineMultiplePDFs(string[] fileNames, string outFile)
{
// step 1: creation of a document-object
Document document = new Document();
//create newFileStream object which will be disposed at the end
using (FileStream newFileStream = new FileStream(outFile, FileMode.Create))
{
// step 2: we create a writer that listens to the document
PdfCopy writer = new PdfCopy(document, newFileStream);
// step 3: we open the document
document.Open();
foreach (string fileName in fileNames)
{
// we create a reader for a certain document
PdfReader reader = new PdfReader(fileName);
reader.ConsolidateNamedDestinations();
// step 4: we add content
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfImportedPage page = writer.GetImportedPage(reader, i);
writer.AddPage(page);
}
PRAcroForm form = reader.AcroForm;
if (form != null)
{
writer.CopyAcroForm(reader);
}
reader.Close();
}
// step 5: we close the document and writer
writer.Close();
document.Close();
}//disposes the newFileStream object
}
Is there a faster way to merge two files rather than page by page?
I do not have enough 'reputation' to comment. But since I was going to post an answer I made it long.
Normally when people want to 'merge' documents they mean 'combining' them, or as you point out, concatenate or append one pdf at the end of the other (or somewhere in between). But based on the code you present, it seems you meant overlaying one pdf over another, right? Or in other words, you want page 1 from both pdf1 and pdf2 to be combined in to a single page as part of a new pdf.
If so, you could use this (modified from example used to illustrate watermarking). It is still overlaying one page at a time. But, pdfrw is known to be super fast compared to PyPDF2 and supposed to work well with reportlab. I havent compared the speeds, so not sure if this will actually be faster than what you already have
from pdfrw import PdfReader, PdfWriter, PageMerge
p1 = pdfrw.PdfReader("file1")
p2 = pdfrw.PdfReader("file2")
for page in range(len(p1.pages)):
merger = PageMerge(p1.pages[page])
merger.add(p2.pages[page]).render()
writer = PdfWriter()
writer.write("output.pdf", p1)
How to merge two PDF files into one in Java?
Why not use the PDFMergerUtility of pdfbox?
PDFMergerUtility ut = new PDFMergerUtility();
ut.addSource(...);
ut.addSource(...);
ut.addSource(...);
ut.setDestinationFileName(...);
ut.mergeDocuments();
Related Topics
How to Mock Out the File System in C# for Unit Testing
Centering Controls Within a Form in .Net (Winforms)
Programmatically Add Column & Rows to Wpf Datagrid
Showing Difference Between Two Datetime Values in Hours
How Can a Word Document Be Created in C#
Select Multiple Items from a Datagrid in an Mvvm Wpf Project
Validateantiforgerytoken Purpose, Explanation and Example
How to Make Databinding Type Safe and Support Refactoring
Using Side-By-Side Assemblies to Load the X64 or X32 Version of a Dll
How to Check If Wpf Is Currently Executing in Design Mode or Not
Count the Items from a Ienumerable<T> Without Iterating
Find If Current Time Falls in a Time Range