Merge PDFs with PDFtk with Bookmarks

Merging pdf files with bookmarks

You can do this with iTextSharp. I use the following method:

// Requires: using System.Collections.Generic; using System.IO;
//           using iTextSharp.text; using iTextSharp.text.pdf;
public static void MergePdfFiles(string outputPdf, string[] sourcePdfs) {
    Document document = null;
    PdfCopy pdfCpy = null;
    int pageOffset = 0;
    var bookmarks = new List<Dictionary<string, object>>();
    for (int i = 0; i < sourcePdfs.Length; i++) {
        PdfReader reader = new PdfReader(sourcePdfs[i]);
        reader.ConsolidateNamedDestinations();
        int n = reader.NumberOfPages;
        if (i == 0) {
            // Size the output document after the first page of the first source
            document = new Document(reader.GetPageSizeWithRotation(1));
            pdfCpy = new PdfCopy(document, new FileStream(outputPdf, FileMode.Create));
            document.Open();
        }
        // Shift this file's bookmarks by the number of pages already copied
        IList<Dictionary<string, object>> tempBookmarks = SimpleBookmark.GetBookmark(reader);
        if (tempBookmarks != null) {
            SimpleBookmark.ShiftPageNumbers(tempBookmarks, pageOffset, null);
            bookmarks.AddRange(tempBookmarks);
        }
        pageOffset += n;
        // Copy every page of the current source into the output
        for (int j = 1; j <= n; j++) {
            pdfCpy.AddPage(pdfCpy.GetImportedPage(reader, j));
        }
        reader.Close();
    }
    // Nothing was opened if sourcePdfs is empty
    if (pdfCpy != null) {
        pdfCpy.Outlines = bookmarks;
        document.Close();
    }
}
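
The key step in the method above is shifting each file's bookmark page numbers by the number of pages already copied. Here is a minimal Python sketch of just that arithmetic, assuming a simplified bookmark model (dicts with hypothetical `title`, `page` and optional `kids` keys). It is not iTextSharp's actual bookmark structure, and the function names are made up for illustration.

```python
from typing import Dict, List


def shift_bookmarks(bookmarks: List[Dict], page_offset: int) -> List[Dict]:
    """Return a copy of the bookmark tree with every page number shifted."""
    shifted = []
    for entry in bookmarks:
        moved = dict(entry)
        moved["page"] = entry["page"] + page_offset
        if "kids" in entry:
            # Nested bookmarks are shifted by the same offset
            moved["kids"] = shift_bookmarks(entry["kids"], page_offset)
        shifted.append(moved)
    return shifted


def merge_outlines(files: List[Dict]) -> List[Dict]:
    """Concatenate outlines; each file is {'pages': int, 'bookmarks': [...]}."""
    merged, page_offset = [], 0
    for pdf in files:
        merged.extend(shift_bookmarks(pdf.get("bookmarks") or [], page_offset))
        # The next file starts after all pages copied so far
        page_offset += pdf["pages"]
    return merged


files = [
    {"pages": 3, "bookmarks": [{"title": "Intro", "page": 1}]},
    {"pages": 2, "bookmarks": [{"title": "Appendix", "page": 1,
                                "kids": [{"title": "A.1", "page": 2}]}]},
]
outline = merge_outlines(files)
```

With this sample data, "Appendix" ends up on page 4 of the merged document and "A.1" on page 5, while "Intro" stays on page 1.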

Merging .pdf files with Pdftk

EDIT: New approach.

If you drag and drop one or more files or a folder onto the batch, or pass at least
one file or folder as an argument, the following batch changes to the referenced folder
and processes all PDF files in that folder, combining them into binder.pdf.
An existing binder.pdf, if any, is renamed to binder.bak.pdf.

:: Q:\Test\2018\06\06\SO_50728273.cmd
@echo off
setlocal enabledelayedexpansion
if "%~1" neq "" (
  rem A folder argument is used as is; for a file, change to its parent folder
  Echo %~a1|findstr "d" >Nul 2>&1 && Pushd "%~f1" || Pushd "%~dp1"
) else (
  Echo No arguments, need a path& pause & goto :Eof
)
Del /f binder.bak.pdf >Nul 2>&1
if exist binder.pdf Ren binder.pdf binder.bak.pdf
:: Note: binder.bak.pdf still matches *.pdf, so a kept backup is merged as well
pdftk.exe *.pdf cat output binder.pdf
PopD
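
For readers not on Windows, the folder handling of the batch can be sketched in Python. This is a rough equivalent, not a translation: the function name `prepare_binder` is made up, it only prepares the input list and leaves the actual pdftk call out, and unlike `*.pdf` in the batch it excludes the backup from the inputs.

```python
import os
from pathlib import Path
from typing import List


def prepare_binder(arg: str) -> List[Path]:
    """Resolve the target folder, back up binder.pdf, return the PDFs to merge."""
    target = Path(arg)
    # A folder argument is used directly; a file argument selects its parent folder
    folder = target if target.is_dir() else target.parent

    binder = folder / "binder.pdf"
    backup = folder / "binder.bak.pdf"
    if binder.exists():
        # os.replace overwrites an older backup in one step
        os.replace(binder, backup)

    # Sorted for a predictable merge order; the backup is left out on purpose
    return sorted(p for p in folder.glob("*.pdf") if p.name != backup.name)
```

The returned paths could then be handed to pdftk in front of `cat output binder.pdf`.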

Without knowing which arguments you pass to the batch, diagnosing the problem is impossible. %* is replaced with all the arguments you pass; the location of the output is determined by the path of the first argument, %~dp1.
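
To make the `%~dp1` part concrete, here is a small Python sketch that mimics it for already-absolute Windows paths using the standard `ntpath` module. The helper name `drive_and_path` is hypothetical, and the real `%~dp1` also resolves relative paths against the current directory, which this sketch does not attempt.

```python
import ntpath


def drive_and_path(first_arg: str) -> str:
    """Approximate cmd's %~dp1: drive and folder of the argument, with a trailing backslash."""
    folder = ntpath.dirname(first_arg)
    return folder if folder.endswith("\\") else folder + "\\"


print(drive_and_path(r"a:\*.pdf"))
print(drive_and_path(r"Q:\Test\2018\06\06\SO_50728273.pdf"))
```

For an argument of `a:\*.pdf` this gives `a:\`, which is where the output file would land.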

I ran your batch on my RAM disk A:

Dir before:

> dir A:\
 Directory of A:\

2018-06-06  21:57            65,381 SO_5072812.pdf
2018-06-06  21:56               163 SO_50728273.cmd
2018-06-06  21:55            60,649 SO_50728273.pdf
               3 File(s)        126,193 bytes
               0 Dir(s)  1,049,452,544 bytes free

And after (I named the batch SO_50728273.cmd):

> SO_50728273.cmd a:\*.pdf

> dir
 Directory of A:\

2018-06-06  21:58           125,756 binder.pdf
2018-06-06  21:57            65,381 SO_5072812.pdf
2018-06-06  21:56               163 SO_50728273.cmd
2018-06-06  21:55            60,649 SO_50728273.pdf
               4 File(s)        251,949 bytes
               0 Dir(s)  1,049,260,032 bytes free

Merging PDFs while retaining custom page numbers (aka pagelabels) and bookmarks

You need to iterate through the existing PageLabels and add them to the merged output, taking care to add an offset to the page index entry, based on the number of pages already added.

This solution also requires PyPDF4, since PyPDF2 fails with a ValueError (see the traceback at the bottom).

from PyPDF4 import PdfFileMerger, PdfFileReader

# To manipulate the PDF dictionary
import PyPDF4.pdf as PDF

import logging

def add_nums(num_entry, page_offset, nums_array):
    # /Nums alternates between a page index and its label dictionary
    for num in num_entry['/Nums']:
        if isinstance(num, int):
            logging.debug("Found page index %s, offset %s", num, page_offset)

            # Shift the physical page index by the pages already merged
            nums_array.append(PDF.NumberObject(num + page_offset))
        else:
            # Label dictionary, e.g. {'/S': '/r'} or {'/S': '/D', '/St': 489}
            keys = num.keys()
            logging.debug("Found page label, keys: %s", keys)
            number_type = PDF.DictionaryObject()
            # Always copy the /S entry
            s_entry = num['/S']
            number_type.update({PDF.NameObject("/S"): PDF.NameObject(s_entry)})
            logging.debug("Adding /S entry: %s", s_entry)

            if '/St' in keys:
                # /St is the label's own start value, not a page index,
                # so it is copied unchanged
                pdf_label_offset = num['/St']
                logging.debug("Found /St %s", pdf_label_offset)
                number_type.update({PDF.NameObject("/St"): PDF.NumberObject(pdf_label_offset)})

            # Add the label information
            nums_array.append(number_type)

    return nums_array

def write_merged(pdf_readers):
    # Output
    merger = PdfFileMerger()

    # For PageLabels information
    page_offset = 0
    nums_array = PDF.ArrayObject()

    # Iterate through all the inputs
    for pdf_reader in pdf_readers:
        try:
            # Merge the content
            merger.append(pdf_reader)
        except Exception as err:
            print("ERROR: %s" % err)
            continue

        page_count = pdf_reader.getNumPages()
        try:
            # Handle the PageLabels: fetch them and add them with the offset
            old_page_labels = pdf_reader.trailer['/Root']['/PageLabels']
            add_nums(old_page_labels, page_offset, nums_array)
        except Exception as err:
            # For example, the file has no /PageLabels entry
            print("ERROR: %s" % err)

        # Advance the offset even if this file had no labels,
        # so that later files stay aligned
        page_offset = page_offset + page_count

    # Add PageLabels
    page_numbers = PDF.DictionaryObject()
    page_numbers.update({PDF.NameObject("/Nums"): nums_array})

    page_labels = PDF.DictionaryObject()
    page_labels.update({PDF.NameObject("/PageLabels"): page_numbers})

    root_obj = merger.output._root_object
    root_obj.update(page_labels)

    # Write output
    merger.write('merged.pdf')


# PdfFileReader takes a path or a stream; its second parameter is `strict`,
# not a file mode, so no 'rb' is passed here
pdf_readers = [
    PdfFileReader('file1.pdf'),
    PdfFileReader('file2.pdf'),
]

write_merged(pdf_readers)

Note: PyPDF2 produces this weird error:

  ...
  ...
  File "/usr/lib/python3/dist-packages/PyPDF2/pdf.py", line 552, in _sweepIndirectReferences
    data[key] = value
  File "/usr/lib/python3/dist-packages/PyPDF2/generic.py", line 507, in __setitem__
    raise ValueError("key must be PdfObject")
ValueError: key must be PdfObject
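
The offset handling in `add_nums` is easier to check on plain data. As a minimal sketch, assume each file's `/Nums` array is modelled as a flat Python list alternating page index and label dictionary. The function name `merge_nums` and the sample data are made up for illustration, and no PDF library is involved.

```python
from typing import List, Tuple


def merge_nums(files: List[Tuple[int, list]]) -> list:
    """Merge (page_count, nums) pairs into one /Nums-style list.

    Page indices are shifted by the pages already added; label dictionaries,
    including any /St start value, are copied unchanged.
    """
    merged, page_offset = [], 0
    for page_count, nums in files:
        for item in nums:
            if isinstance(item, int):
                merged.append(item + page_offset)
            else:
                merged.append(dict(item))
        page_offset += page_count
    return merged


# Four roman-numbered pages, then ten pages whose last four start at label 489
front_matter = (4, [0, {"/S": "/r"}])
body = (10, [0, {"/S": "/D"}, 6, {"/S": "/D", "/St": 489}])
print(merge_nums([front_matter, body]))
```

For this input the merged list is `[0, {'/S': '/r'}, 4, {'/S': '/D'}, 10, {'/S': '/D', '/St': 489}]`: only the page indices move, the `/St` value stays at 489.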

Merge / convert multiple PDF files into one PDF

I'm sorry, I managed to find the answer myself using Google and a bit of luck :)

For those interested:

I installed pdftk (the PDF toolkit) on our Debian server, and the following command produced the desired output:

pdftk file1.pdf file2.pdf cat output output.pdf

Alternatively, with Ghostscript:

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...

This in turn can be piped directly into pdf2ps.
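
If the merge has to be triggered from a script, both commands can be assembled as argument lists for `subprocess`. This is a sketch using only the flags shown above; the function names are hypothetical, and the tools themselves have to be installed and on the PATH.

```python
import subprocess
from typing import List


def pdftk_merge_cmd(inputs: List[str], output: str) -> List[str]:
    """Build: pdftk <inputs...> cat output <output>."""
    return ["pdftk", *inputs, "cat", "output", output]


def gs_merge_cmd(inputs: List[str], output: str) -> List[str]:
    """Build the Ghostscript pdfwrite command with the same flags as above."""
    return ["gs", "-q", "-sPAPERSIZE=letter", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=pdfwrite", f"-sOutputFile={output}", *inputs]


def run_merge(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

A call such as `run_merge(pdftk_merge_cmd(["file1.pdf", "file2.pdf"], "output.pdf"))` would then run the pdftk command from this answer. Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.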
