Merging PDF files with bookmarks
You can do this with iTextSharp. I use the following method:
// Requires: using System.Collections.Generic; using System.IO;
//           using iTextSharp.text; using iTextSharp.text.pdf;
public static void MergePdfFiles(string outputPdf, string[] sourcePdfs)
{
    Document document = null;
    PdfCopy pdfCpy = null;
    int page_offset = 0;
    List<Dictionary<string, object>> bookmarks = new List<Dictionary<string, object>>();

    for (int i = 0; i < sourcePdfs.Length; i++)
    {
        PdfReader reader = new PdfReader(sourcePdfs[i]);
        reader.ConsolidateNamedDestinations();
        int n = reader.NumberOfPages;
        IList<Dictionary<string, object>> tempBookmarks = SimpleBookmark.GetBookmark(reader);

        if (i == 0)
        {
            // Create the output once, sized like the first page of the first source file.
            document = new Document(reader.GetPageSizeWithRotation(1));
            pdfCpy = new PdfCopy(document, new FileStream(outputPdf, FileMode.Create));
            document.Open();
        }

        // A source file without an outline yields no bookmarks to copy.
        if (tempBookmarks != null)
        {
            // Shift the bookmark targets by the number of pages already written.
            SimpleBookmark.ShiftPageNumbers(tempBookmarks, page_offset, null);
            bookmarks.AddRange(tempBookmarks);
        }
        page_offset += n;

        // Copy every page of the current source file into the output.
        for (int j = 1; j <= n; j++)
        {
            PdfImportedPage page = pdfCpy.GetImportedPage(reader, j);
            pdfCpy.AddPage(page);
        }
        reader.Close();
    }

    // Nothing was opened if sourcePdfs was empty.
    if (pdfCpy != null)
    {
        pdfCpy.Outlines = bookmarks;
        document.Close();
    }
}
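The bookmark handling above boils down to shifting each bookmark's target page by the number of pages already merged. The following Python sketch illustrates only that arithmetic. It is not the iTextSharp implementation: the dict layout (a "Page" string such as "3 Fit" and a nested "Kids" list) is an assumption loosely modeled on the bookmark dictionaries, and the function names shift_page_numbers and merge_bookmarks are hypothetical.

```python
import copy


def shift_page_numbers(bookmarks, page_offset):
    """Return a copy of `bookmarks` with every target page moved by `page_offset`."""
    shifted = copy.deepcopy(bookmarks)
    for entry in shifted:
        page = entry.get("Page")
        if page is not None:
            # Assumed layout: page number first, optional view parameters after it.
            number, _, rest = page.partition(" ")
            entry["Page"] = f"{int(number) + page_offset} {rest}".strip()
        if "Kids" in entry:
            # Nested bookmarks point into the same file, so they get the same shift.
            entry["Kids"] = shift_page_numbers(entry["Kids"], page_offset)
    return shifted


def merge_bookmarks(documents):
    """`documents` is a list of (page_count, bookmarks) tuples in merge order."""
    merged, page_offset = [], 0
    for page_count, bookmarks in documents:
        # A file without an outline contributes pages but no bookmarks.
        merged.extend(shift_page_numbers(bookmarks or [], page_offset))
        page_offset += page_count
    return merged
```

The offset grows by the page count of every file, including files without bookmarks, which is why page_offset is updated outside the null check in the C# method.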
Merging .pdf files with Pdftk
EDIT: New approach.
- If you drag and drop file(s) or a folder onto the batch file, or pass at least one file/folder as an argument,
- the following batch changes to the referenced folder and
- processes all PDF files in that folder, combining them into binder.pdf
- an already existing binder.pdf is renamed to binder.bak.pdf
:: Q:\Test\2018\06\06\SO_50728273.cmd
@echo off
setlocal enabledelayedexpansion
if "%~1" neq "" (
    Echo %~a1|findstr "d" >Nul 2>&1 && Pushd "%~f1" || Pushd "%~dp1"
) else (
    Echo No arguments, need a path& pause & goto :Eof
)
Del /f binder.bak.pdf >Nul 2>&1
if exist binder.pdf Ren binder.pdf binder.bak.pdf
pdftk.exe *.pdf cat output binder.pdf
PopD
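For readers who prefer to see the folder logic outside of batch syntax, here is a rough Python sketch of the same steps: pick the folder from the argument, back up an existing binder.pdf, and collect the files to merge. The function names resolve_folder and prepare_inputs are hypothetical. One deliberate difference: the *.pdf wildcard in the batch also matches binder.bak.pdf, whereas this sketch leaves the backup out of the input list.

```python
from pathlib import Path


def resolve_folder(argument):
    """Return the folder itself for a directory argument, else the file's parent."""
    path = Path(argument)
    return path if path.is_dir() else path.parent


def prepare_inputs(folder, output_name="binder.pdf", backup_name="binder.bak.pdf"):
    """Back up an existing output file and return the PDFs to merge, sorted by name."""
    folder = Path(folder)
    output, backup = folder / output_name, folder / backup_name
    # Drop a stale backup first so the rename below cannot collide with it.
    if backup.exists():
        backup.unlink()
    if output.exists():
        output.rename(backup)
    # Assumption of this sketch: neither the output nor its backup is merged again.
    return sorted(p for p in folder.glob("*.pdf")
                  if p.name not in (output_name, backup_name))
```

The returned list can then be handed to pdftk in place of the wildcard.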
Without knowing which arguments you pass to the batch file, diagnosing the problem is impossible. %* is replaced with all the arguments you pass, and the location of the output is determined by the path of the first argument, %~dp1.
I ran your batch on my ramdisk a:
Dir before:
> dir A:\
Directory of A:\
2018-06-06 21:57 65.381 SO_5072812.pdf
2018-06-06 21:56 163 SO_50728273.cmd
2018-06-06 21:55 60.649 SO_50728273.pdf
3 File(s), 126.193 bytes
0 Dir(s), 1.049.452.544 bytes free
And after (I named the batch SO_50728273.cmd):
> SO_50728273.cmd a:\*.pdf
> dir
Directory of A:\
2018-06-06 21:58 125.756 binder.pdf
2018-06-06 21:57 65.381 SO_5072812.pdf
2018-06-06 21:56 163 SO_50728273.cmd
2018-06-06 21:55 60.649 SO_50728273.pdf
4 File(s), 251.949 bytes
0 Dir(s), 1.049.260.032 bytes free
Merging PDFs while retaining custom page numbers (aka pagelabels) and bookmarks
You need to iterate through the existing PageLabels and add them to the merged output, taking care to add an offset to the page index entry, based on the number of pages already added.
This solution also requires PyPDF4, since PyPDF2 produces a weird error (see bottom).
from PyPDF4 import PdfFileWriter, PdfFileMerger, PdfFileReader
# To manipulate the PDF dictionary
import PyPDF4.pdf as PDF
import logging


def add_nums(num_entry, page_offset, nums_array):
    for num in num_entry['/Nums']:
        if isinstance(num, int):
            logging.debug("Found page number %s, offset %s: ", num, page_offset)
            # Add the physical page information
            nums_array.append(PDF.NumberObject(num + page_offset))
        else:
            # {'/S': '/r'}, or {'/S': '/D', '/St': 489}
            keys = num.keys()
            logging.debug("Found page label, keys: %s", keys)
            number_type = PDF.DictionaryObject()
            # Always copy the /S entry
            s_entry = num['/S']
            number_type.update({PDF.NameObject("/S"): PDF.NameObject(s_entry)})
            logging.debug("Adding /S entry: %s", s_entry)
            if '/St' in keys:
                # If there is an /St entry, fetch it
                pdf_label_offset = num['/St']
                # and copy it unchanged: /St is the first label value,
                # not a physical page index, so it needs no offset
                logging.debug("Found /St %s", pdf_label_offset)
                number_type.update({PDF.NameObject("/St"): PDF.NumberObject(pdf_label_offset)})
            # Add the label information
            nums_array.append(number_type)
    return nums_array


def write_merged(pdf_readers):
    # Output
    merger = PdfFileMerger()
    # For PageLabels information
    page_offset = 0
    nums_array = PDF.ArrayObject()
    # Iterate through all the inputs
    for pdf_reader in pdf_readers:
        try:
            # Merge the content
            merger.append(pdf_reader)
            # Handle the PageLabels
            # Fetch page information
            old_page_labels = pdf_reader.trailer['/Root']['/PageLabels']
            page_count = pdf_reader.getNumPages()
            # Add PageLabel information
            add_nums(old_page_labels, page_offset, nums_array)
            page_offset = page_offset + page_count
        except Exception as err:
            print("ERROR: %s" % err)
    # Add PageLabels
    page_numbers = PDF.DictionaryObject()
    page_numbers.update({PDF.NameObject("/Nums"): nums_array})
    page_labels = PDF.DictionaryObject()
    page_labels.update({PDF.NameObject("/PageLabels"): page_numbers})
    root_obj = merger.output._root_object
    root_obj.update(page_labels)
    # Write output
    merger.write('merged.pdf')


pdf_readers = []
# Pass an open binary stream; the second positional parameter of
# PdfFileReader is `strict`, not a file mode.
tmp1 = PdfFileReader(open('file1.pdf', 'rb'))
tmp2 = PdfFileReader(open('file2.pdf', 'rb'))
pdf_readers.append(tmp1)
pdf_readers.append(tmp2)
write_merged(pdf_readers)
Note: PyPDF2 produces this weird error:
...
...
File "/usr/lib/python3/dist-packages/PyPDF2/pdf.py", line 552, in _sweepIndirectReferences
data[key] = value
File "/usr/lib/python3/dist-packages/PyPDF2/generic.py", line 507, in __setitem__
raise ValueError("key must be PdfObject")
ValueError: key must be PdfObject
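The offset handling in add_nums can be illustrated without any PDF library. The sketch below assumes each input is described by its page count and a flat /Nums-style list that alternates a page index with a label dictionary; the function name merge_page_label_nums and the sample values are hypothetical. Only the page index is shifted, while label dictionaries (including any /St start value) are copied as they are.

```python
def merge_page_label_nums(documents):
    """Combine /Nums-style lists from several documents into one list.

    `documents` is a list of (page_count, nums) tuples in merge order, where
    `nums` alternates a zero-based page index with a label dictionary.
    """
    merged, page_offset = [], 0
    for page_count, nums in documents:
        # Walk the flat list in (index, label) pairs.
        for index, label in zip(nums[0::2], nums[1::2]):
            # The page index is physical, so it moves by the pages already merged.
            merged.append(index + page_offset)
            # The label dictionary describes numbering style and start value only.
            merged.append(dict(label))
        page_offset += page_count
    return merged
```

For example, a 4-page front matter labeled with roman numerals followed by a 10-page body yields index entries 0 and 4 in the merged list.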
Merge / convert multiple PDF files into one PDF
I'm sorry, I managed to find the answer myself using Google and a bit of luck :)
For those interested:
I installed pdftk (PDF Toolkit) on our Debian server, and achieved the desired output with the following command:
pdftk file1.pdf file2.pdf cat output output.pdf
OR
gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...
This in turn can be piped directly into pdf2ps.
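When the list of input files is only known at run time, the two command lines above can be assembled programmatically. This is a minimal sketch that only builds the argument lists, using exactly the flags quoted above; the function names pdftk_merge_command and ghostscript_merge_command are hypothetical, and the sketch assumes pdftk and gs are on the PATH when the commands are eventually run.

```python
def pdftk_merge_command(inputs, output):
    """Build the pdftk argument list for concatenating `inputs` into `output`."""
    return ["pdftk", *inputs, "cat", "output", output]


def ghostscript_merge_command(inputs, output, paper_size="letter"):
    """Build the equivalent Ghostscript argument list with the pdfwrite device."""
    return [
        "gs", "-q", f"-sPAPERSIZE={paper_size}", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite", f"-sOutputFile={output}", *inputs,
    ]
```

Either list can be passed to subprocess.run(command, check=True); passing a list instead of a single string avoids shell quoting problems with file names that contain spaces.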