Merge pdf files with numerical sort
You can embed the output of a command using $(), so you can do the following:
$ pdfunite $(ls -v *.pdf) output.pdf
or
$ pdfunite $(ls *.pdf | sort -n) output.pdf
However, note that this does not work when a filename contains special characters such as whitespace.
In that case you can do the following:
ls -v *.pdf | bash -c 'IFS=$'"'"'\n'"'"' read -d "" -ra x;pdfunite "${x[@]}" output.pdf'
Although it seems a little complicated, it's just a combination of:
- Bash: Read tab-separated file line into array
- build argument lists containing whitespace
- How to escape single-quotes within single-quoted strings?
Note that you cannot simply use xargs, since pdfunite requires the input PDFs in the middle of the argument list, before the output file name.
I avoided using readarray since it is not supported in older bash versions, but you can use it instead of IFS=.. read -ra .. if you have a newer bash.
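The same idea can be sketched in Python, which sidesteps shell quoting because the arguments are passed as a list. This is only an illustration: natural_key and build_pdfunite_command are hypothetical helper names, and the ordering is a simple "numbers compare as numbers" sort that may differ from ls -v in corner cases.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so "page 10" sorts after "page 2"."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd indices hold the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def build_pdfunite_command(pdf_names, output_name):
    """Return the argv list: inputs in natural order first, output file last."""
    return ["pdfunite", *sorted(pdf_names, key=natural_key), output_name]


# Sample names containing whitespace (made up for this example).
names = ["page 10.pdf", "page 2.pdf", "page 1.pdf"]
command = build_pdfunite_command(names, "output.pdf")
print(command)
```

To actually run the merge, the list could be handed to subprocess.run(command, check=True), assuming pdfunite is installed and the files exist.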
Merge multiple PDFs in order
It is because of the naming of the files. Your code
new FileOutputStream(outputfolder + "\\" + "tempcontrat" + debut + "-" + i + "_.pdf")
will produce:
- tempcontrat0-0_.pdf
- tempcontrat0-1_.pdf
- ...
- tempcontrat0-10_.pdf
- tempcontrat0-11_.pdf
- ...
- tempcontrat0-1000_.pdf
Here tempcontrat0-1000_.pdf will be placed before tempcontrat0-11_.pdf, because you are sorting the names alphabetically before the merge.
It would be better to left-pad the file number with the 0 character, using the leftPad() method of org.apache.commons.lang.StringUtils or java.text.DecimalFormat, so the names look like tempcontrat0-000000.pdf, tempcontrat0-000001.pdf, ... tempcontrat0-999999.pdf.
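The effect of padding is easy to check with a small experiment. The sketch below uses Python rather than Java purely for brevity; the width of 6 digits and the sample numbers are assumptions chosen for the example.

```python
numbers = [0, 1, 10, 11, 1000]

# Names as the original code produces them: no padding.
unpadded = [f"tempcontrat0-{n}_.pdf" for n in numbers]

# Names with the number left-padded with zeros to a fixed width of 6 digits.
padded = [f"tempcontrat0-{n:06d}.pdf" for n in numbers]

# Alphabetical (lexicographic) sort, as done before the merge.
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

print(sorted_unpadded)  # 1000 ends up before 11
print(sorted_padded)    # numeric order is preserved
```

With a fixed width, alphabetical order and numeric order coincide, so the existing sort before the merge can stay as it is.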
You can also make it much simpler and faster by skipping the steps of writing to a file and reading it back, and merging the documents right after the form fill. Whether this is practical depends on how many documents you are merging, how big they are, and how much memory you have.
So you can save the filled document into a ByteArrayOutputStream and, after stamper.close(), create a new PdfReader for the bytes from that stream and call pdfSmartCopy.getImportedPage() for that reader. In short, it can look like:
// initialize
PdfSmartCopy pdfSmartCopy = new PdfSmartCopy(document, memoryStream);
for (int i = debut; i < fin; i++) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // fill in the form here
    stamper.close();
    PdfReader reader = new PdfReader(out.toByteArray());
    reader.consolidateNamedDestinations();
    PdfImportedPage pdfImportedPage = pdfSmartCopy.getImportedPage(reader, 1);
    pdfSmartCopy.addPage(pdfImportedPage);
    // other actions ...
}
Merge / convert multiple PDF files into one PDF
I'm sorry, I managed to find the answer myself using google and a bit of luck : )
For those interested;
I installed pdftk (the PDF toolkit) on our Debian server, and using the following command I achieved the desired output:
pdftk file1.pdf file2.pdf cat output output.pdf
OR
gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...
This in turn can be piped directly into pdf2ps.
Merge PDF files
Use pyPdf or its successor PyPDF2:
A Pure-Python library built as a PDF toolkit. It is capable of:
- splitting documents page by page,
- merging documents page by page,
(and much more)
Here's a sample program that works with both versions.
#!/usr/bin/env python
import sys
try:
    from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
    from pyPdf import PdfFileReader, PdfFileWriter

def pdf_cat(input_files, output_stream):
    input_streams = []
    try:
        # First open all the files, then produce the output file, and
        # finally close the input files. This is necessary because
        # the data isn't read from the input files until the write
        # operation. Thanks to
        # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages()):
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()
        output_stream.close()

if __name__ == '__main__':
    if sys.platform == "win32":
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    # On Python 3 the binary PDF data must go to sys.stdout.buffer;
    # on Python 2 sys.stdout itself accepts bytes.
    pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))
Python - Merge PDF files with same prefix using PyPDF2
os.listdir() only lists filenames; it won't include the directory name.
To get the full path to actually add into the merger, you'll have to os.path.join() the root path back in.
However, you'll also need to note that the files you get from os.listdir() may not necessarily be in the order you want for your prefixes, so it'd be better to refactor things so you first group the files by prefix, then process each prefix group:
from collections import defaultdict
from PyPDF2 import PdfFileMerger
import os

root_path = "C:\\test\\raw"
result_path = "C:\\test\\result"

# Group the bare filenames by their prefix field.
files_by_prefix = defaultdict(list)
for filename in os.listdir(root_path):
    prefix = filename.split("_")[2]
    files_by_prefix[prefix].append(filename)

# Merge each group into its own output file.
for prefix, filenames in files_by_prefix.items():
    result_name = os.path.join(result_path, prefix + "_merged.pdf")
    print(f"Merging {filenames} to {result_name} (prefix {prefix})")
    merger = PdfFileMerger()
    for filename in sorted(filenames):
        merger.append(os.path.join(root_path, filename))
    merger.write(result_name)
    merger.close()
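The grouping step can be tried on its own, without any PDFs, to check that the prefix field is picked up as expected. The sketch below is an illustration only: group_by_prefix and the sample filenames are made up, and skipping names with too few "_" fields is an added assumption (the code above would raise an IndexError for them).

```python
from collections import defaultdict


def group_by_prefix(filenames, field=2):
    """Group filenames by the given "_"-separated field, sorting each group."""
    groups = defaultdict(list)
    for filename in filenames:
        parts = filename.split("_")
        if len(parts) <= field:
            continue  # too few "_" fields to contain a prefix; skip the file
        groups[parts[field]].append(filename)
    return {prefix: sorted(names) for prefix, names in groups.items()}


# Hypothetical directory listing, deliberately out of order.
sample = ["x_y_B_2.pdf", "x_y_A_1.pdf", "x_y_B_1.pdf", "notes.txt"]
groups = group_by_prefix(sample)
print(groups)
```

Each resulting list can then be joined with the root path and appended to a merger in order, as in the loop above.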