Merge / convert multiple PDF files into one PDF
I'm sorry, I managed to find the answer myself using Google and a bit of luck : )
For those interested:
I installed pdftk (the PDF toolkit) on our Debian server, and the following command produced the desired output:
pdftk file1.pdf file2.pdf cat output output.pdf
OR
gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...
This in turn can be piped directly into pdf2ps.
Merge PDF pages to 1 file without generating single page files
You need to use BytesIO:
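from io import BytesIO
import os
import cv2
import pytesseract
from PyPDF2 import PdfFileMerger  # name used before PyPDF2 3.0

# filesets, download_location, tessdata_dir_config and FILE come from the question's code.
for fileset in filesets:
    merger = PdfFileMerger()
    page_path = fr".\output\pages"
    for file in fileset:
        # Load image, read with pytesseract
        path = os.path.join(download_location, file)
        img = cv2.imread(path, 1)
        # The result is the page as PDF bytes, so no single-page file is written to disk
        result = pytesseract.image_to_pdf_or_hocr(img, lang="eng", config=tessdata_dir_config)
        merger.append(BytesIO(result))
    merger.write(fr".\output\{FILE}.pdf")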
Merging Many PDFs into One PDF
On this line:
merger.append(path + pdf_files)
you wanted:
merger.append(path + files)
How to merge many PDF files into a single one?
You can use http://www.mergepdf.net/ for example
Or:
PDFTK http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
If you are NOT on Ubuntu and have the same problem (say you were about to start a new topic on SO and SO suggested having a look at this question), you can also do it like this:
Things You'll Need:
* Full Version of Adobe Acrobat
1. Open all the .pdf files you wish to merge. These can be minimized on your desktop as individual tabs.
2. Pull up what you wish to be the first page of your merged document.
3. Click the 'Combine Files' icon in the top left portion of the screen.
4. The 'Combine Files' window that pops up is divided into three sections. The first section is titled 'Choose the files you wish to combine'. Select the 'Add Open Files' option.
5. Select the other open .pdf documents on your desktop when prompted.
6. Rearrange the documents as you wish in the second section, titled 'Arrange the files in the order you want them to appear in the new PDF'.
7. The final section, titled 'Choose a file size and conversion setting', lets you control the size of your merged PDF document. Consider the purpose of your new document. If it is to be sent as an e-mail attachment, use a low size setting. If the PDF contains images or is to be used for presentation, choose a high setting. When finished, select 'Next'.
8. A final choice: choose between a single PDF document and a PDF package, which comes with the option of creating a specialized cover sheet. When finished, hit 'Create' and save to your preferred location.
Tips & Warnings:
* Double-check the PDF documents prior to merging to make sure all pertinent information is included. It's much easier to re-create a single PDF page than a multi-page document.
Merge many PDF files into one PDF file in a Java web application
There are numerous errors in your code:
Only write to the response output stream what you want to return to the browser
Your code writes a wild collection of data to the response output stream:
ServletOutputStream servletOutPutStream = response.getOutputStream();;
[...]
for(byte[] imageList:imageMap)
{
    [...]
    byteArrayOutputStream.writeTo(response.getOutputStream());
    [...]
}
[...]
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
[... merge PDFs into the writer]
servletOutPutStream.flush();
document.close();
servletOutPutStream.close();
This results in many copies of the imageMap elements being written there, with the merged file added only afterwards.
What do you expect the browser to do, ignore all the leading source PDF copies until finally the merged PDF appears?
Thus, please only write the merged PDF to the response output stream.
Don't write a wrong content length
It is a good idea to write the content length to the response... but only if you use the correct value!
In your code you write a content length:
response.setContentLength(byteArrayOutputStream.size());
but the byteArrayOutputStream at this time only contains a wild mix of copies of the source PDFs and not yet the final merged PDF. Thus, this will only serve to confuse the browser even more.
Thus, please do not add false headers to the response.
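One way to get both of the points above right (only the merged PDF reaches the browser, and the content length matches it) is to build the result in a memory buffer first and only then touch the response. A minimal sketch, assuming two hypothetical interfaces that are not part of your code or of any library: PdfProducer stands in for your merge step, and ResponseLike stands in for the two HttpServletResponse methods used, so the example runs without a servlet container:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class BufferedResponseSketch {

    /** Hypothetical stand-in for whatever produces the merged PDF. */
    interface PdfProducer {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Hypothetical stand-in for the two HttpServletResponse methods used here. */
    interface ResponseLike {
        void setContentLength(int length);
        OutputStream getOutputStream() throws IOException;
    }

    static void send(PdfProducer producer, ResponseLike response) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        producer.writeTo(buffer);                    // 1. build the complete result in memory
        response.setContentLength(buffer.size());    // 2. the size is now known and correct
        buffer.writeTo(response.getOutputStream());  // 3. write exactly those bytes, once
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int[] declaredLength = new int[1];
        ResponseLike response = new ResponseLike() {
            @Override public void setContentLength(int length) { declaredLength[0] = length; }
            @Override public OutputStream getOutputStream() { return sink; }
        };
        // Placeholder bytes; a real producer would write the merged PDF here.
        send(out -> out.write(new byte[] {1, 2, 3, 4}), response);
        System.out.println(declaredLength[0] + " declared, " + sink.size() + " written");
        // prints: 4 declared, 4 written
    }
}
```

The trade-off is that the whole merged file sits in memory before it is sent. If that is too much for your files, leaving the content length out is still better than sending a wrong one.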
Don't mangle your input data
In the loop
for(byte[] imageList:imageMap)
{
    System.out.println(imageList.toString()+" "+imageList.length);
    byteArrayOutputStream.write(imageList);
    byteArrayOutputStream.writeTo(response.getOutputStream());
    is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
    inputPdfList.add(is);
}
you take byte arrays, which I assume contain a single source PDF each, pollute the response output stream with them (as mentioned before), and create a collection of input streams where the first one contains the first source PDF, the second one contains the concatenation of the first two source PDFs, the third one the concatenation of the first three source PDFs, etc...
Because you never reset or re-instantiate the byteArrayOutputStream, it only gets bigger and bigger.
Thus, please start or end loops like this with a reset of the byteArrayOutputStream.
(Actually you don't need that loop at all: the PdfReader has a constructor which can immediately take a byte[], so there is no need to wrap it in a byte stream.)
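The effect of the missing reset can be reproduced with plain JDK classes. The following is a minimal sketch, not your code: the class and method names are made up for illustration, and empty byte arrays of different lengths stand in for the source PDFs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferGrowthDemo {

    /** Mirrors the loop above: one shared buffer that is never reset. */
    static List<Integer> sizesWithSharedBuffer(List<byte[]> sources) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<Integer> sizes = new ArrayList<>();
        for (byte[] source : sources) {
            buffer.write(source, 0, source.length);  // appends to everything written so far
            sizes.add(buffer.toByteArray().length);  // size of what each input stream would wrap
        }
        return sizes;
    }

    /** One independent stream per source array, no shared buffer involved. */
    static List<ByteArrayInputStream> oneStreamPerSource(List<byte[]> sources) {
        List<ByteArrayInputStream> streams = new ArrayList<>();
        for (byte[] source : sources) {
            streams.add(new ByteArrayInputStream(source));
        }
        return streams;
    }

    public static void main(String[] args) {
        // Placeholder data: three "documents" of 3, 5 and 2 bytes.
        List<byte[]> sources = Arrays.asList(new byte[3], new byte[5], new byte[2]);

        System.out.println(sizesWithSharedBuffer(sources));  // prints [3, 8, 10]

        for (ByteArrayInputStream in : oneStreamPerSource(sources)) {
            System.out.println(in.available());               // prints 3, then 5, then 2
        }
    }
}
```

With the shared buffer, the second and third entries contain the earlier documents as well; wrapping each array on its own (or calling reset() on the buffer in every iteration) keeps each entry to exactly one source.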
Don't merge PDFs using a plain PdfWriter, use a PdfCopy
You merge the PDFs using a PdfWriter / getImportedPage / addTemplate approach. There are dozens of questions and answers on Stack Overflow (many of them answered by iText developers) explaining that this usually is a bad idea and that you should use PdfCopy.
Thus, please make use of the many good answers which already exist on this topic here and use PdfCopy for merging.
Don't flush or close streams only because you can
You finalize the response output by closing numerous streams:
//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();
servletOutPutStream.close();
I have not seen a line in which you declared or set that outputStream variable, but even if it contained the response output stream, there is no need to close that because you already close it in the servletOutPutStream variable.
Thus, please remove unnecessary calls like this.