Merge/Convert Multiple PDF Files into One PDF

Merge / convert multiple PDF files into one PDF

I'm sorry, I managed to find the answer myself using Google and a bit of luck :)

For those interested:

I installed pdftk (the PDF Toolkit) on our Debian server, and the following command produced the desired output:

pdftk file1.pdf file2.pdf cat output output.pdf

OR

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf file1.pdf file2.pdf file3.pdf ...

This in turn can be piped directly into pdf2ps.
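If the merge has to be triggered from a program rather than from a shell, the pdftk call above can be assembled as an argument list. Below is a rough Java sketch using only the JDK. The class and method names (PdftkMerge, buildCommand, merge) are made up for illustration, and it assumes pdftk is installed and on the PATH:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of pdftk): builds and runs the "cat" command shown above.
final class PdftkMerge {

    // Returns the argument list: pdftk <inputs...> cat output <outputFile>
    static List<String> buildCommand(List<String> inputs, String outputFile) {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("at least one input PDF is required");
        }
        List<String> command = new ArrayList<>();
        command.add("pdftk");
        command.addAll(inputs);
        command.add("cat");
        command.add("output");
        command.add(outputFile);
        return command;
    }

    // Runs pdftk and returns its exit code; assumes pdftk is on the PATH.
    static int merge(List<String> inputs, String outputFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(inputs, outputFile))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

Passing the arguments as a list, instead of as one concatenated string, avoids shell quoting problems with file names that contain spaces.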

Merge PDF pages to 1 file without generating single page files

You need to use BytesIO:

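import os
from io import BytesIO

import cv2
import pytesseract
from PyPDF2 import PdfFileMerger  # PyPDF2 < 3.0; later releases renamed it PdfMerger

# filesets, download_location, tessdata_dir_config and FILE come from the asker's script
for fileset in filesets:
    merger = PdfFileMerger()
    page_path = fr".\output\pages"
    for file in fileset:
        # Load image, read with pytesseract
        path = os.path.join(download_location, file)
        img = cv2.imread(path, 1)
        # image_to_pdf_or_hocr returns the PDF as bytes, so no page file is written
        result = pytesseract.image_to_pdf_or_hocr(img, lang="eng", config=tessdata_dir_config)
        merger.append(BytesIO(result))

    merger.write(fr".\output\{FILE}.pdf")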

Merging Many PDFs into One PDF

On this line

 merger.append(path + pdf_files)

You wanted

 merger.append(path + files)

How to merge many PDF files into a single one?

You can use http://www.mergepdf.net/ for example

Or:

PDFTK http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/

If you are NOT on Ubuntu and you have the same problem (and you wanted to start a new topic on SO and SO suggested to have a look at this question) you can also do it like this:

Things You'll Need:

* Full Version of Adobe Acrobat
  1. Open all the .pdf files you wish to merge. These can be minimized on your desktop as individual tabs.

  2. Pull up what you wish to be the first page of your merged document.

  3. Click the 'Combine Files' icon on the top left portion of the screen.

  4. The 'Combine Files' window that pops up is divided into three sections. The first section is titled, 'Choose the files you wish to combine'. Select the 'Add Open Files' option.

  5. Select the other open .pdf documents on your desktop when prompted.

  6. Rearrange the documents as you wish in the second window, titled, 'Arrange the files in the order you want them to appear in the new PDF'.

  7. The final window, titled, 'Choose a file size and conversion setting', allows you to control the size of your merged PDF document. Consider the purpose of your new document. If it's to be sent as an e-mail attachment, use a low size setting. If the PDF contains images or is to be used for presentation, choose a high setting. When finished, select 'Next'.

  8. A final choice: choose between either a single PDF document, or a PDF package, which comes with the option of creating a specialized cover sheet. When finished, hit 'Create', and save to your preferred location.

Tips & Warnings:

Double-check the PDF documents prior to merging to make sure all pertinent information is included. It's much easier to re-create a single PDF page than a multi-page document.

Merge many PDF files into one PDF file in a Java web application

There are numerous errors in your code:

Only write to the response output stream what you want to return to the browser

Your code writes a wild collection of data to the response output stream:

ServletOutputStream servletOutPutStream = response.getOutputStream();;
[...]
for(byte[] imageList:imageMap)
{
    [...]
    byteArrayOutputStream.writeTo(response.getOutputStream());
    [...]
}
[...]
PdfWriter writer = PdfWriter.getInstance(document, response.getOutputStream());
[... merge PDFs into the writer]

servletOutPutStream.flush();
document.close();

servletOutPutStream.close();

As a result, many copies of the imageMap elements are written to the response first, and the merged file is only appended after them.

What do you expect the browser to do, ignore all the leading source PDF copies until finally the merged PDF appears?

Thus, please only write the merged PDF to the response output stream.

Don't write a wrong content length

It is a good idea to write the content length to the response... but only if you use the correct value!

In your code you write a content length:

response.setContentLength(byteArrayOutputStream.size());

but the byteArrayOutputStream at this time only contains a wild mix of copies of the source PDFs and not yet the final merged PDF. Thus, this will only serve to confuse the browser even more.

Thus, please do not add false headers to the response.
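One way to get the length right is to merge into an in-memory buffer first and only touch the response once the merged PDF is complete. Below is a minimal sketch using only the JDK. The names MergedPdfResponse, contentLengthSetter and responseBody are illustrative; in a servlet they would stand in for response::setContentLength and response.getOutputStream():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.IntConsumer;

// Hypothetical helper: sends an already merged PDF and nothing else.
final class MergedPdfResponse {

    // contentLengthSetter stands in for response::setContentLength,
    // responseBody for response.getOutputStream() (servlet types omitted here).
    static void send(ByteArrayOutputStream mergedPdf,
                     IntConsumer contentLengthSetter,
                     OutputStream responseBody) throws IOException {
        // The buffer holds only the final merged PDF, so its size is the real length.
        contentLengthSetter.accept(mergedPdf.size());
        mergedPdf.writeTo(responseBody);
        responseBody.flush();
    }
}
```

Buffering the whole result costs memory for large documents; if that matters, simply leaving the content length unset is the safer alternative.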

Don't mangle your input data

In the loop

for(byte[] imageList:imageMap)
{
    System.out.println(imageList.toString()+" "+imageList.length);

    byteArrayOutputStream.write(imageList);

    byteArrayOutputStream.writeTo(response.getOutputStream());

    is = new ByteArrayInputStream(byteArrayOutputStream.toByteArray());
    inputPdfList.add(is);
}

you take byte arrays which I assume contain a single source PDF each, pollute the response output stream with them (as mentioned before), and create a collection of input streams where the first one contains the first source PDF, the second one contains the concatenation of the first two source PDFs, the third one the concatenation of the first three source PDFs, etc...

Because you never reset or re-instantiate the byteArrayOutputStream, it only gets bigger and bigger.

Thus, please start or end loops like this with a reset of the byteArrayOutputStream.

(Actually, you don't need that loop at all: PdfReader has a constructor that takes a byte[] directly, so there is no need to wrap it in a byte stream.)
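The accumulation effect is easy to reproduce without any PDF library. The following sketch (BufferReuseDemo and snapshotSizes are illustrative names, and the byte arrays are placeholders for the source PDFs) pushes each source through one shared ByteArrayOutputStream and records how large each snapshot is, with and without a reset:

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class BufferReuseDemo {

    // Copies each source through one shared buffer and records the snapshot sizes.
    // With resetBetween == false the buffer keeps all earlier sources, as in the loop above.
    static List<Integer> snapshotSizes(List<byte[]> sources, boolean resetBetween) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        List<Integer> sizes = new ArrayList<>();
        for (byte[] source : sources) {
            if (resetBetween) {
                buffer.reset(); // discard what earlier iterations wrote
            }
            buffer.write(source, 0, source.length);
            sizes.add(buffer.toByteArray().length);
        }
        return sizes;
    }
}
```

For sources of 3, 4 and 5 bytes, the snapshots are 3, 7 and 12 bytes long without the reset and 3, 4 and 5 bytes long with it.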

Don't merge PDFs using a plain PdfWriter, use a PdfCopy

You merge the PDFs using a PdfWriter / getImportedPage / addTemplate approach. There are dozens of questions and answers on Stack Overflow (many of them answered by iText developers) explaining that this usually is a bad idea and that you should use PdfCopy.

Thus, please make use of the many good answers which already exist on this topic here and use PdfCopy for merging.

Don't flush or close streams only because you can

You finalize the response output by closing numerous streams:

//Close document and outputStream.
servletOutPutStream.flush();
outputStream.flush();
document.close();
outputStream.close();

servletOutPutStream.close();

I have not seen a line in which you declared or set that outputStream variable, but even if it contained the response output stream, there is no need to close it, because you already close that stream via the servletOutPutStream variable.

Thus, please remove unnecessary calls like this.


