Combine (Bind) Existing PDF Files in R

Here's how to do it with a minimal reproducible example. I believe you'll be able to pick it apart and figure out how to apply it to your PDFs. The reports package isn't necessary, but I like using its folder and delete functions in my workflow, so I used them here:

library(plotflow)
library(reports)

## make a folder to store the pdfs
folder(deleteMe)

## create a bunch of various sized pdfs
lapply(1:3, function(i) {
    pdf(sprintf("deleteMe/test%s.pdf", i), width = sample(4:7, 1))
    plot(1:10, 1:10, main = sprintf("Title: test%s.pdf", i))
    dev.off()
})

## paste the paths to pdfs together in one string w/ spaces
plotflow:::mergePDF(
    in.file = paste(file.path("deleteMe", dir("deleteMe")), collapse = " "),
    file = "merged.pdf"
)

## delete MWE
delete('deleteMe')

This was a helper function within plotflow to aid work within R. I'd likely use Ghostscript directly myself if I already had the PDFs.
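For readers who want to try the direct Ghostscript route mentioned above, here is a minimal sketch in Python of how the command line might be assembled. The function name `ghostscript_merge_command` is hypothetical, and the executable name `gs` is an assumption (Windows builds typically ship it as `gswin64c`). The flags are the usual ones for writing a PDF in batch mode.

```python
def ghostscript_merge_command(inputs, output, gs="gs"):
    """Build (but do not run) a Ghostscript command that merges PDFs.

    `gs` is an assumed executable name; adjust it for your platform.
    """
    if not inputs:
        raise ValueError("need at least one input PDF")
    return [
        gs,
        "-dBATCH",               # exit after processing the inputs
        "-dNOPAUSE",             # do not prompt between pages
        "-q",                    # suppress startup messages
        "-sDEVICE=pdfwrite",     # write PDF output
        "-sOutputFile=" + output,
        *inputs,                 # inputs are merged in the order given
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than one pasted string means file names containing spaces need no extra quoting.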

R merge pdf into one pdf

Install pdftk and ensure it is on your path (or if not on your path use the full pathname when referring to it in the system command below). Then run the code below. No packages are used.

setwd("...directory where pdf files are located...") ##
infiles <- Sys.glob("148-*.pdf") ##
outfile <- "148.pdf" ##
system(paste("pdftk", paste(infiles, collapse = " "), "cat output", outfile))

There are some packages that provide wrappers around pdftk:

  • the staplr package. Earlier versions were not very useful here because they did not allow specifying the files or their order in the output; one could only specify the input and output directories. Update: the current version of staplr allows specifying the files, as mentioned in the comments.

  • the animation package. The pdftk function that this package provides is a slightly simpler alternative to calling system with pdftk directly. To concatenate files, use the following, assuming that the 3 lines marked ## above have already been run.

    library(animation)
    ani.options(pdftk = "/path/to/pdftk") # or if on path: ani.options(pdftk = "pdftk")
    pdftk(infiles, "cat", outfile, "")

    This link has an example of using the animation package with pdftk to burst pages.

combine multiple pdf plots into one file

You can use Sweave/knitr to get more flexibility and easily merge new plots, old ones, and text:

\documentclass{article}
\usepackage{pdfpages}
\begin{document}
this my plot 1: % write some texts here
\includepdf{1.pdf}
this my plot 2:
\includepdf{2.pdf}
this my plot 3:
\includepdf{3.pdf}
this my plot 4:
\includepdf{4.pdf}
a new plot:
<<echo=FALSE>>= % chunk for new plots
x <- rnorm(100)
hist(x)
@
\end{document}

Combine multiple pages of .pdf into RMarkdown

I'm able to successfully include two different multi-page PDFs in your example document using pdfpages:

---
title: <center> <h1>Analysis Data</h1> </center>
mainfont: Arial
output:
  pdf_document:
    latex_engine: xelatex
sansfont: Arial
fig_crop: false
toc: true
classoption: landscape
fontsize: {10}
geometry: margin=0.30in
header-includes:
- \usepackage{booktabs}
- \usepackage{sectsty} \sectionfont{\centering}
- \renewcommand{\contentsname}{}\vspace{-2cm}
- \usepackage{pdfpages}
---

# File One

\includepdf[pages={-}]{pdf1.pdf}

\newpage

# File Two

\includepdf[pages={-}]{pdf2.pdf}

Merge PDF files

Use pyPdf or its successor PyPDF2:

A Pure-Python library built as a PDF toolkit. It is capable of:

  • splitting documents page by page,
  • merging documents page by page,

(and much more)

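Here's a sample program that works with both libraries. It uses the legacy PdfFileReader/PdfFileWriter API, which PyPDF2 3.0 removed in favor of PdfReader/PdfWriter, so it needs an older PyPDF2 release.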

#!/usr/bin/env python
import sys
try:
    from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
    from pyPdf import PdfFileReader, PdfFileWriter

def pdf_cat(input_files, output_stream):
    input_streams = []
    try:
        # First open all the files, then produce the output file, and
        # finally close the input files. This is necessary because
        # the data isn't read from the input files until the write
        # operation. Thanks to
        # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages()):
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()
        output_stream.close()

if __name__ == '__main__':
    if sys.platform == "win32":
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    # On Python 3, sys.stdout is a text stream; PDF bytes must go to its
    # underlying binary buffer. On Python 2, sys.stdout itself is used.
    pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))

Combine / merge multiple HTML documents in r

It seems like the function html_combine of the package R3port expects the files to have a rawhtml extension. You can save rawhtml instead of html files like this:


library(tableHTML)
library(R3port)

x=data.frame(x=c(1,2,3))
y=data.frame(y=c(4,5,6))

tableHTML::write_tableHTML(tableHTML(x), "x.rawhtml")

tableHTML::write_tableHTML(tableHTML(y), "y.rawhtml")

And then use html_combine to get the output:

html_combine(
out = "to.html",
toctheme = TRUE,
css = paste0(system.file(package = "R3port"), "/style.css"),
clean = 0
)

The result is this:

[Image: screenshot of the HTML report]

Need to merge multiple pdf's into a single PDF with Table Of Contents sections

And what's the best tool for us to merge the pdf's?

On Linux (as well as on Windows), you can install a useful little program, pdftk. It works well for binding PDFs together. For example:

$ pdftk in1.pdf in2.pdf in3.pdf in4.pdf in5.pdf in6.pdf cat output out.pdf

where in*.pdf are the input files and out.pdf is the result. In the meantime, @jerik has already given an answer on how to deal with the TOC.
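If the merge needs to be scripted, the same pdftk invocation can be driven from Python. This is a rough sketch that assumes pdftk is on the PATH. The helper names `pdftk_cat_command` and `merge_with_pdftk` are made up for illustration, and only the `cat output` form shown above is used.

```python
import shutil
import subprocess


def pdftk_cat_command(inputs, output, pdftk="pdftk"):
    """Return the argument list for: pdftk <inputs...> cat output <output>."""
    if not inputs:
        raise ValueError("need at least one input PDF")
    return [pdftk, *inputs, "cat", "output", output]


def merge_with_pdftk(inputs, output):
    """Run pdftk to concatenate the inputs, in order, into output."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk not found on PATH")
    # No shell is involved, so file names with spaces are passed through as-is.
    subprocess.run(pdftk_cat_command(inputs, output), check=True)
```

For example, `merge_with_pdftk(["in1.pdf", "in2.pdf"], "out.pdf")` would run the equivalent of the command line above with two inputs.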


