Combine (bind) existing pdf files in R
Here's how to do it with a minimal reproducible example. I believe you'll be able to pick it apart and figure out how to apply it to your PDFs. The reports package isn't necessary, but I like using folder and delete in my workflow, so I used them here:
library(plotflow)
library(reports)
## make a folder to store the pdfs
folder(deleteMe)
## create a bunch of various sized pdfs
lapply(1:3, function(i) {
pdf(sprintf("deleteMe/test%s.pdf", i), width=sample(4:7, 1))
plot(1:10, 1:10, main = sprintf("Title: test%s.pdf", i))
dev.off()
})
## paste the paths to pdfs together in one string w/ spaces
plotflow:::mergePDF(
in.file=paste(file.path("deleteMe", dir("deleteMe")), collapse=" "),
file="merged.pdf"
)
## delete MWE
delete('deleteMe')
This was a helper function within plotflow to aid work within R. If I already had the PDFs, I'd likely use Ghostscript directly myself.
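Using Ghostscript directly means running its pdfwrite device over the existing files. Below is a minimal Python sketch of that call. It assumes the executable is named gs (on Windows it is usually gswin64c), and the helper names gs_merge_command and gs_merge are hypothetical. Because the arguments are passed as a list rather than one pasted string, file names containing spaces need no quoting.

```python
import shutil
import subprocess
from typing import List, Sequence


def gs_merge_command(inputs: Sequence[str], output: str, gs: str = "gs") -> List[str]:
    """Build a Ghostscript command line that concatenates `inputs` into `output`."""
    return [
        gs,
        "-dBATCH",            # exit after processing all files
        "-dNOPAUSE",          # do not prompt between pages
        "-q",                 # suppress the startup banner
        "-sDEVICE=pdfwrite",  # produce PDF output
        "-sOutputFile=" + output,
        *inputs,              # input PDFs, merged in the order given
    ]


def gs_merge(inputs: Sequence[str], output: str, gs: str = "gs") -> None:
    """Run Ghostscript; raise if it is not installed or exits with an error."""
    if shutil.which(gs) is None:
        raise FileNotFoundError(f"Ghostscript executable {gs!r} not found on PATH")
    subprocess.run(gs_merge_command(inputs, output, gs), check=True)


if __name__ == "__main__":
    gs_merge(["deleteMe/test1.pdf", "deleteMe/test2.pdf", "deleteMe/test3.pdf"], "merged.pdf")
```

Note that pdfwrite re-interprets the input rather than copying pages byte for byte, so the merged file can differ slightly from the originals (fonts, compression).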
R merge pdf into one pdf
Install pdftk and make sure it is on your path (if it is not, use its full path name in the system command below). Then run the code below. No packages are used.
setwd("...directory where pdf files are located...") ##
infiles <- Sys.glob("148-*.pdf") ##
outfile <- "148.pdf" ##
system(paste("pdftk", paste(shQuote(infiles), collapse = " "), "cat output", shQuote(outfile))) # shQuote protects names containing spaces
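One thing to watch: Sys.glob returns matches in collation order, so 148-10.pdf comes before 148-2.pdf. If the number in the name should decide the page order, sort the names "naturally" first. A minimal Python sketch of the same pdftk call with a natural sort; natural_key and pdftk_cat_command are hypothetical helpers, and pdftk is assumed to be on the PATH.

```python
import glob
import re
import subprocess
from typing import List, Sequence, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and integer chunks so that '148-2' < '148-10'."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def pdftk_cat_command(infiles: Sequence[str], outfile: str) -> List[str]:
    """Build `pdftk in1 in2 ... cat output outfile` as an argument list."""
    return ["pdftk", *infiles, "cat", "output", outfile]


if __name__ == "__main__":
    infiles = sorted(glob.glob("148-*.pdf"), key=natural_key)
    subprocess.run(pdftk_cat_command(infiles, "148.pdf"), check=True)
```

In R itself, gtools::mixedsort(infiles) is one way to get the same ordering before calling pdftk.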
There are some packages that provide wrappers around pdftk:
the staplr package. Unfortunately, it was probably not too useful here because it did not allow specifying the files or their order in the output; one could only specify the input and output directories. Update: the current version of staplr now allows the files to be specified, as mentioned in the comments.
the animation package. The pdftk command that this package provides is a slightly simpler alternative to calling pdftk directly via system. For concatenating, it would be the following, assuming the 3 lines marked ## above have already been run:
library(animation)
ani.options(pdftk = "/path/to/pdftk") # or, if on path: ani.options(pdftk = "pdftk")
pdftk(infiles, "cat", outfile, "")
This link has an example of using the animation package with pdftk to burst pages.
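Bursting is the reverse operation: pdftk's burst writes each page of a single PDF to its own file, named from a printf-style pattern (pages are numbered from 1, and pdftk also writes a doc_data.txt report). A small Python sketch that builds the command and predicts the resulting file names; the helper names and the page_%02d.pdf pattern are illustrative choices, and pdftk is assumed to be installed.

```python
import subprocess
from typing import List


def pdftk_burst_command(infile: str, pattern: str = "page_%02d.pdf") -> List[str]:
    """Build `pdftk infile burst output pattern`."""
    return ["pdftk", infile, "burst", "output", pattern]


def burst_names(n_pages: int, pattern: str = "page_%02d.pdf") -> List[str]:
    """File names the pattern yields for an n-page document (pages start at 1)."""
    # Python's % formatting follows the same rules as printf for %02d
    return [pattern % page for page in range(1, n_pages + 1)]


if __name__ == "__main__":
    subprocess.run(pdftk_burst_command("148.pdf"), check=True)
```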
combine multiple pdf plots into one file
You can use Sweave/knitr to get more flexibility and easily merge new plots, old ones, and text:
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
this is my plot 1: % write some text here
\includepdf{1.pdf}
this is my plot 2:
\includepdf{2.pdf}
this is my plot 3:
\includepdf{3.pdf}
this is my plot 4:
\includepdf{4.pdf}
a new plot:
<<echo=FALSE>>= % chunk for new plots
x <- rnorm(100)
hist(x)
@
\end{document}
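With many existing plots, writing one \includepdf line per file by hand gets tedious. A small Python sketch that generates such a wrapper for any list of PDFs; the function name is hypothetical, pages=- is the pdfpages option for including every page, and file names are inserted as-is (names with spaces or unusual characters may need extra care in LaTeX). Unlike the Sweave version, it contains no R chunk; it only stitches existing files together.

```python
from typing import Sequence


def pdfpages_wrapper(pdfs: Sequence[str]) -> str:
    """Return a LaTeX document that includes every page of each PDF, in order."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{pdfpages}",
        r"\begin{document}",
    ]
    for i, pdf in enumerate(pdfs, start=1):
        lines.append(f"this is my plot {i}:")
        # pages=- asks pdfpages for all pages of the file
        lines.append(rf"\includepdf[pages=-]{{{pdf}}}")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("plots.tex", "w") as fh:
        fh.write(pdfpages_wrapper(["1.pdf", "2.pdf", "3.pdf", "4.pdf"]))
```

Compiling plots.tex with pdflatex then produces the combined file.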
Combine multiple pages of .pdf into RMarkdown
I'm able to successfully include two different multi-page PDFs in your example document using pdfpages:
---
title: <center> <h1>Analysis Data</h1> </center>
mainfont: Arial
output:
  pdf_document:
    latex_engine: xelatex
    fig_crop: false
    toc: true
sansfont: Arial
classoption: landscape
fontsize: 10pt
geometry: margin=0.30in
header-includes:
- \usepackage{booktabs}
- \usepackage{sectsty} \sectionfont{\centering}
- \renewcommand{\contentsname}{}\vspace{-2cm}
- \usepackage{pdfpages}
---
# File One
\includepdf[pages={-}]{pdf1.pdf}
\newpage
# File Two
\includepdf[pages={-}]{pdf2.pdf}
Merge PDF files
Use pyPdf or its successor PyPDF2 (development of both has since continued under the name pypdf):
A Pure-Python library built as a PDF toolkit. It is capable of:
- splitting documents page by page,
- merging documents page by page,
(and much more)
Here's a sample program that works with both pyPdf and PyPDF2 releases before 3.0, which still provide the PdfFileReader/PdfFileWriter names.
#!/usr/bin/env python
import sys
try:
    from PyPDF2 import PdfFileReader, PdfFileWriter
except ImportError:
    from pyPdf import PdfFileReader, PdfFileWriter
def pdf_cat(input_files, output_stream):
    input_streams = []
    try:
        # First open all the files, then produce the output file, and
        # finally close the input files. This is necessary because
        # the data isn't read from the input files until the write
        # operation. Thanks to
        # https://stackoverflow.com/questions/6773631/problem-with-closing-python-pypdf-writing-getting-a-valueerror-i-o-operation/6773733#6773733
        for input_file in input_files:
            input_streams.append(open(input_file, 'rb'))
        writer = PdfFileWriter()
        for reader in map(PdfFileReader, input_streams):
            for n in range(reader.getNumPages()):
                writer.addPage(reader.getPage(n))
        writer.write(output_stream)
    finally:
        for f in input_streams:
            f.close()
        output_stream.close()
if __name__ == '__main__':
    if sys.platform == "win32":
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
    # PDF data is binary: on Python 3 write to the underlying byte buffer
    pdf_cat(sys.argv[1:], getattr(sys.stdout, 'buffer', sys.stdout))
Combine / merge multiple HTML documents in R
It seems that the function html_combine of the package R3port expects the files to have a .rawhtml extension. You can save .rawhtml instead of .html files like this:
library(tableHTML)
library(R3port)
x=data.frame(x=c(1,2,3))
y=data.frame(y=c(4,5,6))
tableHTML::write_tableHTML(tableHTML(x), "x.rawhtml")
tableHTML::write_tableHTML(tableHTML(y), "y.rawhtml")
And then use html_combine to get the output:
html_combine(
out = "to.html",
toctheme = TRUE,
css = paste0(system.file(package = "R3port"), "/style.css"),
clean = 0
)
Need to merge multiple pdf's into a single PDF with Table Of Contents sections
And what's the best tool for us to merge the pdf's?
On Linux (as well as on Windows), you can install a useful little program, pdftk. It works well for binding PDFs together. For example:
$ pdftk in1.pdf in2.pdf in3.pdf in4.pdf in5.pdf in6.pdf cat output out.pdf
where in*.pdf are the input files and out.pdf is the result. In the meantime, @jerik has already given an answer on how to deal with the TOC.
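One pdftk-only way to get per-section entries is to attach bookmarks (the outline shown in a viewer's sidebar, not a printed TOC page) with update_info, using the BookmarkBegin/BookmarkTitle/BookmarkLevel/BookmarkPageNumber records that pdftk dump_data prints. A minimal Python sketch, assuming you already know each input's page count (for example from the NumberOfPages line of pdftk in1.pdf dump_data); the helper name and the section titles are hypothetical. The file is written as UTF-8, so it is passed to update_info_utf8.

```python
import subprocess
from itertools import accumulate
from typing import Sequence, Tuple


def bookmark_info(sections: Sequence[Tuple[str, int]]) -> str:
    """Build pdftk info text with one top-level bookmark per merged input.

    `sections` holds (title, page_count) pairs in merge order; each bookmark
    points at the first page that input occupies in the merged PDF.
    """
    counts = [pages for _, pages in sections]
    # first page of input k = 1 + total pages of inputs 0..k-1
    starts = [1] + [1 + n for n in accumulate(counts)][:-1]
    records = []
    for (title, _), start in zip(sections, starts):
        records += [
            "BookmarkBegin",
            f"BookmarkTitle: {title}",
            "BookmarkLevel: 1",
            f"BookmarkPageNumber: {start}",
        ]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    sections = [("Introduction", 2), ("Results", 3), ("Appendix", 1)]  # hypothetical
    with open("bookmarks.txt", "w", encoding="utf-8") as fh:
        fh.write(bookmark_info(sections))
    subprocess.run(["pdftk", "out.pdf", "update_info_utf8", "bookmarks.txt",
                    "output", "out_with_toc.pdf"], check=True)
```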