Adding metadata when plotting to PDF
You can set the document's title by passing an appropriate title argument to the pdf() function. For other metadata, refer to this SO Q&A, which recommends external tools such as pdftk or exiftool; see also this Q&A at AskUbuntu.
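As a minimal sketch of the title argument described above: the output path, the title string and the plotted data are placeholders chosen for illustration, not taken from the original answer.

```r
# Write the plot to a temporary PDF; the path is an illustrative placeholder
out_file <- file.path(tempdir(), "plot_with_title.pdf")

# `title` is embedded in the PDF's document information as the Title field
pdf(out_file, title = "Quarterly sales overview")
plot(1:10, main = "Example plot")
dev.off()  # close the device so the file is actually written
```

Note that title sets the document metadata only; the heading drawn on the plot itself still comes from main.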
How to extract the title from a PDF document with R
We will need to make some assumptions about the structure of the pdf we wish to scrape. The code below makes the following assumptions:
- Title and abstract are on page 1 (fair assumption?)
- The title text has a height of 15
- The abstract lies between the first occurrence of the word "Abstract" (matched in the code as the token "Abstract.") and the first occurrence of the word "Introduction"
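library(tidyverse)
library(pdftools)
# pdf_data() returns one data frame per page, with one row per text box (roughly, per word)
data = pdf_data("~/Desktop/scrape.pdf")
# Get the first page
page_1 = data[[1]]
# Get the title; here we assume its text has height 15
title = page_1 %>%
  filter(height == 15) %>%
  pull(text) %>%
  paste0(collapse = " ")
# Get the abstract: everything from "Abstract." up to just before "Introduction".
# The "- 2" also drops the token right before "Introduction" (presumably a section number).
abstract_start = which(page_1$text == "Abstract.")[1]
introduction_start = which(page_1$text == "Introduction")[1]
abstract = page_1$text[abstract_start:(introduction_start - 2)] %>%
  paste0(collapse = " ")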
You can, of course, work off of this and impose stricter constraints for your scraper.
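One possible variation, sketched here under assumptions: instead of hard-coding a height of 15, treat the tallest text on the page as the title. The helper guess_title is hypothetical (it is not part of pdftools), and the toy data frame only mimics the height and text columns that pdf_data() returns.

```r
library(dplyr)

# Hypothetical helper: assume the title is whatever text is tallest on the page
guess_title <- function(page) {
  page %>%
    filter(height == max(height)) %>%
    pull(text) %>%
    paste(collapse = " ")
}

# Toy stand-in for one page of pdf_data() output (illustrative values only)
page_1 <- data.frame(
  height = c(15, 15, 15, 9, 9),
  text = c("A", "Sample", "Title", "Abstract.", "We"),
  stringsAsFactors = FALSE
)

title <- guess_title(page_1)
```

This heuristic will misfire when something else on page 1 is set in larger type than the title, such as a journal banner, so it is a starting point rather than a reliable rule.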
Adding metadata to an existing PDF file
You can use iText's PdfStamper.setMoreInfo:
final HashMap<String, String> info = new HashMap<>();
if (title != null) {
    info.put("Title", title);
}
if (subject != null) {
    info.put("Subject", subject);
}
if (keywords != null) {
    info.put("Keywords", keywords);
}
if (creator != null) {
    info.put("Creator", creator);
}
if (author != null) {
    info.put("Author", author);
}
stamper.setMoreInfo(info);
How to add email under author in pandoc markdown to pdf?
Assuming that the default pandoc template for LaTeX is used for the conversion,
this worked for me:
---
title: My title
subtitle: My subtitle
date: \today
author: |
    | My Name
    | my.name@email.com
---