How to Add "Author" Metadata to a PDF Created from R

Adding metadata when plotting to PDF

You can set the document's title by passing the title argument to the pdf() function. For other metadata, such as the author, refer to this SO Q&A, which recommends external tools (like pdftk or exiftool); see also this Q&A at AskUbuntu.
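As a minimal sketch of both routes: the title goes through pdf() itself, and the author is added afterwards with exiftool. The output path, the title text and the author name are made up for illustration, and the second step assumes exiftool is installed and on the PATH (it is skipped otherwise).

```r
# Write a one-page PDF and set its Title field via pdf()'s title argument.
out_file <- file.path(tempdir(), "plot_with_title.pdf")  # hypothetical output path
pdf(out_file, title = "Example plot title")              # illustrative title
plot(1:10, main = "Example plot")
dev.off()

# Assumption: exiftool is available on the PATH; skip this step if it is not.
# -Author=... sets the Author field; -overwrite_original avoids a backup copy.
if (nzchar(Sys.which("exiftool"))) {
  system2("exiftool",
          c(shQuote("-Author=Jane Doe"),   # illustrative author name
            "-overwrite_original",
            shQuote(out_file)))
}
```

If pdftools is installed, pdf_info(out_file) should list the resulting metadata fields, which is a quick way to check what was actually written.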

How to extract the title from a PDF document with R

We need to make some assumptions about the structure of the PDF we wish to scrape. The code below makes the following assumptions:

  1. Title and abstract are on page 1 (fair assumption?)
  2. Title is of height 15
  3. The abstract is between the first occurrence of the word "Abstract." (matched with its trailing period) and the first occurrence of the word "Introduction"

library(tidyverse)
library(pdftools)

# pdf_data() returns one data frame per page, with one row per word
data = pdf_data("~/Desktop/scrape.pdf")

# Get the first page
page_1 = data[[1]]

# Get the title; here we assume its words have height 15
title = page_1 %>%
  filter(height == 15) %>%
  pull(text) %>%
  paste0(collapse = " ")

# Get the abstract
abstract_start = which(page_1$text == "Abstract.")[1]
introduction_start = which(page_1$text == "Introduction")[1]

# Stop two words before "Introduction", which also drops the word just
# before it (for example a section number)
abstract = page_1$text[abstract_start:(introduction_start - 2)] %>%
  paste0(collapse = " ")

You can, of course, build on this and impose stricter constraints for your scraper.
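As one possible way to tighten this up, here is a rough sketch that avoids the hard-coded height and guards against missing markers. It is a different heuristic from the code above: the title is taken to be the tallest text on the page, and the abstract excludes the "Abstract." marker itself. The helpers extract_title and extract_between are hypothetical names, and page_1 here is a toy stand-in for one element of pdf_data() output (which also carries width, x, y and space columns).

```r
# Toy stand-in for one page of pdftools::pdf_data() output.
page_1 <- data.frame(
  text   = c("A", "Study", "of", "Things", "Abstract.",
             "We", "study", "things.", "1", "Introduction"),
  height = c(15, 15, 15, 15, 9, 9, 9, 9, 11, 11),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat the tallest words on the page as the title.
extract_title <- function(page) {
  paste(page$text[page$height == max(page$height)], collapse = " ")
}

# Hypothetical helper: words strictly between two marker words, optionally
# skipping some words right before the end marker (e.g. a section number).
# Returns NA if a marker is missing or nothing lies between them.
extract_between <- function(page, start_word, end_word, skip_before_end = 1) {
  first <- which(page$text == start_word)[1] + 1
  last  <- which(page$text == end_word)[1] - 1 - skip_before_end
  if (is.na(first) || is.na(last) || last < first) {
    return(NA_character_)
  }
  paste(page$text[first:last], collapse = " ")
}

title    <- extract_title(page_1)
abstract <- extract_between(page_1, "Abstract.", "Introduction")
```

Returning NA instead of failing makes it easier to run the scraper over many files and filter out the ones that do not match the assumed layout.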

Adding metadata to an existing PDF file

You can use iText's PdfStamper.setMoreInfo:

final HashMap<String, String> info = new HashMap<>();
if (title != null) {
    info.put("Title", title);
}
if (subject != null) {
    info.put("Subject", subject);
}
if (keywords != null) {
    info.put("Keywords", keywords);
}
if (creator != null) {
    info.put("Creator", creator);
}
if (author != null) {
    info.put("Author", author);
}

stamper.setMoreInfo(info);

How to add email under author in pandoc markdown to pdf?

Assuming that the default pandoc template for LaTeX is used for the conversion, this worked for me:

---
title: My title
subtitle: My subtitle
date: \today

author: |
  | My Name
  | my.name@email.com
---

