Importing Common YAML in RStudio/knitr Document

I have found two ways to do this portably (i.e., no .Rprofile customisation needed and minimal duplication of YAML frontmatter):

  1. You can provide the common YAML to pandoc on the command line! D'oh!
  2. You can set the knit: property of the metadata to your own function to have greater control over what happens when you press Ctrl+Shift+K.

Option 1: common YAML to command line.

Put all the common YAML in its own file

common.yaml:

---
author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References
---

Note that it is a complete YAML block, i.e., the --- delimiters are needed.

Then, in the document, you can pass the YAML file as the last argument to pandoc, and pandoc will apply it (see this GitHub issue).

in example.rmd:

---
title: On the Culinary Preferences of Anthropomorphic Cats
output:
  html_document:
    pandoc_args: './common.yaml'
---

I do not like green eggs and ham. I do not like them, Sam I Am!

You could even put the html_document: stuff in an _output.yaml since rmarkdown will take that and place it under output: for all the documents in that folder. In this way there can be no duplication of YAML between all documents using this frontmatter.

Pros:

  • no duplication of YAML frontmatter.
  • very clean

Cons:

  • the common YAML is not passed through knit, so the date field above will not be parsed. You will get the literal string "r format(Sys.time(), format='%Y-%m-%d %H:%M:%S %z')" as your date.
  • from the same github issue:

    Metadata definitions seen first are kept and left unchanged, even if conflicting data is parsed at a later point.

Perhaps this could be a problem at some point depending on your setup.

Option 2: override the knit command

This allows for much greater control, though is a bit more cumbersome/tricky.

This link and this one mention an undocumented feature in rmarkdown: the knit: part of the YAML will be executed when one clicks the "Knit" button of RStudio.

In short:

  1. define a function myknit(inputFile, encoding) that reads the common YAML, inserts it into the Rmd and calls render on the result. Save it in its own file, myknit.r.
  2. in the YAML of example.rmd, add

     knit:  (function (...) { source('myknit.r'); myknit(...) })

    It seems to have to be on one line. The reason for source('myknit.r') instead of just putting the function definition in the YAML is portability: if I modify myknit.r, I don't have to modify every document's YAML. This way, the only common YAML that all documents must repeat in their frontmatter is the knit line; all other common YAML can stay in common.yaml.

Then Ctrl+Shift+K works as I would hope from within RStudio.

Further notes:

  • myknit could just be a system call to make if I had a makefile setup.
  • the injected YAML will be passed through rmarkdown and hence knitted, since it is injected before the call to render.
  • Preview window: so long as myknit produces a (single) message Output created: path/to/file.html, then the file will be shown in the preview window.

    I have found that there can be only one such message in the output (not several), or you get no preview window. So if you use render (which emits an "Output created: basename.extension" message) and the final produced file is actually elsewhere, you will need to suppress this message via either render(..., quiet=TRUE) or suppressMessages(render(...)) (the former suppresses knitr progress and pandoc output too), and then emit your own message with the correct path.
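
The preview note above can be sketched roughly as follows. This is only an illustration, not part of the original setup: the helper name render_and_announce and its final_dir and render_fn arguments are made up here (render_fn exists only so the renderer can be swapped out), and it assumes the preview behaviour is as described above.

```r
# Hypothetical helper (not from the original setup): render quietly, copy the
# result to its final location, and emit exactly one "Output created" message.
# `render_fn` is an assumed extra argument so the renderer can be replaced.
render_and_announce <- function(input, final_dir,
                                render_fn = rmarkdown::render, ...) {
  # quiet = TRUE suppresses render's own "Output created: ..." message
  out <- render_fn(input, quiet = TRUE, ...)

  # copy the rendered file to where it should finally live
  final <- file.path(final_dir, basename(out))
  file.copy(out, final, overwrite = TRUE)

  # the single message carrying the correct path
  message("Output created: ", final)
  invisible(final)
}
```

Inside myknit, something like render_and_announce(ofile, dirname(inputFile), encoding = encoding, envir = new.env()) could then replace the separate render and file.copy calls.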

Pros:

  • the YAML frontmatter is knitted
  • much more control than option 1 if you need to do custom pre- / post-processing.

Cons:

  • a bit more effort than option 1
  • the knit: line must be duplicated in each document (though by source('./myknit.r') at least the function definition may be stored in one central location)

Here is the setup for posterity. For portability, you only need to carry around myknit.r and common.yaml. No .Rprofile or project-specific config needed.

example.rmd:

---
title: On the Culinary Preferences of Anthropomorphic Cats
knit: (function (...) { source('myknit.r'); myknit(...) })
---

I do not like green eggs and ham. I do not like them, Sam I Am!

common.yaml [for example]:

author: me
date: "`r format (Sys.time(), format='%Y-%m-%d %H:%M:%S %z')`"
link-citations: true
reference-section-title: References

myknit.r:

myknit <- function (inputFile, encoding, yaml='common.yaml') {
    # read in the YAML + src file
    yaml <- readLines(yaml)
    rmd <- readLines(inputFile)

    # insert the YAML in after the first ---
    # I'm assuming all my RMDs have properly-formed YAML and that the first
    # occurrence of --- starts the YAML. You could do proper validation if you wanted.
    yamlHeader <- grep('^---$', rmd)[1]
    # put the yaml in
    rmd <- append(rmd, yaml, after=yamlHeader)

    # write out to a temp file
    ofile <- file.path(tempdir(), basename(inputFile))
    writeLines(rmd, ofile)

    # render with rmarkdown.
    message(ofile)
    ofile <- rmarkdown::render(ofile, encoding=encoding, envir=new.env())

    # copy back to the current directory.
    file.copy(ofile, file.path(dirname(inputFile), basename(ofile)), overwrite=TRUE)
}

Pressing Ctrl+Shift+K/Knit from the editor of example.rmd will compile the result and show a preview. I know it is using common.yaml, because the result includes the date and author whereas example.rmd on its own does not have a date or author.

Programmatically add tags to yaml header during knitting R markdown file

To generate a valid YAML array, you could use the alternative syntax [ ], e.g.,

tags: ["`r paste(head(letters), collapse = '", "')`"]

which generates:

tags: ["a", "b", "c", "d", "e", "f"]

Note the hack collapse = '", "': since there already exists a pair of double quotes outside the R expression, you should only generate the part a", "b", "c", "d", "e", "f from the R expression.

-- solution copied from Yihui's explanation at blogdown#647
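
To see why the collapse string works, the same expression can be tried in a plain R session. This is just a small sketch; the variable names tags, inner and yaml_line are made up for illustration.

```r
# The values that should end up as YAML tags
tags <- head(letters)

# Only the inner part  a", "b", ..., "f  is produced by R;
# the opening [" and closing "] already exist in the YAML line.
inner <- paste(tags, collapse = '", "')

# Reassemble the full line the way the YAML header would look after knitting
yaml_line <- paste0('tags: ["', inner, '"]')
cat(yaml_line, "\n")
```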

How to access YAML metadata from knitr

The YAML metadata of the document being rendered is stored in rmarkdown::metadata, a list of the form list(title = ...).
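
A minimal sketch of reading a field from that list inside a chunk. The helper get_meta and its default argument are hypothetical names added here; only rmarkdown::metadata itself comes from the answer above.

```r
# Hypothetical helper: look up one YAML field, falling back to a default
# when the field is absent (or when no document is currently being rendered,
# in which case the metadata list is empty).
get_meta <- function(key, default = NULL) {
  value <- rmarkdown::metadata[[key]]
  if (is.null(value)) default else value
}

# Inside a chunk of a document whose YAML has `title: ...`,
# this would return that title; otherwise the fallback.
doc_title <- get_meta("title", default = "untitled")
```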

Strip YAML from child docs in knitr

In the meantime, maybe the following will work for you; it is kind of an ugly and inefficient workaround (I am new to knitr and am not a real programmer), but it achieves what I believe you want to do.

I had written a function for a similar personal use that includes the following relevant bit; the original is in Spanish, so I've translated it some below:

extraction <- function(matter, escape = FALSE, ruta = ".", patron) {

  require(yaml)

  # Gather together directory of documents to be processed
  doc_list <- list.files(
    path = ruta,
    pattern = patron,
    full.names = TRUE
  )

  # Extract desired contents
  lapply(
    X = doc_list,
    FUN = function(i) {
      raw_contents <- readLines(con = i, encoding = "UTF-8")

      switch(
        EXPR = matter,

        # !YAML (e.g., HTML)
        "no_yaml" = {
          if (escape == FALSE) {
            paste(raw_contents, sep = "", collapse = "\n")
          } else if (escape == TRUE) {
            require(XML)
            to_be_escaped <- paste(raw_contents, sep = "", collapse = "\n")
            xmlTextNode(value = to_be_escaped)
          }
        },

        # YAML header and Rmd contents
        "rmd" = {
          # first two delimiter lines mark the start and end of the YAML header
          yaml_pattern <- "[-]{3}|[.]{3}"
          limits_yaml <- grep(pattern = yaml_pattern, x = raw_contents)[1:2]
          indices_yaml <- seq(
            from = limits_yaml[1] + 1,
            to = limits_yaml[2] - 1
          )
          # parse each header line separately (assumes flat key: value YAML)
          yaml <- mapply(
            FUN = function(i) {yaml.load(string = i)},
            raw_contents[indices_yaml],
            USE.NAMES = FALSE
          )
          # everything after the header is the Rmd body
          indices_rmd <- seq(
            from = limits_yaml[2] + 1,
            to = length(x = raw_contents)
          )
          rmd <- paste(raw_contents[indices_rmd], sep = "", collapse = "\n")
          c(yaml, "contents" = rmd)
        },

        # Anything else (just in case)
        {
          stop("Matter not extractable")
        }
      )
    }
  )
}

Say my main Rmd document main.Rmd lives in my_directory and my child documents, 01-abstract.Rmd, 02-intro.Rmd, ..., 06-conclusion.Rmd are housed in ./sections; note that for my amateur function it is best to have the child documents saved in the order they will be summoned into the main document (see below). I have my function extraction.R in ./assets. Here is the structure of my example directory:

.
+--assets
| +--extraction.R
+--sections
| +--01-abstract.Rmd
| +--02-intro.Rmd
| +--03-methods.Rmd
| +--04-results.Rmd
| +--05-discussion.Rmd
| +--06-conclusion.Rmd
+--stats
| +--analysis.R
+--main.Rmd

In main.Rmd I import my child documents from ./sections:

---
title: Main
author: me
date: Today
output:
  html_document
---

```{r, 'setup', include = FALSE}
knitr::opts_chunk$set(autodep = TRUE)
knitr::dep_auto()
```

```{r, 'import_children', cache = TRUE, include = FALSE}
source('./assets/extraction.R')
rmd <- extraction(
matter = 'rmd',
ruta = './sections',
patron = "\\.Rmd$"
)
```

# Abstract

```{r, 'abstract', echo = FALSE, results = 'asis'}
cat(x = rmd[[1]][["contents"]], sep = "\n")
```

# Introduction

```{r, 'intro', echo = FALSE, results = 'asis'}
cat(x = rmd[[2]][["contents"]], sep = "\n")
```

# Methods

```{r, 'methods', echo = FALSE, results = 'asis'}
cat(x = rmd[[3]][["contents"]], sep = "\n")
```

# Results

```{r, 'results', echo = FALSE, results = 'asis'}
cat(x = rmd[[4]][["contents"]], sep = "\n")
```

# Discussion

```{r, 'discussion', echo = FALSE, results = 'asis'}
cat(x = rmd[[5]][["contents"]], sep = "\n")
```

# Conclusion

```{r, 'conclusion', echo = FALSE, results = 'asis'}
cat(x = rmd[[6]][["contents"]], sep = "\n")
```

# References

I then knit this document and only the contents of my child documents are incorporated thereinto, e.g.:

---
title: Main
author: me
date: Today
output:
  html_document
---

# Abstract

This is **Child Doc 1**, my abstract.

# Introduction

This is **Child Doc 2**, my introduction.

- Point 1
- Point 2
- Point *n*

# Methods

This is **Child Doc 3**, my "Methods" section.

| method 1 | method 2 | method *n* |
|---------------|---------------|----------------|
| fffffffffffff | fffffffffffff | fffffffffffff d|
| fffffffffffff | fffffffffffff | fffffffffffff d|
| fffffffffffff | fffffffffffff | fffffffffffff d|

# Results

This is **Child Doc 4**, my "Results" section.

## Result 1

```{r}
library(knitr)
```

```{r, 'analysis', cache = FALSE}
source(file = '../stats/analysis.R')
```

# Discussion

This is **Child Doc 5**, where the results are discussed.

# Conclusion

This is **Child Doc 6**, where I state my conclusions.

# References

The foregoing document is the knitted version of main.Rmd, i.e., main.md. Note under ## Result 1 that in my child document, 04-results.Rmd, I sourced an external R script, ./stats/analysis.R, which is now incorporated as a new knitr chunk in my knitted document; consequently, I now need to knit the document again.

When child documents also include chunks, instead of knitting into .md I would knit the main document into another .Rmd as many times as I have chunks nested, e.g., continuing the example above:

  1. Using knit(input = './main.Rmd', output = './main_2.Rmd'), instead of knitting main.Rmd into main.md, I would knit it into another .Rmd so as to be able to knit the resulting file containing the newly imported chunks, e.g., my R script analysis.R above.
  2. I can now knit my main_2.Rmd into main.md or render it as main.html via rmarkdown::render(input = './main_2.Rmd', output_file = './main.html').

Note: in the example above of main.md, the path to my R script is ../stats/analysis.R. This is the path relative to the child document that sourced it, ./sections/04-results.Rmd. Once I import the child document into the main document located at the root of my_directory, i.e., ./main.md or ./main_2.Rmd, the path becomes wrong; I therefore must correct it manually to ./stats/analysis.R before the next knit.
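
The manual path correction could probably be scripted before the second knit. The following is only a sketch under the directory layout of this example; the function name fix_paths and its from/to arguments are made up here.

```r
# Hypothetical helper: rewrite paths that were relative to ./sections
# so that they are relative to the project root instead.
fix_paths <- function(lines, from = "../stats/", to = "./stats/") {
  # fixed = TRUE treats `from` as a literal string, not a regular expression
  gsub(pattern = from, replacement = to, x = lines, fixed = TRUE)
}

# Example on a single line as it appears in the intermediate document
fixed <- fix_paths("source(file = '../stats/analysis.R')")
print(fixed)
```

Applied to a whole file, this would be something like writeLines(fix_paths(readLines('./main_2.Rmd')), './main_2.Rmd').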

I mentioned above that it is best to have the child documents saved in the same order that they are imported into the main document. This is because my simple function extraction() simply stores the contents of all the files specified to it in an unnamed list, hence I must access each file in main.Rmd by number, i.e., rmd[[5]][["contents"]] refers to the child document ./sections/05-discussion.Rmd; consider:

> str(rmd)
List of 6
$ :List of 4
..$ title : chr "child doc 1"
..$ layout : chr "default"
..$ etc : chr "etc"
..$ contents: chr "\nThis is **Child Doc 1**, my abstract."
$ :List of 4
..$ title : chr "child doc 2"
..$ layout : chr "default"
..$ etc : chr "etc"
..$ contents: chr "\nThis is **Child Doc 2**, my introduction.\n\n- Point 1\n- Point 2\n- Point *n*"
$ :List of 4
..$ title : chr "child doc 3"
..$ layout : chr "default"
..$ etc : chr "etc"
..$ contents: chr "\nThis is **Child Doc 3**, my \"Methods\" section.\n\n| method 1 | method 2 | method *n* |\n|--------------|--------------|----"| __truncated__
$ :List of 4
..$ title : chr "child doc 4"
..$ layout : chr "default"
..$ etc : chr "etc"
..$ contents: chr "\nThis is **Child Doc 4**, my \"Results\" section.\n\n## Result 1\n\n```{r}\nlibrary(knitr)\n```\n\n```{r, cache = FALSE}\nsour"| __truncated__
$ :List of 4
..$ title : chr "child doc 5"
..$ layout : chr "default"
..$ etc : chr "etc"
..$ contents: chr "\nThis is **Child Doc 5**, where the results are discussed."
$ :List of 4
..$ title : chr "child doc 6"
..$ layout : chr "default"
..$ etc : chr "etc"
..$ contents: chr "\nThis is **Child Doc 6**, where I state my conclusions."

So, extraction() here is actually storing both the R Markdown contents of the specified child documents, as well as their YAML, in case you had a use for this as well (I myself do).
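
If accessing the children by number is inconvenient, one option is to name the list after the files it was built from. This is a sketch with made-up stand-in data, not a change to extraction() itself; doc_list here is a hypothetical copy of the file listing.

```r
# Stand-in data: two file paths and the corresponding extracted entries
doc_list <- c("./sections/01-abstract.Rmd", "./sections/05-discussion.Rmd")
rmd <- list(
  list(title = "child doc 1", contents = "abstract text"),
  list(title = "child doc 5", contents = "discussion text")
)

# Name each entry after its file, without directory or extension
names(rmd) <- tools::file_path_sans_ext(basename(doc_list))

# Entries can now be looked up by name instead of by position
discussion <- rmd[["05-discussion"]][["contents"]]
```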

How can I modify yaml instructions outside of the document I am rendering

You could use sink() and cat() to write the YAML header followed by the document body into a temporary file, and then render that file.

xxx <- "
#' # Title
Hello world

#+ one_plus_one
1 + 1
"

tmp <- tempfile()
sink(tmp)
cat("
---
title: 'Sample Document'
output:
  html_document:
    toc: true
    theme: united
  pdf_document:
    toc: true
    highlight: zenburn
---", xxx)
sink()
w.d <- getwd()
rmarkdown::render(tmp, output_file=paste(w.d, "myfile", sep="/"))
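
The same idea can be sketched without sink(), using writeLines. The helper name write_with_header and its arguments are made up for illustration, and the body here is plain R Markdown rather than the spin-style script above.

```r
# Hypothetical helper: write a YAML header plus a document body to a file
# and return the path, ready to be passed to rmarkdown::render().
write_with_header <- function(header, body,
                              path = tempfile(fileext = ".Rmd")) {
  # header lines are wrapped in the --- delimiters; a blank line follows
  writeLines(c("---", header, "---", "", body), path)
  path
}

tmp <- write_with_header(
  header = c("title: 'Sample Document'", "output: html_document"),
  body = c("# Title", "Hello world")
)
```

The resulting file could then be rendered with rmarkdown::render(tmp, output_file = file.path(getwd(), "myfile")).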

