Unicode with Knitr and Rmarkdown

It looks like an encoding issue specific to Windows, and may be related to this issue: https://github.com/hadley/evaluate/issues/59. Unfortunately we have to wait for a fix in base R, but if you don't have to use cat(), and the expression is a top-level expression in your code chunk (e.g. not inside a for-loop or if-statement), I guess this may work:

knitr::asis_output("\U2660   \U2665  \U2666  \U2663")

It passes the character string directly to knitr and bypasses cat(), since knitr cannot reliably catch multibyte characters written out by cat() on Windows -- it depends on whether the characters can be represented by your system's native encoding.

Differences in Unicode character output with print()

So, apparently this character conversion issue is unlikely to resolve itself in the near future, and will probably only be solved at the OS level. But based on the excellent suggestions made by @YihuiXie in the comments, there are two ways to work around it. The best solution depends on the context in which you are creating the tables.

Scenario 1: Tables Only

If tables are the only type of object you need to output from inside your for-loop, you can accumulate the kable objects in a list inside the loop, collapse the list into a single character string once the loop finishes, and display it using knitr::asis_output.

```{r, results="asis"}
library(knitr)
character_list <- list(eta = "\U03B7", sigma = "\U03C3")
kable_list <- vector(mode = "list", length = length(character_list))

for (i in seq_along(character_list)) {
  kable_list[[i]] <- knitr::kable(as.data.frame(character_list[i]),
                                  format = "html")
}

knitr::asis_output(paste(kable_list, collapse = "\n"))
```

This produces the tables in the HTML document, with the Unicode characters intact.

Scenario 2: Tables and other objects (e.g. Plots)

If you're outputting both tables and other objects (e.g., plots) on each iteration of your for-loop, then the above solution won't work: you can't coerce your plots to a character vector! At this point, we have to resort to post-processing the kable output by writing a customized knitr output hook.

The basic approach is to replace the busted sequences (the <U+XXXX> escapes) in the table cells with the equivalent HTML entities. Note that because the table is created in a results="asis" chunk, we have to override the chunk-level hook, not the output-level hook (confusing, I know).
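
The substitution itself can be tried in isolation before wiring it into a hook. The sketch below is not the hook code, just a simpler single-gsub() variant of the same idea; the function name unicode_escape_to_entity is made up for illustration. It assumes the escapes always look like <U+XXXX> with uppercase hex digits. Note the "x" in "&#x": without it the browser reads the digits as a decimal code point, so <U+2660> would come out as the wrong character.

```r
# Hypothetical helper (not part of knitr): turn <U+XXXX> escapes into
# hexadecimal HTML entities such as &#x3B7;
unicode_escape_to_entity <- function(x) {
  # 0* swallows leading zeros; the capture group keeps the remaining hex digits
  gsub("<U\\+0*([0-9A-F]{1,8})>", "&#x\\1;", x)
}

unicode_escape_to_entity("<td> <U+03B7> </td><td> <U+2660> </td>")
# [1] "<td> &#x3B7; </td><td> &#x2660; </td>"
```

Because gsub() is vectorized, this also works when the output is a character vector with more than one element.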

```{r hook_override}
library(knitr)
default_hook <- knit_hooks$get("chunk")

knit_hooks$set(chunk = function(x, options) {
  # Only attempt substitution if output is a character vector,
  # which I *think* it always should be
  if (is.character(x)) {
    # Match the <U+XXXX> pattern in the output
    match_data <- gregexpr("<U\\+[0-9A-F]{4,8}>", x)
    # If a match is found, proceed with HTML entity substitution
    if (length(match_data[[1]]) >= 1 && match_data[[1]][1] != -1) {
      # Extract the matched strings from the output
      match_strings <- unlist(regmatches(x, match_data))
      # Extract the hexadecimal code points from inside the <U+ > bracketing
      code_sequences <- unlist(regmatches(match_strings,
                                          gregexpr("[0-9A-F]{4,8}", match_strings)))
      # Drop leading zeros; the "x" marking a hexadecimal entity is added below.
      # (Without the "x", e.g. &#2660; would be read as a decimal code point.)
      code_sequences <- sub("^0+", "", code_sequences)
      # Slap the &#x on the front, and the ; on the end of each code sequence
      regmatches(x, match_data) <- list(paste0("&#x", code_sequences, ";"))
    }
  }
  # "Print" the output
  default_hook(x, options)
})
```

```{r tables, results="asis"}
character_list <- list(eta = "\U03B7", sigma = "\U03C3")
for (i in seq_along(character_list)) {
  x <- knitr::kable(as.data.frame(character_list[i]),
                    format = "html")
  print(x)
}
```

```{r hook_reset}
knit_hooks$set(chunk = default_hook)
```

This again produces the tables in the HTML document.

Note that this time the sigma doesn't display as σ like it did in the first example; it displays as s! This is because the sigma gets converted to an s before it reaches the chunk output hook. I have no idea how to stop that from happening. Feel free to leave a comment if you do =)

I also realize that using regular expressions to do the substitutions within the HTML table is probably fragile. If this approach happens to fail for your use case, perhaps using the rvest package to parse out each table cell individually would be more robust.

Include unicode in ggplot in .rmd, render in multiple formats

It's (likely) a graphics device issue. You can set a different device using chunk options. If you have the {ragg} package, its ragg_png device usually gives me reliable results.

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      dev = "ragg_png")
library(ggplot2)
library(dplyr)
```

You can also add dpi = 150 to the options if you find that the graphics are too coarse.
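
As a rough sketch, the device and resolution can be set together as global defaults in the setup chunk. The value 150 is just the one suggested above; adjust to taste.

```r
# Global defaults for every chunk: ragg's PNG device at 150 dpi
knitr::opts_chunk$set(dev = "ragg_png", dpi = 150)
```

The same options can also be given for a single chunk in its header, e.g. `{r, dev = "ragg_png", dpi = 150}`.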

Unicode characters not recognised by LaTeX in RMarkdown Pander table

I think it's an issue with your locale/console settings and not really a pander issue, as this seems to work fine in a console with support for Unicode chars.

But pdflatex does indeed handle Unicode chars poorly, so you might be better off trying e.g. xelatex.
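
One way to switch engines, sketched here under the assumption that the document is rendered with the rmarkdown package, is to pass latex_engine to pdf_document. The file name "report.Rmd" is a placeholder, and a working XeLaTeX installation is assumed.

```r
# "report.Rmd" is a placeholder; point this at your own document
rmarkdown::render(
  "report.Rmd",
  output_format = rmarkdown::pdf_document(latex_engine = "xelatex")
)
```

The equivalent in the YAML header is `latex_engine: xelatex` nested under `pdf_document`.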

