R: eval(parse(...)) is often suboptimal

Actually the list probably looks a bit different. The '$' convention is somewhat misleading. Try this:

dat[["orders"]][[ or_ID ]][["price"]]

The '$' operator does not evaluate its argument, but '[[' does, so or_ID will get turned into "5810584".
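To see the difference (a minimal sketch, using a made-up dat that mimics the structure in question):

or_ID <- "5810584"
dat <- list(orders = list(`5810584` = list(price = 99)))  # hypothetical data

dat$orders$or_ID                     # NULL: '$' looks for an element literally named "or_ID"
dat[["orders"]][[or_ID]][["price"]] # 99:   '[[' evaluates or_ID to "5810584" first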

What specifically are the dangers of eval(parse(...))?

Most of the arguments against eval(parse(...)) arise not from security concerns (after all, no claims are made about R being a safe interface to expose to the Internet) but because such code is generally doing things that can be accomplished using less obscure methods, i.e. methods that are both quicker and more human-parseable. The R language is supposed to be high-level, so the preference of the cognoscenti (and I do not consider myself in that group) is to see code that is both compact and expressive.

So the danger is that eval(parse(...)) is a backdoor method of getting around a lack of knowledge, and the hope in raising that barrier is that people will improve their use of the R language. The door remains open, but the hope is for more expressive use of other features. Carl Witthoft's question earlier today illustrated not knowing that the get function was available, and the question he linked to exposed a lack of understanding of how the [[ function behaved (and how $ was more limited than [[). In both cases an eval(parse(...)) solution could be constructed, but it was clunkier and less clear than the alternative.
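For instance, extracting a column whose name is stored in a variable (a deliberately simple illustration):

colname <- "Sepal.Length"

# the eval(parse(...)) backdoor: build code as a string, then run it
eval(parse(text = paste0("iris$", colname)))

# the idiomatic equivalent: '[[' already accepts a computed name
iris[[colname]]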

How do I avoid eval and parse?

For what it's worth, the function source actually uses eval(parse(...)), albeit in a somewhat subtle way: .Internal(parse(...)) is first used to create expressions, which after more processing are later passed to eval. So eval(parse(...)) seems to be good enough for the R core team in this instance.
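The two steps can be seen in isolation: parse turns text into an unevaluated expression vector, and eval then evaluates it (a minimal illustration):

exprs <- parse(text = "x <- 41; x + 1")  # an expression vector of two calls
eval(exprs)                              # evaluates both; returns the last value
# [1] 42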

That said, you don't need to jump through hoops to source functions into a new environment. source provides an argument local that can be used for precisely this.

local: TRUE, FALSE or an environment, determining where the parsed expressions are evaluated.

An example:

env = new.env()
source('test.r', local = env)

Testing that it works:

env$test('hello', 'world')
# [1] "hello world"
ls(pattern = 'test')
# character(0)

And an example test.r file to use this on:

test = function(a,b) paste(a,b)

Avoiding the infamous eval(parse()) construct

Using get and [[:

bar <- list(foo = list(fast = 1:5, slow = 6:10),
            oof = list(6:10, 1:5))

rab <- 'bar'

get(rab)[['oof']]
# [[1]]
# [1] 6 7 8 9 10
#
# [[2]]
# [1] 1 2 3 4 5
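
For comparison, the eval(parse(...)) route to the same element builds the code up as a string first, and is harder to read:

eval(parse(text = paste0(rab, "[['oof']]")))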

Properly calling variable name when creating multiple Benford plots

This is happening because of the first line within the benford function:

benford <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3){

  data.name <- as.character(deparse(substitute(data)))

Source: https://github.com/cran/benford.analysis/blob/master/R/functions-new.R

data.name is then used to title your graph. Whatever variable name or expression you pass to the function will unfortunately be captured by the deparse(substitute()) call and used as the graph's title.
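A minimal illustration of what deparse(substitute()) captures:

f <- function(data) deparse(substitute(data))
f(iris[["Sepal.Length"]])
# [1] "iris[[\"Sepal.Length\"]]"

So when the function is called in a loop as benford(iris[[i]]), every plot ends up titled "iris[[i]]" rather than with the column name.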


One short-term solution is to copy and rewrite the function:

#install.packages("benford.analysis")
library(benford.analysis)
#install.packages("data.table")
library(data.table) # needed for function

# load hidden functions into namespace - needed for function
r <- unclass(lsf.str(envir = asNamespace("benford.analysis"), all = T))
for(name in r) eval(parse(text=paste0(name, '<-benford.analysis:::', name)))

benford_rev <- function{} # see below

for (i in colnames(iris[1:4])){
plot(benford_rev(iris[[i]], data.name = i))
}

[Plots: Benford digit-distribution plots for the iris columns, each titled with the column name passed as data.name]

This has negative side effects:

  • It is not maintainable across package revisions
  • It fills your global environment with the package's normally hidden functions

So hopefully someone can propose a better way!
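One possibly better way (a sketch, not tested against every package version) is to keep benford_rev in your script but point its enclosing environment at the package namespace, so the unexported helpers are found there without copying them:

library(benford.analysis)

# benford_rev as defined below; its enclosing environment is set to the
# package namespace so generate.benford.digits(), chisq.test.bfd(), etc. resolve
environment(benford_rev) <- asNamespace("benford.analysis")

# the loop now works without filling the global environment with hidden functions
for (i in colnames(iris[1:4])) {
  plot(benford_rev(iris[[i]], data.name = i))
}

This sidesteps both drawbacks above, although it still breaks if a future revision renames the internal functions.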


benford_rev <- function(data, number.of.digits = 2, sign = "positive", discrete = TRUE,
                        round = 3,
                        data.name = as.character(deparse(substitute(data)))){ # changed

  # removed line

  benford.digits <- generate.benford.digits(number.of.digits)
  benford.dist <- generate.benford.distribution(benford.digits)
  empirical.distribution <- generate.empirical.distribution(data, number.of.digits, sign,
                                                            second.order = FALSE, benford.digits)
  n <- length(empirical.distribution$data)

  second.order <- generate.empirical.distribution(data, number.of.digits, sign,
                                                  second.order = TRUE, benford.digits,
                                                  discrete = discrete, round = round)
  n.second.order <- length(second.order$data)
  benford.dist.freq <- benford.dist*n

  ## calculating useful summaries and differences
  difference <- empirical.distribution$dist.freq - benford.dist.freq
  squared.diff <- ((empirical.distribution$dist.freq - benford.dist.freq)^2)/benford.dist.freq
  absolute.diff <- abs(empirical.distribution$dist.freq - benford.dist.freq)

  ### chi-squared test
  chisq.bfd <- chisq.test.bfd(squared.diff, data.name)

  ### MAD
  mean.abs.dev <- sum(abs(empirical.distribution$dist - benford.dist)/(length(benford.dist)))

  if (number.of.digits > 3) {
    MAD.conformity <- NA
  } else {
    digits.used <- c("First Digit", "First-Two Digits", "First-Three Digits")[number.of.digits]
    MAD.conformity <- MAD.conformity(MAD = mean.abs.dev, digits.used)$conformity
  }

  ### Summation
  summation <- generate.summation(benford.digits, empirical.distribution$data,
                                  empirical.distribution$data.digits)
  abs.excess.summation <- abs(summation - mean(summation))

  ### Mantissa
  mantissa <- extract.mantissa(empirical.distribution$data)
  mean.mantissa <- mean(mantissa)
  var.mantissa <- var(mantissa)
  ek.mantissa <- excess.kurtosis(mantissa)
  sk.mantissa <- skewness(mantissa)

  ### Mantissa Arc Test
  mat.bfd <- mantissa.arc.test(mantissa, data.name)

  ### Distortion Factor
  distortion.factor <- DF(empirical.distribution$data)

  ## recovering the lines of the numbers
  if (sign == "positive") lines <- which(data > 0 & !is.na(data))
  if (sign == "negative") lines <- which(data < 0 & !is.na(data))
  if (sign == "both")     lines <- which(data != 0 & !is.na(data))
  #lines <- which(data %in% empirical.distribution$data)

  ## output
  output <- list(info = list(data.name = data.name,
                             n = n,
                             n.second.order = n.second.order,
                             number.of.digits = number.of.digits),

                 data = data.table(lines.used = lines,
                                   data.used = empirical.distribution$data,
                                   data.mantissa = mantissa,
                                   data.digits = empirical.distribution$data.digits),

                 s.o.data = data.table(second.order = second.order$data,
                                       data.second.order.digits = second.order$data.digits),

                 bfd = data.table(digits = benford.digits,
                                  data.dist = empirical.distribution$dist,
                                  data.second.order.dist = second.order$dist,
                                  benford.dist = benford.dist,
                                  data.second.order.dist.freq = second.order$dist.freq,
                                  data.dist.freq = empirical.distribution$dist.freq,
                                  benford.dist.freq = benford.dist.freq,
                                  benford.so.dist.freq = benford.dist*n.second.order,
                                  data.summation = summation,
                                  abs.excess.summation = abs.excess.summation,
                                  difference = difference,
                                  squared.diff = squared.diff,
                                  absolute.diff = absolute.diff),

                 mantissa = data.table(statistic = c("Mean Mantissa",
                                                     "Var Mantissa",
                                                     "Ex. Kurtosis Mantissa",
                                                     "Skewness Mantissa"),
                                       values = c(mean.mantissa = mean.mantissa,
                                                  var.mantissa = var.mantissa,
                                                  ek.mantissa = ek.mantissa,
                                                  sk.mantissa = sk.mantissa)),
                 MAD = mean.abs.dev,
                 MAD.conformity = MAD.conformity,
                 distortion.factor = distortion.factor,
                 stats = list(chisq = chisq.bfd,
                              mantissa.arc.test = mat.bfd))

  class(output) <- "Benford"
  return(output)
}

Converting a character type to a logical

You can use eval(parse(...)).

a <- 3
x <- "a > 2"

eval(parse(text=x))
[1] TRUE

x2 <- "a==3"
eval(parse(text=x2))
[1] TRUE
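
For the special case where the strings are literal "TRUE"/"FALSE" values rather than expressions to evaluate, as.logical does the conversion without any parsing:

as.logical("TRUE")
# [1] TRUE
as.logical(c("TRUE", "FALSE", "T", "F"))
# [1]  TRUE FALSE  TRUE FALSE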

Here be dragons.

See, for example:

  • R: eval(parse(...)) is often suboptimal
  • Evaluate expression given as a string
  • Assigning and removing objects in a loop: eval(parse(paste(
  • R: eval(parse()) error message: cannot open file even though "text=" is specified in parse

