R Bnlearn Eval Inside Function

R bnlearn eval inside function

Here is a solution to the problem:

library(bnlearn)
data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)

myfuncBN=function() {
  vars = names(learning.test)
  obs = 2
  str1 = paste("(", vars[-3], "=='",
          sapply(learning.test[obs,-3], as.character), "')",
          sep = "", collapse = " & ")
  str2 = paste("(", vars[3], "=='",
           as.character(learning.test[obs, 3]), "')", sep = "")

  eval(parse(text=paste("cpquery(fitted,",str2,",",str1,")")))
}

set.seed(1)
myfuncBN()

# [1] 0.05940594

This value is equal to the result given by:

set.seed(1)
cpquery(fitted, event=(C=="c"), 
             evidence=((A=="b") & (B=="a") & (D=="a") & (E=="b") & (F=="b")))

# [1] 0.05940594

Problem with Bayesian Network with bnlearn cpquery in shiny server - supplying evidence

The solution, many thanks to user20650, is to use renderText around the whole calculation. Works beautifully.

library(shiny)
library(bnlearn)

tdag = bn.fit(hc(learning.test[5:6]), learning.test[5:6])

shinyApp(

ui = basicPage(
selectInput("e", "E:", choices=letters[1:3] ),
selectInput("f", "F:", choices=letters[1:2] ),
textOutput("prob")
),

server = function(input, output, session) {
output$prob <- renderText({
event <- paste0("(F == '", input$f, "')")
evidence <- paste0("(E == '", input$e, "')")
eval(parse(text=paste(
'cpquery(fitted=tdag,
event = ', event, ',
evidence = ', evidence, ',
n = 100000,
debug = TRUE)'
)))})}
)

Using cpquery function for several pairs from dataset

cpquery is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist to draw samples and reweighting to get the probabilities.

Your example

library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)

This uses cpquery and the much maligned eval(parse(...)) -- this is the
approach the the bnlearn author takes to do this programmatically in the ?cpquery examples. Anyway,

# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)

# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
  qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
  sttbl$prob[i] = eval(parse(text=qtxt))
}

I find it preferable to work with cpdist which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programatically (and without evil(parse(...))).
The evidence is added in a named list i.e. list(A='a').

# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.  
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)

# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
  temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
  w = attr(temp, "weights")  
  sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
}

sttbl
#   from to   strength      prob     prob2
# 1    A  D -1938.9499 0.6186238 0.6233387
# 2    A  B -1153.8796 0.6050552 0.6133448
# 3    C  D  -823.7605 0.7027782 0.7067417
# 4    B  E  -720.8266 0.7332107 0.7328657
# 5    F  E  -549.2300 0.5850828 0.5895373

Understanding the event and evidence parameters in bnlearns cpquery

I think that this section is just trying to show what the evidence and event expressions represent in the original data and how they relate to logic sampling. e.g. if you subset the original data with the event/evidence, then it will subset the data to only contains rows that agree with the event/evidence.

But you are correct that the notation data[evidence, ] won't work. You can uselearning.test[eval(expression(B == "b"), learning.test), ], which may help you understand the intent. This is similar to what used to appear in older help files.

For logic sampling (the default inference method) a bunch of samples are generated from the BN, the default is N=10000 samples (*). So 1) you have estimated a parameterised BN, 2) draw N samples from it, 3) subset this sample according to event / evidence vectors to calculate some conditional probability (hopefully correctness is not lost in my brevity).

cpquery(fitted, (learning.test$B == "b"), (learning.test$A == "a")) doesn't work as the default number of samples in the sampled BN (step 2. above) is 10000 but learning.test has 5000 rows (and so the logical vector produced by (learning.test$B == "b") has length 5000). You can force your approach to work, (by work I mean produce an answer), by setting the n parameter , cpquery(fitted, (learning.test$B == "b"), (learning.test$A == "a"), n=5000) but this will not always do what you expect: you are evaluating the event/evidence expression against the observed data instead of the simulated data. I think that if this sort of logical vector is supplied then cpquery should probably throw an error. Best to stick with the documented approaches.

(*) From the help I'd expect the default number of samples to be 5000 * log10(nparams(fitted))) =~ 8000, but perhaps batch = 10000 takes precedence.

parallelization of bnlearn (with parallel package)

This is a bug. Please report it to the package maintainer.

Here is the code of check.cluster:

function (cluster) 
{
    if (is.null(cluster)) 
        return(TRUE)
    if (any(class(cluster) %!in% supported.clusters)) 
        stop("cluster is not a valid cluster object.")
    if (!requireNamespace("parallel")) 
        stop("this function requires the parallel package.")
    if (!isClusterRunning(cluster)) 
        stop("the cluster is stopped.")
}

Now, if you look at the class of cl:

class(cl)
#[1] "SOCKcluster" "cluster"

Let's reproduce the check:

bnlearn:::supported.clusters
#[1] "MPIcluster"  "PVMcluster"  "SOCKcluster"

`%!in%` <- function (x, table) {
  match(x, table, nomatch = 0L) == 0L
}
any(class(cl) %!in% bnlearn:::supported.clusters)
#[1] TRUE

cluster is not in supported.clusters. I believe, the function should only check if the cluster has a supported class and not if it has an unsupported class.

As a work-around you could change supported.clusters:

assignInNamespace("supported.clusters", 
                  c("cluster", "MPIcluster",  
                    "PVMcluster", "SOCKcluster"), 
                  "bnlearn")

How bnlearn calculates BIC of continuous data?

The BIC is calculated as

BIC = -2* logLik + nparams* log(nobs)

but in bnlearn this is rescaled by -2 (see?score) to give

BIC = logLik -0.5* nparams* log(nobs)

So for your example, with no edges the likelihood is calculated using the marginal means, and errors (or more generally, for each node the number of parameters is given by summing 1 (intercept) + 1 (residual error) + the number of parents), e.g.

library(bnlearn)
X = iris[, 1:3]
names(X) = c("A", "B", "C")
Network = empty.graph(names(X))

(ll = sum(sapply(X, function(i) dnorm(i, mean(i), sd(i), log=TRUE)))) 
#[1] -569.408
(penalty = 0.5* log(nrow(X))* 6)
#[1] 15.03191

ll - penalty
#[1] -584.4399

If there was an edge, you calclate the log-likelihood using the fitted values and residual errors. For the net:

Network = set.arc(Network, "A", "B")

We need the log-likelihood components from node A, and C

(llA = with(X, sum(dnorm(A, mean(A), sd(A), log=TRUE))))
#[1] -184.0414
(llC = with(X, sum(dnorm(C, mean(C), sd(C), log=TRUE))))
#[1] -297.5887

and we get B's conditional probabilities from a linear regression

m = lm(B ~ A, X)
(llB = with(X, sum(dnorm(B, fitted(m), stats::sigma(m), log=TRUE))))
#[1] -86.73894

Giving

(ll = llA + llB + llC)
#[1] -568.3691
(penalty = 0.5* log(nrow(X))* 7)
#[1] 17.53722
ll - penalty
#[1] -585.9063 

#  bnlearn::score(Network, X, type="bic-g", debug=TRUE)
# ----------------------------------------------------------------
# * processing node A.
#    loglikelihood is -184.041441.
#    penalty is 2.505318 x 2 = 5.010635.
# ----------------------------------------------------------------
# * processing node B.
#    loglikelihood is -86.738936.
#    penalty is 2.505318 x 3 = 7.515953.
# ----------------------------------------------------------------
# * processing node C.
#    loglikelihood is -297.588727.
#    penalty is 2.505318 x 2 = 5.010635.
# [1] -585.9063

R Bnlearn Eval Inside Function