R bnlearn eval inside function
Here is a solution to the problem:
library(bnlearn)
data(learning.test)
fitted = bn.fit(hc(learning.test), learning.test)
myfuncBN=function() {
vars = names(learning.test)
obs = 2
str1 = paste("(", vars[-3], "=='",
sapply(learning.test[obs,-3], as.character), "')",
sep = "", collapse = " & ")
str2 = paste("(", vars[3], "=='",
as.character(learning.test[obs, 3]), "')", sep = "")
eval(parse(text=paste("cpquery(fitted,",str2,",",str1,")")))
}
set.seed(1)
myfuncBN()
# [1] 0.05940594
This value is equal to the result given by:
set.seed(1)
cpquery(fitted, event=(C=="c"),
evidence=((A=="b") & (B=="a") & (D=="a") & (E=="b") & (F=="b")))
# [1] 0.05940594
Problem with Bayesian Network with bnlearn cpquery in shiny server - supplying evidence
The solution, many thanks to user20650, is to use renderText around the whole calculation. Works beautifully.
library(shiny)
library(bnlearn)
tdag = bn.fit(hc(learning.test[5:6]), learning.test[5:6])
shinyApp(
ui = basicPage(
selectInput("e", "E:", choices=letters[1:3] ),
selectInput("f", "F:", choices=letters[1:2] ),
textOutput("prob")
),
server = function(input, output, session) {
output$prob <- renderText({
event <- paste0("(F == '", input$f, "')")
evidence <- paste0("(E == '", input$e, "')")
eval(parse(text=paste(
'cpquery(fitted=tdag,
event = ', event, ',
evidence = ', evidence, ',
n = 100000,
debug = TRUE)'
)))})}
)
Using cpquery function for several pairs from dataset
cpquery
is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...))
to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist
to draw samples and reweighting to get the probabilities.
Your example
library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
This uses cpquery
and the much maligned eval(parse(...))
-- this is the
approach the the bnlearn
author takes to do this programmatically in the ?cpquery
examples. Anyway,
# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)
# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
sttbl$prob[i] = eval(parse(text=qtxt))
}
I find it preferable to work with cpdist
which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw"
) it is slightly easier to do this programatically (and without evil(parse(...))
).
The evidence is added in a named list i.e. list(A='a')
.
# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
w = attr(temp, "weights")
sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
}
sttbl
# from to strength prob prob2
# 1 A D -1938.9499 0.6186238 0.6233387
# 2 A B -1153.8796 0.6050552 0.6133448
# 3 C D -823.7605 0.7027782 0.7067417
# 4 B E -720.8266 0.7332107 0.7328657
# 5 F E -549.2300 0.5850828 0.5895373
Understanding the event and evidence parameters in bnlearns cpquery
I think that this section is just trying to show what the evidence and event expressions represent in the original data and how they relate to logic sampling. e.g. if you subset the original data with the event/evidence, then it will subset the data to only contains rows that agree with the event/evidence.
But you are correct that the notation data[evidence, ]
won't work. You can uselearning.test[eval(expression(B == "b"), learning.test), ]
, which may help you understand the intent. This is similar to what used to appear in older help files.
For logic sampling (the default inference method) a bunch of samples are generated from the BN, the default is N=10000 samples (*). So 1) you have estimated a parameterised BN, 2) draw N samples from it, 3) subset this sample according to event / evidence vectors to calculate some conditional probability (hopefully correctness is not lost in my brevity).
cpquery(fitted, (learning.test$B == "b"), (learning.test$A == "a"))
doesn't work as the default number of samples in the sampled BN (step 2. above) is 10000 but learning.test
has 5000 rows (and so the logical vector produced by (learning.test$B == "b")
has length 5000). You can force your approach to work, (by work I mean produce an answer), by setting the n
parameter , cpquery(fitted, (learning.test$B == "b"), (learning.test$A == "a"), n=5000)
but this will not always do what you expect: you are evaluating the event/evidence expression against the observed data instead of the simulated data. I think that if this sort of logical vector is supplied then cpquery
should probably throw an error. Best to stick with the documented approaches.
(*) From the help I'd expect the default number of samples to be 5000 * log10(nparams(fitted))) =~ 8000, but perhaps batch
= 10000 takes precedence.
parallelization of bnlearn (with parallel package)
This is a bug. Please report it to the package maintainer.
Here is the code of check.cluster
:
function (cluster)
{
if (is.null(cluster))
return(TRUE)
if (any(class(cluster) %!in% supported.clusters))
stop("cluster is not a valid cluster object.")
if (!requireNamespace("parallel"))
stop("this function requires the parallel package.")
if (!isClusterRunning(cluster))
stop("the cluster is stopped.")
}
Now, if you look at the class of cl
:
class(cl)
#[1] "SOCKcluster" "cluster"
Let's reproduce the check:
bnlearn:::supported.clusters
#[1] "MPIcluster" "PVMcluster" "SOCKcluster"
`%!in%` <- function (x, table) {
match(x, table, nomatch = 0L) == 0L
}
any(class(cl) %!in% bnlearn:::supported.clusters)
#[1] TRUE
cluster
is not in supported.clusters
. I believe, the function should only check if the cluster has a supported class and not if it has an unsupported class.
As a work-around you could change supported.clusters
:
assignInNamespace("supported.clusters",
c("cluster", "MPIcluster",
"PVMcluster", "SOCKcluster"),
"bnlearn")
How bnlearn calculates BIC of continuous data?
The BIC is calculated as
BIC = -2* logLik + nparams* log(nobs)
but in bnlearn
this is rescaled by -2 (see?score
) to give
BIC = logLik -0.5* nparams* log(nobs)
So for your example, with no edges the likelihood is calculated using the marginal means, and errors (or more generally, for each node the number of parameters is given by summing 1 (intercept) + 1 (residual error) + the number of parents), e.g.
library(bnlearn)
X = iris[, 1:3]
names(X) = c("A", "B", "C")
Network = empty.graph(names(X))
(ll = sum(sapply(X, function(i) dnorm(i, mean(i), sd(i), log=TRUE))))
#[1] -569.408
(penalty = 0.5* log(nrow(X))* 6)
#[1] 15.03191
ll - penalty
#[1] -584.4399
If there was an edge, you calclate the log-likelihood using the fitted values and residual errors. For the net:
Network = set.arc(Network, "A", "B")
We need the log-likelihood components from node A, and C
(llA = with(X, sum(dnorm(A, mean(A), sd(A), log=TRUE))))
#[1] -184.0414
(llC = with(X, sum(dnorm(C, mean(C), sd(C), log=TRUE))))
#[1] -297.5887
and we get B's conditional probabilities from a linear regression
m = lm(B ~ A, X)
(llB = with(X, sum(dnorm(B, fitted(m), stats::sigma(m), log=TRUE))))
#[1] -86.73894
Giving
(ll = llA + llB + llC)
#[1] -568.3691
(penalty = 0.5* log(nrow(X))* 7)
#[1] 17.53722
ll - penalty
#[1] -585.9063
# bnlearn::score(Network, X, type="bic-g", debug=TRUE)
# ----------------------------------------------------------------
# * processing node A.
# loglikelihood is -184.041441.
# penalty is 2.505318 x 2 = 5.010635.
# ----------------------------------------------------------------
# * processing node B.
# loglikelihood is -86.738936.
# penalty is 2.505318 x 3 = 7.515953.
# ----------------------------------------------------------------
# * processing node C.
# loglikelihood is -297.588727.
# penalty is 2.505318 x 2 = 5.010635.
# [1] -585.9063
Related Topics
Solve Homogenous System Ax = 0 for Any M * N Matrix a in R (Find Null Space Basis for A)
Puzzled by Xlim/Ylim Behavior in R
Convert an Integer Column to Time Hh:Mm
Counting the Number of Values Greater Than 0 in R in Multiple Columns
Rolling by Group in Data.Table R
In Place Modification of Matrices in R
How to Get Mean of Every N Rows and Keep the Date Index
Vector of Cumulative Sums in R
Get Most Frequent String from a Data Frame Column
Dplyr Write a Function with Column Names as Inputs
R Programming: Read.Csv() Skips Lines Unexpectedly
Use 'J' to Select the Join Column of 'X' and All Its Non-Join Columns
Placement of Error Bars in Barplot Using Ggplot2
Convert Numeric Vector to Binary (0/1) Based on Limit
How to Pass Multiple Group_By Arguments and a Dynamic Variable Argument to a Dplyr Function
Adding a New Column to Matrix Error
Convert Month's Number to Month Name
Subsetting a Data Frame to the Rows Not Appearing in Another Data Frame