1

I am new to this community, r, and programming in general. (Thanks in advance for your patience!) I am working on a project that involves bayesian-networks.

Strait to the question. The following code was posted on this site in response to a question titled "NA/NaN values in bnlearn package R"

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- cnNew(
      nodes = c("a1", "a2", "a3"),
      cats = list(c("1","2"), c("1","2"), c("1","2")),
      parents = list(NULL, c(1,3), NULL)
      )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')

##$a2
##         a1        a3         0         1
## A 0.0000000 0.0000000 0.0000000 1.0000000
## B 0.0000000 1.0000000 0.5712826 0.4287174
## A 1.0000000 0.0000000 0.0000000 1.0000000
## B 1.0000000 1.0000000 0.5685786 0.4314214

However when I copy, paste and run the code I get a different result (see below).

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##   a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3 
## first generate a network with a1,a3 ->a2
cnet <- cnNew(
    nodes = c("a1", "a2", "a3"),
    cats = list(c("1","2"), c("1","2"), c("1","2")),
    parents = list(NULL, c(1,3), NULL)
    )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')
## $a2
##   a1  a3   1   2
## A 1.0 1.0 0.0 1.0
## B 1.0 2.0 0.5 0.5
## A 2.0 1.0 0.5 0.5
## B 2.0 2.0 0.5 0.5

Could someone explain why my results are different? I ask because I am trying to understand how catnet handles missing data.

Best,

John

1 Answer 1

0

The top/bottom code are identical--they should output the same results. I checked through the catnet functions for other packages which use the same function -- could be your issue. It's good practice to use the :: notation when using non-base functions.

rm(list=ls())
library(catnet)

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- catnet::cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- catnet::cnNew(
  nodes = c("a1", "a2", "a3"),
  cats = list(c("1","2"), c("1","2"), c("1","2")),
  parents = list(NULL, c(1,3), NULL)
)


## set the empirical probabilities from data=my.data
cnet2 <- catnet::cnSetProb(cnet,data=my.data)

## to get the conditional probability table
catnet::cnProb(cnet2,which='a2')

# $a2
# a1  a3   1   2
# A 1.0 1.0 0.0 1.0
# B 1.0 2.0 0.5 0.5
# A 2.0 1.0 0.5 0.5
# B 2.0 2.0 0.5 0.5
Sign up to request clarification or add additional context in comments.

4 Comments

If you place catnet:: in front of the relevant function, it should work for you in both runs. Try it with both, and see if you're using a shared function.
Thanks for your help, Bob! Is that the output you got when you ran it? The table at the end is what I'm interested in. The table in your post is the same as the one I got, but different than that in the original post. It appears from my output (and the output in your post) that the rows containing missing data (my.data) are simply ignored when calculating the conditional probabilities (given in the table at the end). This appears not to be the case in the original post. Anyway, thanks again for helping me get my head around this stuff.
The notation works, thanks again Bob! However it still gives me different output than the original poster. Maybe he/she was using a shared function... Is there a way to ask this "original poster" (@pes) I keep mentioning? I would have commented on his/her post if I was able to.
Running this should help you manually check: ftable(my.data, row.vars = 1:3, exclude = F)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.