bayesian networks with the catnet package: handling missing data

Question

I am new to this community, r, and programming in general. (Thanks in advance for your patience!) I am working on a project that involves bayesian-networks.

Strait to the question. The following code was posted on this site in response to a question titled "NA/NaN values in bnlearn package R"

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- cnNew(
      nodes = c("a1", "a2", "a3"),
      cats = list(c("1","2"), c("1","2"), c("1","2")),
      parents = list(NULL, c(1,3), NULL)
      )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')

##$a2
##         a1        a3         0         1
## A 0.0000000 0.0000000 0.0000000 1.0000000
## B 0.0000000 1.0000000 0.5712826 0.4287174
## A 1.0000000 0.0000000 0.0000000 1.0000000
## B 1.0000000 1.0000000 0.5685786 0.4314214

However when I copy, paste and run the code I get a different result (see below).

rm(list=ls())

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- cnDiscretize(my.data,numCategories=2)

my.data
##   a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3 
## first generate a network with a1,a3 ->a2
cnet <- cnNew(
    nodes = c("a1", "a2", "a3"),
    cats = list(c("1","2"), c("1","2"), c("1","2")),
    parents = list(NULL, c(1,3), NULL)
    )


## set the empirical probabilities from data=my.data
cnet2 <- cnSetProb(cnet,data=my.data)

## to get the conditional probability table
cnProb(cnet2,which='a2')
## $a2
##   a1  a3   1   2
## A 1.0 1.0 0.0 1.0
## B 1.0 2.0 0.5 0.5
## A 2.0 1.0 0.5 0.5
## B 2.0 2.0 0.5 0.5

Could someone explain why my results are different? I ask because I am trying to understand how catnet handles missing data.

Best,

John

Bob Hopez · Accepted Answer · 2016-06-30 16:39:10Z

0

The top/bottom code are identical--they should output the same results. I checked through the catnet functions for other packages which use the same function -- could be your issue. It's good practice to use the :: notation when using non-base functions.

rm(list=ls())
library(catnet)

### generate random data (not simply independent binomials)
set.seed(123)
n.obs <- 10
a1 <- rbinom(n.obs,1,.3)
a2 <- runif(n.obs)
a3 <- floor(-3*log(.25+3*a2/4))
a3[a3>=2] <- NA
a2 <- floor(2*a2)
my.data <- data.frame(a1,a2,a3 )
### discretize data into proper categories
my.data <- catnet::cnDiscretize(my.data,numCategories=2)

my.data
##    a1 a2 a3
## 1   1  2  1
## 2   2  1  2
## 3   1  2  1
## 4   2  2  2
## 5   2  1 NA
## 6   1  2  1
## 7   1  1 NA
## 8   2  1 NA
## 9   1  1 NA
## 10  1  2  1

## say we want a2 conditional on a1,a3

## first generate a network with a1,a3 ->a2
cnet <- catnet::cnNew(
  nodes = c("a1", "a2", "a3"),
  cats = list(c("1","2"), c("1","2"), c("1","2")),
  parents = list(NULL, c(1,3), NULL)
)


## set the empirical probabilities from data=my.data
cnet2 <- catnet::cnSetProb(cnet,data=my.data)

## to get the conditional probability table
catnet::cnProb(cnet2,which='a2')

# $a2
# a1  a3   1   2
# A 1.0 1.0 0.0 1.0
# B 1.0 2.0 0.5 0.5
# A 2.0 1.0 0.5 0.5
# B 2.0 2.0 0.5 0.5

answered Jun 30, 2016 at 16:39

Bob Hopez

7834 gold badges10 silver badges29 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Bob Hopez Over a year ago

If you place catnet:: in front of the relevant function, it should work for you in both runs. Try it with both, and see if you're using a shared function.

John Over a year ago

Thanks for your help, Bob! Is that the output you got when you ran it? The table at the end is what I'm interested in. The table in your post is the same as the one I got, but different than that in the original post. It appears from my output (and the output in your post) that the rows containing missing data (my.data) are simply ignored when calculating the conditional probabilities (given in the table at the end). This appears not to be the case in the original post. Anyway, thanks again for helping me get my head around this stuff.

John Over a year ago

The notation works, thanks again Bob! However it still gives me different output than the original poster. Maybe he/she was using a shared function... Is there a way to ask this "original poster" (@pes) I keep mentioning? I would have commented on his/her post if I was able to.

Bob Hopez Over a year ago

Running this should help you manually check: ftable(my.data, row.vars = 1:3, exclude = F)

Collectives™ on Stack Overflow

bayesian networks with the catnet package: handling missing data

1 Answer 1

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related