6.1 When to Use Estimate
The Estimate command calculates a maximum likelihood estimate of the parameters of a discrete Bayesian network model without latent variables. It cannot be used to estimate parameters in a continuous linear model.
6.2 The Input and Output of Estimate
The input to Estimate is:
1. Cell count or raw data for discrete variables, and
2. A directed acyclic graph.
The output from Estimate consists of a discrete Bayesian network, which can be read as input to TETRAD II for use in either the Monte Carlo simulation module (see chapter 13), or the Update module (chapter 7).
6.3 A Simple Example
The following session illustrates how to use Estimate. We use two input files, one for the raw data and one for the graph. The raw data is in the input file est.dat. The graph is in the input file est.g. We show part of the data file in Fig. 6.1, and the graph file in Fig. 6.2.
################ est.dat ###################
{The Generating Model
/Raw
2000
x1 x2 x3 x4 x5 x6
0 1 0 1 0 0
0 1 0 0 0 0
0 0 0 0 1 0
1 0 0 1 0 0
.
.
################ est.dat ###################
Fig. 6.1: est.dat
################ est.g ###################
/graph
x1 x2
x1 x3
x2 x4
x3 x4
x4 x5
x5 x6
################ est.g ###################
Fig. 6.2: est.g
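To make the graph file easier to read, the following Python sketch turns the edge list of Fig. 6.2 into a list of parents for each variable. It is an illustration only, not part of TETRAD II: the function name parents_from_graph_text is hypothetical, and the sketch assumes that each line "a b" after /graph denotes a directed edge from a to b, which agrees with the parent sets printed in est.out.

```python
# Illustrative sketch only; not TETRAD II code.
# Assumption: each line "a b" in the /graph section is a directed edge a -> b.

EST_G = """\
/graph
x1 x2
x1 x3
x2 x4
x3 x4
x4 x5
x5 x6
"""

def parents_from_graph_text(text):
    """Map every variable to the list of its parents, in file order."""
    parents = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("/"):
            continue  # skip blank lines and the /graph keyword
        tail, head = line.split()
        parents.setdefault(tail, [])
        parents.setdefault(head, []).append(tail)
    return parents

parents = parents_from_graph_text(EST_G)
for var in sorted(parents):
    print(var, "Parents:", " ".join(parents[var]))
```

Each variable's parent list determines which conditional probabilities Estimate has to compute for it.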
Session 6.1 shows how to use the Estimate command with these input files.
Session 6.1: Using the estimate command
***************************************************
Initializing Data Structures
>input
Input File: est.dat
>input
Input File: [est.dat]est.g
>estimate
Output file: est.out
>exit
***************************************************
In Fig. 6.3 we show a portion of est.out and compare it to the corresponding portion of est.in, the file containing the network that actually generated the data. Note that the order of the variables in the two files is different.
########### est.out ###########
The Probability Distribution
----------------------------
x1 Parents:
p(x1=0)= 0.3995
p(x1=1)= 0.6005
----------------------------
x2 Parents: x1
when x1=0
p(x2=1)= 0.7221
p(x2=0)= 0.2778
when x1=1
p(x2=1)= 0.5937
p(x2=0)= 0.4063
########### est.out ###########
########### est.in ###########
The Probability Distribution
----------------------------
x1 Parents:
p(x1=0)= 0.404 p(x1=1)= 0.596
----------------------------
x2 Parents: x1
when x1=0
p(x2=0)= 0.272 p(x2=1)= 0.728
when x1=1
p(x2=0)= 0.412 p(x2=1)= 0.587
########### est.in ###########
Fig. 6.3
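As a rough check on how close the estimates are, the Python sketch below tabulates the absolute differences between the values shown in the two panels of Fig. 6.3. It also computes the usual binomial standard error for p(x1=0), assuming that the 2000 listed in est.dat is the number of cases. The standard error is our own yardstick for the comparison; TETRAD II does not report it.

```python
import math

# Assumption: the "2000" in est.dat is the sample size.
N_CASES = 2000

# Values copied from the est.out and est.in panels of Fig. 6.3.
estimated = {
    "p(x1=0)": 0.3995, "p(x1=1)": 0.6005,
    "p(x2=0|x1=0)": 0.2778, "p(x2=1|x1=0)": 0.7221,
    "p(x2=0|x1=1)": 0.4063, "p(x2=1|x1=1)": 0.5937,
}
generating = {
    "p(x1=0)": 0.404, "p(x1=1)": 0.596,
    "p(x2=0|x1=0)": 0.272, "p(x2=1|x1=0)": 0.728,
    "p(x2=0|x1=1)": 0.412, "p(x2=1|x1=1)": 0.587,
}

# Absolute difference between each estimate and the generating value.
diffs = {key: abs(estimated[key] - generating[key]) for key in estimated}

# Binomial standard error of a proportion p estimated from N_CASES cases.
p = generating["p(x1=0)"]
se_x1 = math.sqrt(p * (1 - p) / N_CASES)

for key in estimated:
    print(f"{key:14s} estimated={estimated[key]:.4f} "
          f"generating={generating[key]:.4f} diff={diffs[key]:.4f}")
print(f"standard error of p(x1=0): {se_x1:.4f}")
```

The conditional probabilities for x2 are each estimated from only the cases with the given value of x1, so their sampling error is somewhat larger than that of the marginal for x1.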
6.4 Estimating a Model Suggested by Build
In section 5.9 in the chapter on the Build module, we applied Build to data that Sewell and Shah (1968) collected on college plans and parental encouragement. The result of that analysis was a model that we stored in the file shaw.g. Here we show how to estimate the parameters of this model. In the next chapter we show how to use the result to calculate conditional probabilities.
Sewell and Shah studied five variables from a sample of 10,318 Wisconsin high school seniors. The variables and their values were:
sex [male = 0, female = 1]
iq = Intelligence Quotient [lowest = 0, highest = 3]
cp = college plans [yes = 0, no = 1]
pe = parental encouragement [0 = low, 1 = high]
ses = socioeconomic status [0 = lowest, ... 3 = highest]
Build output the following pattern.
Fig. 6.4: Build's Pattern
TETRAD II cannot estimate parameter values for a pattern, but only for one member of the set of indistinguishable models the pattern represents. To choose one member of the class of models represented by the pattern in Fig. 6.4 we reasoned as follows. The pattern does not orient the edge between iq and ses. That connection might be due to an influence of ses on iq, to an influence of iq on ses, or to an unmeasured common cause of both. It seems very unlikely that the child's intelligence causes the family's socioeconomic status, so the only sensible interpretations are that ses causes iq or that the two have an unmeasured common cause. Because the program will not estimate discrete models with latent variables, we assume the former. We show this model graphically in Fig. 6.5 and the corresponding input file (shaw.g) in Fig. 6.6.
Fig. 6.5: The Model in shaw.g
############# shaw.g #############
/graph
sex pe
iq cp
iq pe
ses iq
pe cp
ses cp
ses pe
############# shaw.g #############
Fig. 6.6: shaw.g
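Estimate requires a directed acyclic graph, so it can be worth confirming that the orientation chosen for shaw.g introduces no directed cycle. The Python sketch below does this with a standard topological sort (Kahn's algorithm) and lists the variables so that every parent comes before its children. It is an illustration, not TETRAD II code: the function name topological_order is hypothetical, and each pair is read as a directed edge from the first variable to the second.

```python
from collections import deque

# Edges of shaw.g (Fig. 6.6), each read as (parent, child).
SHAW_EDGES = [
    ("sex", "pe"), ("iq", "cp"), ("iq", "pe"), ("ses", "iq"),
    ("pe", "cp"), ("ses", "cp"), ("ses", "pe"),
]

def topological_order(edges):
    """Return the variables with parents before children.

    Raises ValueError if the edges contain a directed cycle.
    """
    nodes = sorted({v for edge in edges for v in edge})
    children = {v: [] for v in nodes}
    indegree = {v: 0 for v in nodes}
    for tail, head in edges:
        children[tail].append(head)
        indegree[head] += 1
    # Start from the variables that have no parents.
    queue = deque(v for v in nodes if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle")
    return order

order = topological_order(SHAW_EDGES)
print(order)
```

Reversing the ses-iq edge would also give an acyclic graph, which is why the choice between the two orientations has to rest on the substantive argument above rather than on the graph alone.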
We can now ask the program to estimate the probability distribution, and we do so in session 6.2.
Session 6.2: Estimating a model of college plans
**************************************************
>in
Input File: shaw.dat
>in
Input File: [shaw.dat]shaw.g
Erasing previous Temporal Order
>estimate
Output file: shaw.bn
>exit
**************************************************
The resulting output (shaw.bn) contains an explicitly written probability for every value of each variable conditional on its immediate parents, starting with the variables ses and sex that have no parents in the input graph. We show some of this file here:
############### shaw.bn #################
The Probability Distribution
----------------------------
sex Parents:
p(sex=0)= 0.4837
p(sex=1)= 0.5163
----------------------------
ses Parents:
p(ses=0)= 0.2403
p(ses=1)= 0.2565
p(ses=2)= 0.2562
p(ses=3)= 0.2468
----------------------------
iq Parents: ses
when ses=0
p(iq=0)= 0.3798
p(iq=1)= 0.2818
p(iq=2)= 0.2052
p(iq=3)= 0.1331
when ses=1
p(iq=0)= 0.2762
p(iq=1)= 0.2860
p(iq=2)= 0.2399
p(iq=3)= 0.1979
when ses=2
p(iq=0)= 0.2231
p(iq=1)= 0.2576
p(iq=2)= 0.2776
p(iq=3)= 0.2417
when ses=3
p(iq=0)= 0.1170
p(iq=1)= 0.2065
p(iq=2)= 0.2787
p(iq=3)= 0.3977
############### shaw.bn #################
6.5 How Estimate Works
Each Bayesian network with graph G represents a set of probability distributions, each of which can be factored according to the following rule, where V is the set of all variables in the distribution and, for each x in V, px is the set of parents of x in G:
$$P(V) = \prod_{x \in V} P(x \mid px)$$
The maximum likelihood estimate of each parameter P(x|px) is the number of cases in the sample in which x and px take the given values, divided by the number of cases in which px takes the given values. See Kiiveri and Speed (1982).
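The counting rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TETRAD II implementation: the function name mle_cpt is hypothetical, and the data are only the four rows displayed in Fig. 6.1, read as six values per case, so the numbers differ from those in est.out, which come from all 2000 cases.

```python
from collections import Counter

# Variable names and the four rows displayed in Fig. 6.1 (illustration only;
# the actual est.dat is far larger, so these estimates are not those of est.out).
NAMES = ["x1", "x2", "x3", "x4", "x5", "x6"]
ROWS = [
    (0, 1, 0, 1, 0, 0),
    (0, 1, 0, 0, 0, 0),
    (0, 0, 0, 0, 1, 0),
    (1, 0, 0, 1, 0, 0),
]

def mle_cpt(rows, names, child, parents):
    """Return {(parent_values, child_value): count(x, px) / count(px)}."""
    child_idx = names.index(child)
    parent_idx = [names.index(p) for p in parents]
    joint = Counter()   # counts of (parent values, child value)
    margin = Counter()  # counts of parent values alone
    for row in rows:
        pa = tuple(row[i] for i in parent_idx)
        joint[(pa, row[child_idx])] += 1
        margin[pa] += 1
    return {key: count / margin[key[0]] for key, count in joint.items()}

cpt_x1 = mle_cpt(ROWS, NAMES, "x1", [])       # x1 has no parents
cpt_x2 = mle_cpt(ROWS, NAMES, "x2", ["x1"])   # x2 has parent x1

for (pa, value), prob in sorted(cpt_x2.items()):
    print(f"when x1={pa[0]}: p(x2={value}) = {prob:.4f}")
```

In this sketch, combinations of child and parent values that never occur in the sample are simply absent from the result, and a parent configuration with a count of zero would leave the ratio undefined. How TETRAD II reports such cases is not covered here.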