Estimate

 

6.1 When to Use Estimate

 

The Estimate command calculates a maximum likelihood estimate of the parameters of a discrete Bayesian Network model without latent variables. It cannot be used to estimate parameters in a continuous linear model.

 

6.2 The Input and Output of Estimate

 

The input to Estimate is:

 

1. Cell counts or raw data for discrete variables, and

2. A directed acyclic graph.

 

The output from Estimate consists of a discrete Bayesian network, which can be read as input to TETRAD II for use in either the Monte Carlo simulation module (see chapter 13), or the Update module (chapter 7).

 

6.3 A Simple Example

 

The following session illustrates how to use Estimate. We use two input files, one for the raw data and one for the graph. The raw data is in the input file est.dat. The graph is in the input file est.g. We show part of the data file in Fig. 6.1, and the graph file in Fig. 6.2.

################   est.dat   ###################

{The Generating Model

/Raw

2000

x1    x2    x3    x4    x5    x6   

0     1     0     1     0     0     

0     1     0     0     0     0    

0     0     0     0     1     0    

1     0     0     1     0     0    

.

.

################   est.dat   ###################

Fig. 6.1: est.dat

 

 

################   est.g   ###################

/graph

x1 x2

x1 x3

x2 x4

x3 x4

x4 x5

x5 x6

 

################   est.g   ###################

Fig. 6.2: est.g
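
The edge list in Fig. 6.2 can be read as a table of parents for each variable. The following Python fragment is a minimal sketch of that reading, not part of TETRAD II. It assumes that each line after /graph names a parent followed by its child, which agrees with the "x2  Parents: x1" entry in the output of Estimate; the function name parse_graph is our own.

```python
def parse_graph(text):
    """Return {variable: [parents]} from a '/graph' edge list.

    Assumption: each edge line has the form 'parent child'.
    """
    parents = {}
    in_graph = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A section marker such as '/graph' starts a new section.
            in_graph = line.lower() == "/graph"
            continue
        if not in_graph:
            continue
        tail, head = line.split()
        parents.setdefault(tail, [])          # make sure root variables appear
        parents.setdefault(head, []).append(tail)
    return parents


EST_G = """/graph
x1 x2
x1 x3
x2 x4
x3 x4
x4 x5
x5 x6
"""

parents = parse_graph(EST_G)
for var, pa in parents.items():
    print(var, "Parents:", " ".join(pa))
```

Under this reading x1 has no parents and x4 has the two parents x2 and x3, so Estimate must produce one conditional distribution for x4 for each combination of values of x2 and x3.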

 

Session 6.1 shows how to use the Estimate command with these input files.

 

Session 6.1: Using the Estimate command

 

***************************************************

Initializing Data Structures

>input

Input File: est.dat

>input

Input File: [est.dat]est.g

 

>estimate

Output file: est.out

 

>exit

***************************************************

 

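In Fig. 6.3 we show a portion of est.out, and compare it to a portion of the network that actually generated the data, taken from the file est.in. Note that the order of the variables in the two files is different, and that values may also be listed in a different order: in the portion shown, est.out gives p(x2=1) before p(x2=0), while est.in gives p(x2=0) first.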

 

###########   est.out ###########

The Probability Distribution

----------------------------

x1  Parents:

  p(x1=0)= 0.3995     p(x1=1)= 0.6005  

----------------------------

x2  Parents: x1

 when x1=0

  p(x2=1)= 0.7221     p(x2=0)= 0.2778  

 when x1=1

  p(x2=1)= 0.5937     p(x2=0)= 0.4063  

###########   est.out ###########

 

###########   est.in ###########

The Probability Distribution

----------------------------

x1  Parents:

  p(x1=0)= 0.404  p(x1=1)= 0.596

----------------------------

x2  Parents: x1

 when x1=0

  p(x2=0)= 0.272  p(x2=1)= 0.728

 when x1=1

  p(x2=0)= 0.412  p(x2=1)= 0.587

###########   est.in ###########

Fig. 6.3
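
Because the two files do not list the values of x2 in the same order, the entries of Fig. 6.3 have to be matched by value rather than by position before they are compared. The following Python fragment is a small sketch of that comparison. The numbers are transcribed from Fig. 6.3; the dictionary layout and the key strings are our own choice for the illustration.

```python
# Values transcribed from Fig. 6.3; keys are "variable | parent values".
est_out = {
    "x1": {0: 0.3995, 1: 0.6005},
    "x2 | x1=0": {1: 0.7221, 0: 0.2778},
    "x2 | x1=1": {1: 0.5937, 0: 0.4063},
}
est_in = {
    "x1": {0: 0.404, 1: 0.596},
    "x2 | x1=0": {0: 0.272, 1: 0.728},
    "x2 | x1=1": {0: 0.412, 1: 0.587},
}

# Match entries by (distribution, value), not by their position in the file.
diffs = {
    (key, value): abs(p - est_in[key][value])
    for key, dist in est_out.items()
    for value, p in dist.items()
}

for (key, value), d in sorted(diffs.items()):
    print(f"{key:10s} value={value}  |estimate - generating| = {d:.4f}")

max_diff = max(diffs.values())
```

For the six entries shown, every estimate lies within 0.01 of the corresponding parameter of the generating network.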

 

6.4 Estimating a Model Suggested by Build

In section 5.9 in the chapter on the Build module, we applied Build to data that Sewell and Shah (1968) collected on college plans and parental encouragement. The result of that analysis was a model that we stored in the file shaw.g. Here we show how to estimate the parameters of this model. In the next chapter we show how to use the result to calculate conditional probabilities.

Sewell and Shah studied five variables from a sample of 10,318 Wisconsin high school seniors. The variables and their values were:

 

sex                                [0 = male, 1 = female]

iq  = intelligence quotient        [0 = lowest, ... 3 = highest]

cp  = college plans                [0 = yes, 1 = no]

pe  = parental encouragement       [0 = low, 1 = high]

ses = socioeconomic status         [0 = lowest, ... 3 = highest]

 

Build output the following pattern.

 

Fig. 6.4: Build's Pattern

 

TETRAD II cannot estimate parameter values for a pattern, but only for one member of the set of indistinguishable models the pattern represents. To choose one member of the class of models represented by the pattern in Fig. 6.4 we reasoned as follows. The pattern does not orient the edge between iq and ses. That connection might be due to an influence of ses on iq, to an influence of iq on ses, or to an unmeasured common cause of both.

It seems very unlikely that the child's intelligence causes the family's socioeconomic status, so the only sensible interpretations are that ses causes iq or that the two have a common unmeasured cause. Because the program will not estimate discrete models with latent variables, we assume the former. We show this model graphically in Fig. 6.5 and the corresponding input file (shaw.g) in Fig. 6.6.

 

Fig. 6.5: The Model in shaw.g

#############   shaw.g   #############

/graph

sex  pe

iq  cp

iq  pe

ses iq

pe  cp

ses cp

ses pe

 

#############   shaw.g   #############

 

Fig. 6.6: shaw.g

 

We can now ask the program to estimate the probability distribution, and we do so in Session 6.2.

 

Session 6.2: Estimating a model of college plans

 

**************************************************

>in

Input File: shaw.dat

>in

Input File: [shaw.dat]shaw.g

 

Erasing previous Temporal Order

>estimate

Output file: shaw.bn

>exit

**************************************************

 

The resulting output (shaw.bn) contains an explicitly written probability for every value of each variable conditional on its immediate parents, starting with the variables ses and sex that have no parents in the input graph. We show some of this file here:

 

###############   shaw.bn   #################

The Probability Distribution

----------------------------

sex  Parents:

  p(sex=0)= 0.4837   p(sex=1)= 0.5163

----------------------------

ses  Parents:

  p(ses=0)= 0.2403    p(ses=1)= 0.2565   p(ses=2)= 0.2562 

  p(ses=3)= 0.2468

----------------------------

iq  Parents: ses

 when ses=0

  p(iq=0)= 0.3798  p(iq=1)= 0.2818  p(iq=2)= 0.2052

  p(iq=3)= 0.1331

 when ses=1

  p(iq=0)= 0.2762   p(iq=1)= 0.2860   p(iq=2)= 0.2399

  p(iq=3)= 0.1979

 when ses=2

  p(iq=0)= 0.2231   p(iq=1)= 0.2576   p(iq=2)= 0.2776

  p(iq=3)= 0.2417

 when ses=3

  p(iq=0)= 0.1170   p(iq=1)= 0.2065   p(iq=2)= 0.2787

  p(iq=3)= 0.3977

###############   shaw.bn   #################
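
As a worked illustration of how these entries combine (our own arithmetic, not program output): sex and ses have no parents and iq has ses as its only parent, so under this model the joint probability of any configuration of these three variables is the product of the corresponding entries listed above. For the configuration sex=0, ses=0, iq=0:

```latex
P(\mathit{sex}=0,\ \mathit{ses}=0,\ \mathit{iq}=0)
  = P(\mathit{sex}=0)\, P(\mathit{ses}=0)\, P(\mathit{iq}=0 \mid \mathit{ses}=0)
  = 0.4837 \times 0.2403 \times 0.3798
  \approx 0.0441
```

Probabilities involving pe or cp would also need the conditional distributions for those variables, which are not reproduced in the excerpt.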

 

6.5 How Estimate Works

Each Bayesian network with graph G represents a set of probability distributions, each of which can be factored according to the following rule, where V is the set of all variables in the distribution, and for each x in V, π_x is the set of parents of x in G.

 

    P(V) = \prod_{x \in V} P(x \mid \pi_x)

 

The maximum likelihood estimate of each parameter P(x | π_x) is a relative frequency: the number of cases in the sample in which x and π_x jointly take the given values, divided by the number of cases in which π_x takes the given values. See Kiiveri and Speed (1982).
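
The counting rule above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the code of TETRAD II: the function name estimate_cpts, the data layout (a list of tuples plus a list of column names), and the eight made-up cases are all hypothetical. Parent configurations that never occur in the sample simply get no entry here; how Estimate itself treats them is not described in this chapter.

```python
from collections import Counter


def estimate_cpts(rows, names, parents):
    """Maximum likelihood estimate of P(x | parents(x)) by relative frequency.

    rows:    list of tuples of discrete values, one tuple per case
    names:   column names, in the order used in each tuple
    parents: {variable: [parent names]}
    Returns {variable: {parent_values: {value: probability}}}.
    """
    col = {name: i for i, name in enumerate(names)}
    cpts = {}
    for var in names:
        pa = parents.get(var, [])
        joint = Counter()   # cases with these parent values AND this value of var
        margin = Counter()  # cases with these parent values
        for row in rows:
            pa_vals = tuple(row[col[p]] for p in pa)
            joint[(pa_vals, row[col[var]])] += 1
            margin[pa_vals] += 1
        table = {}
        for (pa_vals, value), n in joint.items():
            table.setdefault(pa_vals, {})[value] = n / margin[pa_vals]
        cpts[var] = table
    return cpts


# Hypothetical toy sample of eight cases for the edge x1 -> x2.
names = ["x1", "x2"]
parents = {"x1": [], "x2": ["x1"]}
rows = [(0, 1), (0, 1), (0, 0), (1, 0), (1, 1), (1, 0), (1, 1), (1, 1)]

cpts = estimate_cpts(rows, names, parents)
for var, table in cpts.items():
    for pa_vals, dist in table.items():
        print(var, "given", pa_vals, dist)
```

In the toy sample x1=0 occurs in 3 of the 8 cases, so the estimate of P(x1=0) is 3/8 = 0.375; among the 5 cases with x1=1, x2=1 occurs 3 times, so the estimate of P(x2=1 | x1=1) is 3/5 = 0.6.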