Update

 

 

7.1 Introduction

 

Bayesian networks are an increasingly popular representation scheme for computerized expert systems. They form a useful alternative to log-linear, logistic regression, and other statistical formalisms for discrete variables.

We distinguish two kinds of predictions that can be made using Bayesian networks. First, given a causal structure represented by a directed graph, one may try to predict the values of some variables from the values of others. This is typical of diagnostic problems: given symptoms, one tries to determine what disease is present. Alternatively, one may try to predict the values of some variables given the values of others after the causal structure has been interfered with in some prescribed way. This is typically what is attempted when alternative policies are compared. An example of this kind of problem is "What will the rate of inflation be if the Federal Reserve raises the prime rate by 0.25%?" We will explain how to answer both kinds of questions using TETRAD II with Bayesian networks.

 

7.2 Using the Update Module

 

Given a parameterized Bayesian network and a list of relevant facts describing an individual (that is, a subset, possibly empty, of the variables in the network and their associated values for that individual), the Update module will produce an estimate of the probabilities of each of the remaining variables conditional on the set of facts provided. In addition the Update module will give a list of the most probable values of all other variables in light of the facts given to it. Consider the following example, originally from Cooper (1984):

 

 

[Figure: the directed graph with edges mc → sc, mc → bt, sc → c, bt → c, and bt → h]

                  mc = metastatic cancer           bt = brain tumor

                  sc = serum calcium                  c = coma          h = severe headache

Fig. 7.1

In this example, metastatic cancer is a cause of brain tumors and also a cause of increased total serum calcium level. Both of these conditions cause patients to fall into a coma, and severe headaches are caused by brain tumors. We will assume that each of the variables is boolean with value equal to one if the condition occurs, and with value zero if the condition does not occur (e.g., h=1 means that a patient has severe headaches). From such a model and an appropriate parameterization one could answer the following questions with the update module: What is the probability that a patient has metastatic cancer given that he or she has high serum calcium and severe headaches? What is the best explanation (list of values for all of the variables) for a patient having severe headaches?

You need a parameterized Bayesian network to use the Update module. For example, the following file (cooper.bn) can be obtained by modifying a file produced by the MakeModel command (see chap. 9):


###############   cooper.bn   ##############

/BAYESNETWORK

          Number of   Values of

Variable  Categories  Categories

mc            2         0   1  

sc            2         0   1  

bt            2         0   1  

c             2         0   1  

h             2         0   1  

The Probability Distribution

----------------------------

mc  Parents:

  p(mc=0)= 0.800  p(mc=1)= 0.200

----------------------------

sc  Parents: mc

 when mc=0

  p(sc=0)= 0.800  p(sc=1)= 0.200

 when mc=1

  p(sc=0)= 0.200  p(sc=1)= 0.800

----------------------------

bt  Parents: mc

 when mc=0

  p(bt=0)= 0.950  p(bt=1)= 0.050

 when mc=1

  p(bt=0)= 0.800  p(bt=1)= 0.200

----------------------------

c  Parents: sc bt

 when sc=0 bt=0

  p(c=0)= 0.950  p(c=1)= 0.050

 when sc=0 bt=1

  p(c=0)= 0.200  p(c=1)= 0.800

 when sc=1 bt=0

  p(c=0)= 0.200  p(c=1)= 0.800

 when sc=1 bt=1

  p(c=0)= 0.200  p(c=1)= 0.800

----------------------------

h  Parents: bt

 when bt=0

  p(h=0)= 0.400  p(h=1)= 0.600

 when bt=1

  p(h=0)= 0.200  p(h=1)= 0.800

###############   cooper.bn   ##############

Fig. 7.2: cooper.bn
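To fix ideas, here is a rough Python sketch (not TETRAD's file format or internal representation; the names are ours) of how the structure and conditional probability tables in cooper.bn might be held in memory, with a helper that evaluates the joint probability of a complete assignment:

```python
# A sketch of the cooper.bn network: each variable has a tuple of parents
# and a conditional probability table keyed by the tuple of parent values.

PARENTS = {
    "mc": (),
    "sc": ("mc",),
    "bt": ("mc",),
    "c":  ("sc", "bt"),
    "h":  ("bt",),
}

# CPT[v][parent_values] = (P(v = 0 | parents), P(v = 1 | parents))
CPT = {
    "mc": {(): (0.80, 0.20)},
    "sc": {(0,): (0.80, 0.20), (1,): (0.20, 0.80)},
    "bt": {(0,): (0.95, 0.05), (1,): (0.80, 0.20)},
    "c":  {(0, 0): (0.95, 0.05), (0, 1): (0.20, 0.80),
           (1, 0): (0.20, 0.80), (1, 1): (0.20, 0.80)},
    "h":  {(0,): (0.40, 0.60), (1,): (0.20, 0.80)},
}

def joint(assignment):
    """Joint probability of a complete assignment such as {"mc": 0, ..., "h": 1}."""
    p = 1.0
    for v, parents in PARENTS.items():
        p *= CPT[v][tuple(assignment[q] for q in parents)][assignment[v]]
    return p
```

For instance, the assignment mc=0, sc=1, bt=0, c=1, h=1 has joint probability 0.8 × 0.2 × 0.95 × 0.8 × 0.6 ≈ 0.073.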

 

Session 7.1 illustrates how to use this network by providing facts about a new case.


Session 7.1.

 

*****************************************

>in

Input File: cooper.bn

 

We begin by telling TETRAD what evidence we want to condition on.

 

>addevidence

Example of input:

mc 0

Evidence: h 1

Evidence: sc 1

Evidence:

 

To start Update, simply type update at the prompt.

 

>update

Output file: cooper.out

 

The current list of evidence is:

sc   1

h   1

 

Updating network

 

To remove one piece of evidence type removeevidence at the prompt and specify a variable when prompted.

 

>removeevidence

Example of input:

mc

Variable: h

Variable: <CR>

 

To remove all of the evidence type removeallevidence at the prompt.

 

>removeallevidence

>exit

*****************************************

 

We show the relevant portions of the file cooper.out in Fig. 7.3:

 

##########   cooper.out   ##############

The current list of evidence is:

sc   1

h   1

 

The following assignments are the best explanations

 

mc    sc    bt    c     h       prob. for instantiation

 

0     1     0     1     1       0.0729

1     1     0     1     1       0.0614

1     1     1     1     1       0.0205

0     1     0     0     1       0.0182

1     1     0     0     1       0.0153

0     1     1     1     1       0.0051

1     1     1     0     1       0.0051

0     1     1     0     1       0.0013

 

The probability of the conditioning set

 

P(sc=1, h=1) =  0.2000

 

Conditional probabilities (marginalizing statespace)

 

P(mc=0|sc=1, h=1) =  0.4880     P(mc=1|sc=1, h=1) =  0.5120    

P(sc=0|sc=1, h=1) =  0.0000     P(sc=1|sc=1, h=1) =  1.0000    

P(bt=0|sc=1, h=1) =  0.8400     P(bt=1|sc=1, h=1) =  0.1600    

P(c=0|sc=1, h=1) =  0.2000     P(c=1|sc=1, h=1) =  0.8000    

P(h=0|sc=1, h=1) =  0.0000     P(h=1|sc=1, h=1) =  1.0000

##########   cooper.out   ##############

Fig. 7.3: Update's Output (cooper.out)

The numbers in the "best explanations" table describe the most probable states in the joint distribution. In this case all variables are binary and take values 1 and 0 only. Each row gives the probability of an assignment of values to the variables compatible with the facts about the individual given in the evidence. The numbers following the "Conditional probabilities" heading give the probability of each value of each variable conditional on the evidence given to the program about the individual case.
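The figures in Fig. 7.3 can be reproduced by the brute-force exact method described in section 7.4.1: enumerate all 2^5 = 32 states, keep those consistent with the evidence, and normalize. A sketch in Python (the function names are ours; the parameters are those of cooper.bn, Fig. 7.2):

```python
import itertools

NAMES = ("mc", "sc", "bt", "c", "h")

def joint(mc, sc, bt, c, h):
    """Joint probability of one full instantiation, per the cooper.bn CPTs."""
    p_mc = (0.80, 0.20)[mc]
    p_sc = ((0.80, 0.20), (0.20, 0.80))[mc][sc]
    p_bt = ((0.95, 0.05), (0.80, 0.20))[mc][bt]
    p_c  = (0.95, 0.05)[c] if (sc, bt) == (0, 0) else (0.20, 0.80)[c]
    p_h  = ((0.40, 0.60), (0.20, 0.80))[bt][h]
    return p_mc * p_sc * p_bt * p_c * p_h

def conditional(query, evidence):
    """P(var = val | evidence) for query = (var, val) and evidence a dict."""
    num = den = 0.0
    for vals in itertools.product((0, 1), repeat=len(NAMES)):
        state = dict(zip(NAMES, vals))
        if any(state[v] != x for v, x in evidence.items()):
            continue                      # inconsistent with the evidence
        p = joint(*vals)
        den += p                          # den accumulates P(evidence)
        if state[query[0]] == query[1]:
            num += p
    return num / den
```

Here conditional(("mc", 1), {"sc": 1, "h": 1}) gives approximately 0.512, and the normalizing constant accumulated in den is P(sc=1, h=1) = 0.2, matching the output above.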

Evidence can also be input to the program from a file, as the following example illustrates.

 

/knowledge

addevidence

sc 1

h 1

Fig. 7.4

 

The Addevidence command is followed by a list of lines describing the evidence. The last line must be followed by a blank line to signal that the addevidence command has ended. In this example, the last line "h 1" is followed by a blank line.

 

7.3 Bayesian Networks and Expert Systems

A Bayesian network can be used as an expert system in two different contexts:

 

1. Based on values an individual or unit happens to have, you want to predict values of other variables for that individual.

 

2. You want to predict the values an individual or unit will have for some variables if you force the individual or unit to have certain values for other variables.

 

The first situation applies, for example, if one wants to select from a population individuals who are likely to complete a training course, and a Bayesian network including relevant variables and job success is available for a sample of the population. The second situation applies if, for example, a Bayesian network has been constructed from data in which people have been assigned to a training regimen according to a pre-test score, and one wants to predict how an arbitrary new individual will do if given a particular training regimen regardless of pretest score.

The first use is quite easy if a correct Bayesian network with all parameters is available; simply enter the facts about the individual and run the update command. The second case is more complicated and less automated, and requires constructing a new Bayesian network from the original one.[1]  Here is what to do if you want to predict the result of forcing a new unit to have, for example, a = 1 by some means that does not otherwise perturb the causal dependencies described by the network:

 

1. In the graph of the original Bayesian network, remove all edges directed into a.

 

2. Use MakeModel to create a new Bayesian network from the new graph. Parameterize the modified network as before, except make a exogenous, no longer conditioned on its former parents. Set P(a = 1) to 1, and assign probability 0 to the other values of a.

 

3. Run Update on the empty set of evidence to get the new probabilities of other variables.

 

The new Bayesian network can now be used in Update with evidence (about variables other than a) to predict features of an individual who will be forced to have the value a = 1.

This method of predicting the effects of manipulation, treatment, or experimental intervention is reliable only if the treatment does not otherwise alter the causal structure, and only if the treatment ensures that a treated individual will have the value a = 1.[2]
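The three steps above can be sketched concretely. Suppose we want the probability of metastatic cancer under a policy that forces sc = 1 (Python illustration with hypothetical helper names, not TETRAD commands; the parameters are those of cooper.bn):

```python
import itertools

PARENTS = {"mc": (), "sc": ("mc",), "bt": ("mc",), "c": ("sc", "bt"), "h": ("bt",)}
CPT = {
    "mc": {(): (0.80, 0.20)},
    "sc": {(0,): (0.80, 0.20), (1,): (0.20, 0.80)},
    "bt": {(0,): (0.95, 0.05), (1,): (0.80, 0.20)},
    "c":  {(0, 0): (0.95, 0.05), (0, 1): (0.20, 0.80),
           (1, 0): (0.20, 0.80), (1, 1): (0.20, 0.80)},
    "h":  {(0,): (0.40, 0.60), (1,): (0.20, 0.80)},
}

def manipulate(parents, cpt, var, value):
    """Steps 1 and 2: make `var` exogenous and fix it at `value`."""
    new_parents = dict(parents, **{var: ()})    # remove all edges into var
    new_cpt = dict(cpt)                         # reparameterize var only
    new_cpt[var] = {(): tuple(1.0 if v == value else 0.0 for v in (0, 1))}
    return new_parents, new_cpt

def marginal(parents, cpt, var, value):
    """Step 3: P(var = value) in the given network, with no evidence."""
    names = list(parents)
    total = 0.0
    for vals in itertools.product((0, 1), repeat=len(names)):
        state = dict(zip(names, vals))
        if state[var] != value:
            continue
        p = 1.0
        for v in names:
            p *= cpt[v][tuple(state[q] for q in parents[v])][state[v]]
        total += p
    return total

p2, c2 = manipulate(PARENTS, CPT, "sc", 1)
forced = marginal(p2, c2, "mc", 1)   # ~0.2: forcing sc = 1 leaves P(mc = 1) unchanged
```

By contrast, merely observing sc = 1 in the unmanipulated network raises the probability of metastatic cancer: P(mc = 1 | sc = 1) = 0.16/0.32 = 0.5. This is exactly the distinction between the two kinds of prediction drawn in section 7.1.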

 

7.4 Update Methods

We can broadly categorize update algorithms as either exact or probabilistic. Exact update algorithms quickly become intractable for general networks, especially as the connectivity of the network grows. The general problem of updating has been shown to be NP-hard (Cooper, 1987).

 

7.4.1 Method for Calculating Conditional Probabilities

An exact method is used if the problem is suitably small.[3] The exact method implemented uses brute force to calculate all of the joint probabilities in the state space and then performs the relevant summations to calculate the conditional probabilities.

If an exact method is not suitable, then the probabilistic method of likelihood weighting is used (Shachter & Peot, 1990). Likelihood weighting is a robust version of the logic sampling method (Henrion, 1988). We generate samples pseudo-randomly, but with the further constraint that any variable in the evidence set is always assigned its observed value. Each sample is weighted by the likelihood of the evidence given the sampled values, and these weights are tallied to calculate the estimated conditional probabilities.
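Under the assumptions just stated, likelihood weighting can be sketched as follows: sample non-evidence variables in topological order from their CPTs, clamp evidence variables to their observed values, and weight each sample by the likelihood of the evidence. This Python illustration uses the cooper.bn parameters and is not TETRAD's implementation:

```python
import random

ORDER = ("mc", "sc", "bt", "c", "h")     # a topological order of the graph
PARENTS = {"mc": (), "sc": ("mc",), "bt": ("mc",), "c": ("sc", "bt"), "h": ("bt",)}
CPT = {
    "mc": {(): (0.80, 0.20)},
    "sc": {(0,): (0.80, 0.20), (1,): (0.20, 0.80)},
    "bt": {(0,): (0.95, 0.05), (1,): (0.80, 0.20)},
    "c":  {(0, 0): (0.95, 0.05), (0, 1): (0.20, 0.80),
           (1, 0): (0.20, 0.80), (1, 1): (0.20, 0.80)},
    "h":  {(0,): (0.40, 0.60), (1,): (0.20, 0.80)},
}

def likelihood_weighting(evidence, n=100_000, seed=0):
    """Estimate P(v = k | evidence) for every variable v and value k."""
    rng = random.Random(seed)
    tallies = {v: [0.0, 0.0] for v in ORDER}
    for _ in range(n):
        state, weight = {}, 1.0
        for v in ORDER:
            dist = CPT[v][tuple(state[q] for q in PARENTS[v])]
            if v in evidence:                # clamp evidence variables ...
                state[v] = evidence[v]
                weight *= dist[state[v]]     # ... and weight by their likelihood
            else:                            # sample the rest from their CPTs
                state[v] = 0 if rng.random() < dist[0] else 1
        for v in ORDER:
            tallies[v][state[v]] += weight
    total = sum(tallies[ORDER[0]])           # total weight over all samples
    return {v: (t[0] / total, t[1] / total) for v, t in tallies.items()}
```

With the evidence sc = 1, h = 1 the estimates converge on the exact values in Fig. 7.3, e.g. P(mc=1 | sc=1, h=1) ≈ 0.512, up to sampling error.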

 

7.4.2 Method for Calculating Most Probable Explanations

The most probable explanation (MPE) is defined to be that assignment of values to all of the variables such that the instantiation is consistent with the evidence set and such that the joint probability is maximal.  If the exact method is used then the joint probability of each of the possible instantiations is calculated and the top 10 instantiations (based on the joint probability) are kept and reported to the user. If the probabilistic method is used then the joint probability is calculated for each of the instantiations that are generated and the top 10 are stored and reported to the user.
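On a network this small the exact MPE computation is easy to illustrate: score every full instantiation consistent with the evidence by its joint probability and keep the top 10, as in the "best explanations" table of Fig. 7.3. A sketch using the cooper.bn parameters (function names are ours):

```python
import itertools

NAMES = ("mc", "sc", "bt", "c", "h")

def joint(mc, sc, bt, c, h):
    """Joint probability of one full instantiation, per the cooper.bn CPTs."""
    p_mc = (0.80, 0.20)[mc]
    p_sc = ((0.80, 0.20), (0.20, 0.80))[mc][sc]
    p_bt = ((0.95, 0.05), (0.80, 0.20))[mc][bt]
    p_c  = (0.95, 0.05)[c] if (sc, bt) == (0, 0) else (0.20, 0.80)[c]
    p_h  = ((0.40, 0.60), (0.20, 0.80))[bt][h]
    return p_mc * p_sc * p_bt * p_c * p_h

def best_explanations(evidence, k=10):
    """Instantiations consistent with the evidence, highest joint first."""
    scored = []
    for vals in itertools.product((0, 1), repeat=len(NAMES)):
        state = dict(zip(NAMES, vals))
        if all(state[v] == x for v, x in evidence.items()):
            scored.append((joint(*vals), state))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:k]

top = best_explanations({"sc": 1, "h": 1})
```

With sc = 1 and h = 1 there are only 2^3 = 8 consistent instantiations, so all of them are reported; the best is mc=0, sc=1, bt=0, c=1, h=1 with probability ≈ 0.0730, as in Fig. 7.3.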


 



[1]For an example, see chap. 2, section 7.

[2]For a detailed discussion, see Spirtes, Glymour, and Scheines (1993). See also Robins (1986, 1989).

[3]If the total state space for the Bayesian network does not exceed a fixed limit, currently set at 200,000 states for the UNIX version of the program and 10,920 states for the PC version of the program, then the exact method will be used.