12.1 What Makemodel Does
The Makemodel module takes a directed acyclic graph (DAG) and creates a fully parameterized recursive linear structural equation model or a discrete Bayesian network stored in a TETRAD II readable file for use by the Monte Carlo generator (Fig. 12.1).
Fig. 12.1
For instance, Makemodel can take the input file makemod.g (Fig. 12.2) and produce an output file containing a linear model (makemod.lm, shown in Fig. 12.3). From the same input file it can also produce an output file containing a Bayesian network (makemod.bn, shown in Fig. 12.4). Both files can be read back into TETRAD II for use by the Monte Carlo generator.
############## makemod.g ################
/graph
x1 x2
x2 y
x3 y
############## makemod.g ################
Fig. 12.2: makemod.g
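The edge list in Fig. 12.2 can be read mechanically. The following is a minimal Python sketch, not part of TETRAD II: the function names parse_graph_section and parents are hypothetical, and it assumes that each line after /graph names a cause followed by its effect, optionally followed by a coefficient, and that the rows of # characters are only the figure's frame.

```python
def parse_graph_section(text):
    """Return a list of (cause, effect, coefficient-or-None) edges."""
    edges = []
    in_graph = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the figure's frame (assumption)
        if line.startswith("/"):
            in_graph = line.lower() == "/graph"
            continue
        if in_graph:
            parts = line.split()
            if len(parts) < 2:
                continue  # not an edge line
            coef = float(parts[2]) if len(parts) > 2 else None
            edges.append((parts[0], parts[1], coef))
    return edges


def parents(edges):
    """Map each variable to the list of its immediate causes."""
    result = {}
    for cause, effect, _ in edges:
        result.setdefault(cause, [])
        result.setdefault(effect, []).append(cause)
    return result


MAKEMOD_G = """/graph
x1 x2
x2 y
x3 y
"""

edges = parse_graph_section(MAKEMOD_G)
print(parents(edges))
```

Under these assumptions, y is the only variable with two immediate causes (x2 and x3), while x1 and x3 have none.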
12.2 How to Use Makemodel
Makemodel branches according to choices you make along the way. Session 12.1 shows how to make a linear model.
Session 12.1: Making a linear model
******************************************
>input
Input File: makemod.g
>makemodel
Output file: makemod.lm
The TETRAD II Model Builder
Create linear structural equation model (l),
or Bayesian network (b)? (l/b) [l] <CR>
Do you want all error terms to have the same distribution? [y] <CR>
Error terms ~N(0,1)? [y] <CR>
Makemodel chooses any linear coefficients that you have not fixed in the input file randomly from a uniform distribution over the interval you specify, and randomly makes a proportion of them, which you also specify, negative. The upper and lower bounds for the absolute value of the parameters apply only to parameters chosen randomly by Makemodel, not to parameters entered by the user in a /Graph section.
Lower bound for absolute value of parameters:
[ 0.5000 ]: <CR>
Upper bound for absolute value of parameters:
[ 1.5000 ]: <CR>
Approximate percentage of pseudo-randomly selected
parameters made negative: [ 0.0000 ]: <CR>
The file containing this model is named: makemod.lm
You can read the file into TETRAD with the INPUT command.
>exit
***********************************
The output file makemod.lm contains two sections, a /Linearmodel section and a /Graph section. The /Linearmodel section specifies the distributions over the independent error terms, and the /Graph section specifies the causal structure and linear coefficients.
################# makemod.lm #################
/graph
x1 x2 1.0376
x2 y 0.9973
x3 y 1.2325
/linearmodel
Variable   Dist. Type   Parameters
x1         Normal       0.0000 1.0000
x2         Normal       0.0000 1.0000
y          Normal       0.0000 1.0000
x3         Normal       0.0000 1.0000
################# makemod.lm #################
Fig. 12.3
In our representation of linear structural equation models, every variable is a linear combination of its immediate causes and an independent error term,[1] so the only truly exogenous variables are the error terms. The marginal distributions over the error terms, together with the linear coefficients, therefore fully parameterize the joint distribution over the variables considered.
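To illustrate how error distributions and coefficients together determine the joint distribution, the following Python sketch draws cases from the model in Fig. 12.3. It is an illustration only and not the TETRAD II Monte Carlo generator; the names COEFFICIENTS, ORDER, and draw_case are hypothetical, and the causal order is written out by hand.

```python
import random

# Coefficients copied from the /graph section of makemod.lm (Fig. 12.3).
COEFFICIENTS = {("x1", "x2"): 1.0376, ("x2", "y"): 0.9973, ("x3", "y"): 1.2325}
# A causal order, written by hand: every variable comes after its causes.
ORDER = ["x1", "x3", "x2", "y"]


def draw_case(rng):
    """Draw one case: each variable = sum(coef * cause) + its own N(0, 1) error."""
    values = {}
    for var in ORDER:
        error = rng.gauss(0.0, 1.0)  # standard deviation 1, so variance 1
        total = sum(coef * values[cause]
                    for (cause, effect), coef in COEFFICIENTS.items()
                    if effect == var)
        values[var] = total + error
    return values


rng = random.Random(0)
sample = [draw_case(rng) for _ in range(20000)]
# All means are zero, so the mean square estimates the variance.
var_x2 = sum(case["x2"] ** 2 for case in sample) / len(sample)
print(round(var_x2, 2))  # should be near 1.0376**2 + 1, about 2.08
```

Because x2 = 1.0376 x1 + e2 with independent unit-variance terms, its variance is fixed by the coefficient and the two error variances alone, which is the sense in which the model is fully parameterized.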
Session 12.2 demonstrates how to take the same causal structure (makemod.g) and create a fully parameterized Bayesian network.
Session 12.2: Making a Bayesian network
******************************************
>input
Input File: makemod.g
>makemodel
Output file: makemod.bn
The TETRAD II Model Builder
Create linear structural equation model (l),
or Bayesian network (b)? (l/b) [l] b
Let TETRAD parameterize the Bayesian network randomly [y] ? <CR>
1) x1
2) x2
3) y
4) x3
All variables default as binary valued (0,1).
Enter the numbers of any you wish to change.
Numbers = 2
Variable is x2
Number of Categories: [2]: 3
Non-integers will be rounded.
Category 1 Value: [0]: 2
Category 1 Value: [1]: 0
Category 1 Value: [2]: 4
The file containing this model is named: makemod.bn
You can read the file into TETRAD with the INPUT command.
>exit
******************************************
The file makemod.bn contains the randomly parameterized Bayesian network.
################# makemod.bn #################
/BAYESNETWORK
Variable   Number of Categories   Values of Categories
x1         2                      0 1
x2         3                      2 0 4
x3         2                      0 1
y          2                      0 1
The Probability Distribution
----------------------------
x1 Parents:
p(x1=0)= 0.2729 p(x1=1)= 0.7271
----------------------------
x2 Parents: x1
when x1=0
p(x2=2)= 0.0741 p(x2=0)= 0.2930 p(x2=4)= 0.6329
when x1=1
p(x2=2)= 0.3413 p(x2=0)= 0.2503 p(x2=4)= 0.4084
----------------------------
x3 Parents:
p(x3=0)= 0.9217 p(x3=1)= 0.0783
----------------------------
y Parents: x2 x3
when x2=0 x3=0
p(y=0)= 0.0390 p(y=1)= 0.9609
when x2=0 x3=1
p(y=0)= 0.1692 p(y=1)= 0.8308
when x2=1 x3=0
p(y=0)= 0.9619 p(y=1)= 0.0381
when x2=1 x3=1
p(y=0)= 0.0111 p(y=1)= 0.9889
################# makemod.bn #################
Fig. 12.4
12.3 Options
The process of specifying a model can be described by the flow chart in Fig. 12.5. Answering a question one way determines the questions that follow.
Fig. 12.5
If the model is to be linear, it is given parameter values in two stages. First specify the distribution on the error variables in the system, and then specify the linear coefficients that will allow the Monte Carlo generator to propagate values from the exogenous variables through the causal system. If the causal structure is to be interpreted as a Bayesian network, then the distribution is given in factorized form. In either case the user can parameterize the model from a file, or have TETRAD II parameterize it randomly.
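For the graph in makemod.g, with the parent sets listed in Fig. 12.4, the factorized form would contain one conditional distribution per variable given its parents. Written out in LaTeX notation as a worked instance:

```latex
P(x_1, x_2, x_3, y) \;=\; P(x_1)\, P(x_2 \mid x_1)\, P(x_3)\, P(y \mid x_2, x_3)
```

Each factor corresponds to one block of the probability distribution listed in the .bn file.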
12.3.1 Parameterizing a Linear Model
In the linear case we need to specify the distribution on the error terms and the values of the linear coefficients.
Specifying the Distribution on the Error Terms
The distribution over the error terms is restricted in two ways. First, the error terms are independent. If you wish two error terms, say e1 and e2, to be correlated, then you must explicitly introduce a latent common cause of the two variables with which e1 and e2 are associated. Second, each error term must be either normally or uniformly distributed. Two parameters suffice for either distribution: in the normal case TETRAD II requires a mean and a variance, and in the uniform case it requires lower and upper bounds. Makemodel prompts for these error distributions interactively, but you may also edit them yourself in the model file, which is just a text file. Session 12.3 shows how makemod.g might be turned into a linear model with a variety of error distributions.
Session 12.3: Fixing the marginal error distributions
******************************************
Create linear structural equation model (l),
or Bayesian network (b)? (l/b) [l] <CR>
Do you want all error terms to have the same distribution? [y] n
Getting distribution for each variable.
Distribution on the error term for x1
Uniform = 1
Normal = 2
Distribution: [2]: 1
The lower bound of the uniform interval:
[0]: <CR>
The upper bound: [1]: <CR>
Distribution on the error term for x2
Uniform = 1
Normal = 2
Distribution: [2]: <CR>
Mean: [0]: <CR>
Variance: [1]: <CR>
Distribution on the error term for y
Uniform = 1
Normal = 2
Distribution: [2]: 1
The lower bound of the uniform interval:
[0]: -5
The upper bound: [1]: 5
Distribution on the error term for x3
Uniform = 1
Normal = 2
Distribution: [2]: <CR>
Mean: [0]: <CR>
Variance: [1]: <CR>
Lower bound for absolute value of parameters:
[ 0.5000 ]: <CR>
Upper bound for absolute value of parameters:
[ 1.5000 ]: <CR>
Approximate percentage of pseudo-randomly selected
parameters made negative: [ 0.0000 ]: <CR>
The file containing this model is named: makemod2.lm
You can read the file into TETRAD with the INPUT command.
>exit
*******************************************
The file makemod2.lm records these error distributions and is readable as an input file to TETRAD II.
#################### makemod2.lm #####################
/graph
x1 x2 1.1038
x2 y 0.6079
x3 y 0.5519
/linearmodel
Variable   Dist. Type   Parameters
x1         Uniform      0.0000 1.0000
x2         Normal       0.0000 4.0000
y          Uniform      -5.0000 5.0000
x3         Normal       0.0000 1.0000
#################### makemod2.lm #####################
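The two parameters on each row of the /linearmodel table mean different things for the two distribution types: lower and upper bounds for Uniform, mean and variance for Normal. The Python sketch below shows one way such a row could be interpreted. It is not TETRAD II code, and the function names draw_error and error_variance are hypothetical.

```python
import math
import random


def draw_error(dist, first, second, rng):
    """Draw one error value from a row of the /linearmodel table.

    Uniform rows hold (lower bound, upper bound); Normal rows hold
    (mean, variance), so the variance is converted to a standard deviation.
    """
    kind = dist.lower()
    if kind == "uniform":
        return rng.uniform(first, second)
    if kind == "normal":
        return rng.gauss(first, math.sqrt(second))
    raise ValueError(f"unsupported distribution: {dist}")


def error_variance(dist, first, second):
    """Variance implied by a row: (b - a)^2 / 12 for Uniform, as stated for Normal."""
    kind = dist.lower()
    if kind == "uniform":
        return (second - first) ** 2 / 12.0
    if kind == "normal":
        return second
    raise ValueError(f"unsupported distribution: {dist}")


rng = random.Random(1)
print(draw_error("Uniform", -5.0, 5.0, rng))  # a value between -5 and 5
print(error_variance("Uniform", -5.0, 5.0))   # 100 / 12
```

Note that a uniform error on the interval from -5 to 5 has a much larger variance (100/12) than a standard normal error, which matters when comparing the contributions of different error terms.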
Specifying the Linear Coefficients
The user can set the linear coefficients or let TETRAD choose them randomly, with uniform probability, from an interval. If you choose to set the linear coefficients, you must do so in the /Graph section of an input file. Write the coefficient at the end of the line that specifies the edge, for example:
/graph
x1 x2 0.256
x2 x3 0.560
If an edge occurs in the input graph file with no coefficient, then Makemodel will choose a coefficient randomly from a uniform distribution over an interval specified by the user. Because strictly positive linear coefficients may be unrealistic, you may decide the proportion of coefficients that are made negative. Makemodel first generates a coefficient, and then pseudo-randomly decides whether to make it negative, depending on the proportion of negative coefficients you requested.
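The two-step procedure just described (generate a magnitude, then decide its sign) can be sketched in Python as follows. This is an illustration under stated assumptions, not Makemodel's actual code: the function names are hypothetical, the defaults 0.5, 1.5, and 0 are taken from the prompts in Session 12.1, and the sign of each coefficient is assumed to be decided independently.

```python
import random


def draw_coefficient(lower, upper, percent_negative, rng):
    """Draw |coef| uniformly from [lower, upper], then maybe flip its sign."""
    magnitude = rng.uniform(lower, upper)
    if rng.random() * 100.0 < percent_negative:
        return -magnitude
    return magnitude


def fill_coefficients(edges, lower=0.5, upper=1.5, percent_negative=0.0, seed=None):
    """Keep coefficients fixed by the user; draw only the missing ones."""
    rng = random.Random(seed)
    filled = []
    for cause, effect, coef in edges:
        if coef is None:
            coef = draw_coefficient(lower, upper, percent_negative, rng)
        filled.append((cause, effect, coef))
    return filled


# The first edge has a user-fixed coefficient; the other two are drawn.
edges = [("x1", "x2", 0.256), ("x2", "y", None), ("x3", "y", None)]
for edge in fill_coefficients(edges, percent_negative=50.0, seed=0):
    print(edge)
```

The fixed coefficient 0.256 lies outside the interval from 0.5 to 1.5 and is left untouched, reflecting the rule that the bounds apply only to randomly chosen parameters.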
12.3.2 Parameterizing a Bayesian Network
In Session 12.2 we had TETRAD randomly parameterize the Bayesian network. If we had chosen not to let TETRAD parameterize the network randomly, TETRAD would have created a template file such as the one in Fig. 12.6. The file that parameterizes a Bayesian network can be quite long and has relatively complicated syntactic constraints, which is why TETRAD writes a default file that you can then edit. When editing this file, simply check that the probabilities in each row sum to 1. Session 12.4 follows such a procedure.
Session 12.4
*****************************************
Create linear structural equation model (l),
or Bayesian network (b)? (l/b) [l] b
Let TETRAD parameterize the Bayesian Network randomly [y] ? n
1) x1
2) x2
3) y
4) x3
All variables default as binary valued (0,1).
Enter the numbers of any you wish to change.
Numbers = <CR>
TETRAD will write a file that contains
a TETRAD readable network with a default
distribution.
You should edit the parameters of the distribution
and save the file. Then you can read the file with
the INPUT command.
The file containing the template is
named: makemod2.bn
>exit
***************************************************
The default network that Makemodel creates for editing is uniform, as in Fig. 12.6:
############ makemod2.bn ##############
/BAYESNETWORK
Variable   Number of Categories   Values of Categories
x1         2                      0 1
x2         2                      0 1
x3         2                      0 1
y          2                      0 1
The Probability Distribution
----------------------------
x1 Parents:
p(x1=0)= 0.5000 p(x1=1)= 0.5000
----------------------------
x2 Parents: x1
when x1=0
p(x2=0)= 0.5000 p(x2=1)= 0.5000
when x1=1
p(x2=0)= 0.5000 p(x2=1)= 0.5000
----------------------------
x3 Parents:
p(x3=0)= 0.5000 p(x3=1)= 0.5000
----------------------------
y Parents: x2 x3
when x2=0 x3=0
p(y=0)= 0.5000 p(y=1)= 0.5000
when x2=0 x3=1
p(y=0)= 0.5000 p(y=1)= 0.5000
when x2=1 x3=0
p(y=0)= 0.5000 p(y=1)= 0.5000
when x2=1 x3=1
p(y=0)= 0.5000 p(y=1)= 0.5000
############ makemod2.bn ##############
Fig. 12.6: The Default Template File Before Editing
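After editing a template like the one in Fig. 12.6, the requirement that each row of probabilities sum to 1 can be checked mechanically. The Python sketch below is one way to do so and is not a TETRAD II feature: the name bad_rows is hypothetical, it assumes that each row of probabilities sits on a single line in the form p(variable=value)= number, and it allows a small tolerance because the files print probabilities to four decimal places.

```python
import re

# Matches entries such as "p(x2=0)= 0.5000" and captures the probability.
ENTRY = re.compile(r"p\([^)]*\)=\s*([0-9]*\.?[0-9]+)")


def bad_rows(text, tolerance=1e-3):
    """Return (line number, sum) for each probability row not summing to 1."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        probs = [float(p) for p in ENTRY.findall(line)]
        if probs and abs(sum(probs) - 1.0) > tolerance:
            problems.append((number, round(sum(probs), 4)))
    return problems


# An edited fragment in which the last row was mistyped (0.7 + 0.2).
TEMPLATE = """x2 Parents: x1
when x1=0
p(x2=0)= 0.5000 p(x2=1)= 0.5000
when x1=1
p(x2=0)= 0.7000 p(x2=1)= 0.2000
"""

print(bad_rows(TEMPLATE))  # -> [(5, 0.9)]
```

Lines without any p(...)= entries, such as the "when" and "Parents:" lines, are ignored by the check.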