Tetrads

8.1 The Tetrads Command

Chapters 9, 10, and 11 describe tools for specifying linear structural equation models with latent variables. Chapter 9 describes Purify, which helps to locate unidimensional measurement models, and Chapter 10 describes MIMbuild, which helps to specify the structural equations among latent variables that have a unidimensional measurement model. Purify can be used as a pre-processor to MIMbuild or on its own. Chapter 11 describes Search, which helps to respecify a given model by adding edges, which corresponds to freeing parameters previously fixed at 0 in a structural equation model. Search thus performs a function similar to the Modification Indices of LISREL and EQS, although it uses very different techniques.

There are three possible tetrad differences over random variables {i,j,k,l}:

$t_{ijkl} = r_{i,j}\, r_{k,l} - r_{i,k}\, r_{j,l}$

$t_{ijlk} = r_{i,j}\, r_{k,l} - r_{i,l}\, r_{j,k}$

$t_{iklj} = r_{i,k}\, r_{j,l} - r_{i,l}\, r_{j,k}$

The values of any two of the tetrad differences among {i,j,k,l} determine the value of the third.
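All three differences are built from the six correlations among i, j, k, and l, and subtracting the first difference from the second makes the dependence explicit:

```latex
\begin{aligned}
t_{ijlk} - t_{ijkl}
  &= \left(r_{i,j}\, r_{k,l} - r_{i,l}\, r_{j,k}\right) - \left(r_{i,j}\, r_{k,l} - r_{i,k}\, r_{j,l}\right) \\
  &= r_{i,k}\, r_{j,l} - r_{i,l}\, r_{j,k} \\
  &= t_{iklj}
\end{aligned}
```

So $t_{iklj} = t_{ijlk} - t_{ijkl}$; in particular, if two of the three differences vanish, the third vanishes as well.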

Tetrad differences are used by the Purify, MIMbuild, and Search modules, but these modules do not report exactly which tetrad differences a particular causal graph implies, or which ones vanish statistically. That information is available from the Tetrads command.

If T is the set of latent variables and V the set of measured variables, the Tetrads command requires a causal graph G over $T \cup V$, and covariance or raw data over V. To calculate the set of tetrad differences that vanish in every parameterization of G, Tetrads uses an algorithm based on the Tetrad Representation Theorem (Spirtes, 1989; Spirtes, Glymour, & Scheines, 1993). To calculate the set of tetrad differences that vanish statistically, Tetrads tests each tetrad difference $t_{ijkl}$ against the null hypothesis $H_0\colon t_{ijkl} = 0$, using the Wishart (1928) test, which assumes that V is distributed as multivariate normal (see Appendix A for details).

For example, using the Monte Carlo generator, we generated simulated data (n = 2000) from the model in Fig. 8.1 (the file tetrads.g), in which each variable has an independent N(0,1) error term.

Fig. 8.1

The data are stored as a correlation matrix in the file tetrads.dat (Fig. 8.2), which can then be used as one of the input files to the Tetrads command.


################### tetrads.dat #######################
{
The Generating Model
Linear Structural Equation Model
T1 = e6
T2 = 0.671T1 + e7
x1 = 0.766T1 + e1
x2 = 0.678T1 + e2
x3 = 1.438T1 + e3
x4 = 1.432T2 + e4
x5 = 1.114T2 + e5
}
/Covariance
2000
x1 x2 x3 x4 x5
1.0000
0.3649 1.0000
0.5148 0.4662 1.0000
0.2982 0.2710 0.3772 1.0000
0.2759 0.2489 0.3606 0.6949 1.0000
################### tetrads.dat #######################

Fig. 8.2
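The comment block in tetrads.dat records the generating model. The minimal sketch below assumes, as stated for the simulation, that every error term (including e6 and e7) has variance 1, and computes the population correlations that this model implies, so they can be compared with the sample values in Fig. 8.2. It also evaluates one tetrad difference that the graph forces to zero and one that it does not. LOADINGS, implied_corr, and population_tetrad are hypothetical names introduced for illustration.

```python
import math

# Generating model from tetrads.dat; every error term is assumed to have variance 1.
B_T2_T1 = 0.671                                    # T2 = 0.671*T1 + e7
LOADINGS = {"x1": ("T1", 0.766), "x2": ("T1", 0.678), "x3": ("T1", 1.438),
            "x4": ("T2", 1.432), "x5": ("T2", 1.114)}

# Latent covariances: Var(T1) = Var(e6) = 1, Var(T2) = b^2 * Var(T1) + Var(e7).
VAR_T1 = 1.0
LATENT_COV = {("T1", "T1"): VAR_T1,
              ("T2", "T2"): B_T2_T1 ** 2 * VAR_T1 + 1.0,
              ("T1", "T2"): B_T2_T1 * VAR_T1,
              ("T2", "T1"): B_T2_T1 * VAR_T1}

def implied_cov(a, b):
    """Population covariance of measured variables a and b under the model."""
    (la, wa), (lb, wb) = LOADINGS[a], LOADINGS[b]
    c = wa * wb * LATENT_COV[(la, lb)]
    return c + 1.0 if a == b else c                # unit error variance on the diagonal

def implied_corr(a, b):
    """Population correlation of measured variables a and b."""
    return implied_cov(a, b) / math.sqrt(implied_cov(a, a) * implied_cov(b, b))

def population_tetrad(i, j, k, l):
    """rho_ij * rho_kl - rho_ik * rho_jl from the implied correlations."""
    return implied_corr(i, j) * implied_corr(k, l) - implied_corr(i, k) * implied_corr(j, l)

# Lower triangle of the implied correlation matrix, laid out like tetrads.dat.
names = sorted(LOADINGS)
for n_a, a in enumerate(names):
    print(" ".join(f"{implied_corr(a, b):.4f}" for b in names[: n_a + 1]))

print(population_tetrad("x1", "x4", "x5", "x2"))   # r14 r25 - r15 r24: zero up to rounding
print(population_tetrad("x1", "x2", "x4", "x5"))   # r12 r45 - r14 r25: clearly nonzero
```

With n = 2000 the sample correlations in Fig. 8.2 should differ from these population values only by sampling error.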

Session 8.1 demonstrates how to use the Tetrads command.

Session 8.1: The Tetrads command

********************************************************************
>input
Input file: tetrads.dat
>input
Input file: [tetrads.dat] tetrads.g
>tetrads
Output file: tetrads.out
>exit
*******************************************************************

The output (tetrads.out, shown in Fig. 8.3) prints one line for each tetrad difference $t_{ijkl} = r_{i,j}\, r_{k,l} - r_{i,k}\, r_{j,l}$. Each line begins with the variable names that identify the four correlations in the difference; for $t_{ijkl}$ the line begins "i j, k l - i k, j l". Thus the first line of tetrads.out:

x1 x2, x3 x4 - x1 x3, x2 x4

identifies the tetrad difference:

$r_{x_1,x_2}\, r_{x_3,x_4} - r_{x_1,x_3}\, r_{x_2,x_4}$

The next column, Resid, is the tetrad difference observed in the sample. For example, on the first line Resid = -0.0019, which is calculated from

$r_{x_1,x_2}\, r_{x_3,x_4} - r_{x_1,x_3}\, r_{x_2,x_4}$,

where $r_{x_1,x_2}$ is the sample correlation between x1 and x2.
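The Resid column can be reproduced directly from the correlation matrix in tetrads.dat. The sketch below walks over every set of four measured variables and prints the three tetrad differences for each, in the order and label format that tetrads.out appears to use; the helper names r and tetrad_lines are hypothetical.

```python
from itertools import combinations

# Sample correlations from tetrads.dat (lower triangle, n = 2000).
NAMES = ["x1", "x2", "x3", "x4", "x5"]
LOWER = [
    [1.0000],
    [0.3649, 1.0000],
    [0.5148, 0.4662, 1.0000],
    [0.2982, 0.2710, 0.3772, 1.0000],
    [0.2759, 0.2489, 0.3606, 0.6949, 1.0000],
]

def r(a, b):
    """Sample correlation between the variables with indices a and b."""
    return LOWER[max(a, b)][min(a, b)]

def tetrad_lines(i, j, k, l):
    """Yield (label, residual) for the three tetrad differences over {i, j, k, l}.

    Each pattern (p1, p2, p3, p4) stands for r(p1) * r(p2) - r(p3) * r(p4).
    """
    patterns = [((i, j), (k, l), (i, k), (j, l)),
                ((i, k), (j, l), (i, l), (j, k)),
                ((i, l), (j, k), (i, j), (k, l))]
    for p1, p2, p3, p4 in patterns:
        label = "{} {}, {} {} - {} {}, {} {}".format(
            *(NAMES[v] for pair in (p1, p2, p3, p4) for v in pair))
        resid = r(*p1) * r(*p2) - r(*p3) * r(*p4)
        yield label, resid

rows = [line for quad in combinations(range(len(NAMES)), 4) for line in tetrad_lines(*quad)]
for label, resid in rows:
    print(f"{label}  {resid:8.4f}")
```

The three residuals for each quadruple sum to zero, which is the dependence among the three tetrad differences noted earlier in this section.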

The next column, I, stands for Implied. I has the value "0" if the tetrad difference equals zero in every parameterization of the causal graph entered. If the tetrad difference $t_{ijkl} = r_{i,j}\, r_{k,l} - r_{i,k}\, r_{j,l}$ vanishes in every parameterization of G because ($r_{i,j} = 0$ or $r_{k,l} = 0$) and ($r_{i,k} = 0$ or $r_{j,l} = 0$) in every parameterization of G, then we say that $t_{ijkl}$ is implied trivially, and in that case Tetrads prints a "T" in the I column.
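Tetrads fills in the I column with an algorithm based on the Tetrad Representation Theorem. As a rough, swapped-in numerical stand-in (not that algorithm), one can draw a few random linear parameterizations of G and keep the tetrad differences that vanish in all of them; a difference that vanishes in several random parameterizations is very likely implied, although this is a heuristic, not a proof. Working with covariances is enough, because rescaling to correlations multiplies both products in a tetrad difference by the same factor. The sketch assumes the graph of Fig. 8.1 as read off the generating model in tetrads.dat; PARENTS, random_covariance, and implied_tetrads are hypothetical names, and trivially implied differences are not flagged separately.

```python
import random
from itertools import combinations

# Graph of Fig. 8.1 as parent lists, listed in a topological order.
PARENTS = {"T1": [], "T2": ["T1"],
           "x1": ["T1"], "x2": ["T1"], "x3": ["T1"],
           "x4": ["T2"], "x5": ["T2"]}
MEASURED = ["x1", "x2", "x3", "x4", "x5"]

def random_covariance(rng):
    """Covariances of one random linear parameterization with independent errors."""
    order = list(PARENTS)              # insertion order is topological here
    coef = {(v, p): rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 2.0)
            for v in order for p in PARENTS[v]}
    cov = {}
    for idx, v in enumerate(order):
        for w in order[:idx]:
            # w precedes v, so v's error is independent of w: cov(v, w) = sum_p b_vp * cov(p, w)
            c = sum(coef[(v, p)] * cov[(p, w)] for p in PARENTS[v])
            cov[(v, w)] = cov[(w, v)] = c
        explained = sum(coef[(v, p)] * coef[(v, q)] * cov[(p, q)]
                        for p in PARENTS[v] for q in PARENTS[v])
        cov[(v, v)] = explained + rng.uniform(0.5, 2.0)   # plus a random error variance
    return cov

def tetrad_forms(i, j, k, l):
    """The three tetrad differences over {i, j, k, l} as (left1, left2, right1, right2) pairs."""
    return [((i, j), (k, l), (i, k), (j, l)),
            ((i, k), (j, l), (i, l), (j, k)),
            ((i, l), (j, k), (i, j), (k, l))]

def label(form):
    return "{} {}, {} {} - {} {}, {} {}".format(*(v for pair in form for v in pair))

def implied_tetrads(n_draws=5, tol=1e-9, seed=1):
    """Tetrad differences that vanish (within tol) in every one of n_draws random draws."""
    rng = random.Random(seed)
    draws = [random_covariance(rng) for _ in range(n_draws)]
    implied = []
    for quad in combinations(MEASURED, 4):
        for a, b, c, d in tetrad_forms(*quad):
            if all(abs(cov[a] * cov[b] - cov[c] * cov[d]) < tol for cov in draws):
                implied.append(label((a, b, c, d)))
    return implied

for line in implied_tetrads():
    print(line)
```

For this graph the sketch should list the same nine lines that carry a "0" in the I column of tetrads.out.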

The last column, Prob, is the probability, in a sample of the given size and on the assumption that the null hypothesis is true, that the absolute value of Resid is at least as large as its observed value. For example, on line 1 of tetrads.out, the probability that the absolute value of

$r_{x_1,x_2}\, r_{x_3,x_4} - r_{x_1,x_3}\, r_{x_2,x_4}$

in a sample of size 2000 is greater than or equal to 0.0019 is 0.8659, on the assumption that in the population

$r_{x_1,x_2}\, r_{x_3,x_4} - r_{x_1,x_3}\, r_{x_2,x_4} = 0$.
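One way to compute such a probability is sketched below, assuming Wishart's (1928) large-sample variance for a tetrad difference and a two-sided normal approximation; Appendix A remains the authoritative description of what Tetrads does. For $t = r_{i,j} r_{k,l} - r_{i,k} r_{j,l}$ the assumed variance is $D_{il} D_{jk} (n+1)/((n-1)(n-2)) - D/(n-2)$, where D is the determinant of the 4 × 4 correlation submatrix of {i, j, k, l}, and $D_{il}$, $D_{jk}$ are the 2 × 2 determinants for the two pairs that do not appear in the difference. The function names are hypothetical.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result            # a row swap flips the sign
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return result

def wishart_tetrad_test(corr, n, i, j, k, l):
    """Return (t, p) for H0: r_ij r_kl - r_ik r_jl = 0, p two-sided (normal approximation)."""
    t = corr[i][j] * corr[k][l] - corr[i][k] * corr[j][l]
    idx = [i, j, k, l]
    d_full = det([[corr[a][b] for b in idx] for a in idx])
    d_il = corr[i][i] * corr[l][l] - corr[i][l] ** 2   # pair (i, l) is absent from t
    d_jk = corr[j][j] * corr[k][k] - corr[j][k] ** 2   # pair (j, k) is absent from t
    var = d_il * d_jk * (n + 1) / ((n - 1) * (n - 2)) - d_full / (n - 2)
    if var <= 0.0:
        raise ValueError("non-positive variance estimate")
    z = t / math.sqrt(var)
    return t, math.erfc(abs(z) / math.sqrt(2.0))        # erfc(|z|/sqrt 2) = 2(1 - Phi(|z|))

# Correlation matrix of tetrads.dat, expanded from its lower triangle.
LOWER = [[1.0000], [0.3649, 1.0000], [0.5148, 0.4662, 1.0000],
         [0.2982, 0.2710, 0.3772, 1.0000], [0.2759, 0.2489, 0.3606, 0.6949, 1.0000]]
CORR = [[LOWER[max(a, b)][min(a, b)] for b in range(5)] for a in range(5)]

resid, prob = wishart_tetrad_test(CORR, 2000, 0, 1, 2, 3)   # x1 x2, x3 x4 - x1 x3, x2 x4
print(f"Resid = {resid:.4f}  Prob = {prob:.4f}")
```

Under these assumptions the printed Prob should come out close to the 0.8659 on the first line of tetrads.out; small differences can come from the four-decimal rounding of the correlations.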

The second-to-last column, H, stands for Holds. H has the value "0" if Prob > α, where α is the user-set significance level; that is, a tetrad equation holds if we cannot reject the null hypothesis that the difference vanishes in the population. If $r_{i,j}$ or $r_{k,l}$ is insignificant, and $r_{i,k}$ or $r_{j,l}$ is insignificant, then we say that $t_{ijkl}$ holds trivially, and in that case Tetrads prints a "T" in the H column.
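The rule for the H column can be sketched as follows. The manual does not say which test Tetrads uses to decide that a single correlation is insignificant, so this sketch assumes a Fisher z test; it also assumes that the column is left blank when the equation does not hold, as it appears in tetrads.out. The function names are hypothetical.

```python
import math

def correlation_insignificant(r, n, alpha):
    """True if H0: rho = 0 is not rejected at level alpha.

    Assumption: a Fisher z test; the manual does not name the test Tetrads uses.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0)) > alpha

def holds_column(prob, r_ij, r_kl, r_ik, r_jl, n, alpha=0.05):
    """H entry for t_ijkl = r_ij r_kl - r_ik r_jl: "T" (trivially), "0" (holds), or ""."""
    def insig(r):
        return correlation_insignificant(r, n, alpha)

    if (insig(r_ij) or insig(r_kl)) and (insig(r_ik) or insig(r_jl)):
        return "T"
    if prob > alpha:
        return "0"   # H0 (the tetrad difference vanishes) cannot be rejected
    return ""        # assumption: left blank when the equation does not hold

# First and seventh lines of tetrads.out (n = 2000, alpha = 0.05).
print(repr(holds_column(0.8659, 0.3649, 0.3772, 0.5148, 0.2710, n=2000)))
print(repr(holds_column(0.0000, 0.3649, 0.6949, 0.2982, 0.2489, n=2000)))
```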

################### tetrads.out #######################
Tetrad Equation                 Resid   I  H  Prob
x1 x2, x3 x4 - x1 x3, x2 x4   -0.0019   0  0  0.8659
x1 x3, x2 x4 - x1 x4, x2 x3    0.0005   0  0  0.9667
x1 x4, x2 x3 - x1 x2, x3 x4    0.0014   0  0  0.8937
x1 x2, x3 x5 - x1 x3, x2 x5    0.0034   0  0  0.7560
x1 x3, x2 x5 - x1 x5, x2 x3   -0.0005   0  0  0.9667
x1 x5, x2 x3 - x1 x2, x3 x5   -0.0029   0  0  0.7742
x1 x2, x4 x5 - x1 x4, x2 x5    0.1793         0.0000
x1 x4, x2 x5 - x1 x5, x2 x4   -0.0005   0  0  0.9197
x1 x5, x2 x4 - x1 x2, x4 x5   -0.1788         0.0000
x1 x3, x4 x5 - x1 x4, x3 x5    0.2502         0.0000
x1 x4, x3 x5 - x1 x5, x3 x4    0.0035   0  0  0.5520
x1 x5, x3 x4 - x1 x3, x4 x5   -0.2537         0.0000
x2 x3, x4 x5 - x2 x4, x3 x5    0.2262         0.0000
x2 x4, x3 x5 - x2 x5, x3 x4    0.0038   0  0  0.5190
x2 x5, x3 x4 - x2 x3, x4 x5   -0.2301         0.0000
################### tetrads.out #######################

Fig. 8.3

8.2 The Tetrad-score

Search, Purify, and MIMbuild all evaluate models using the Tetrad-score. The Tetrad-score is a measure of how closely the vanishing tetrad differences linearly implied by a DAG G match the tetrad differences judged to vanish in the population.

If G is the graph of a linear structural equation model with uncorrelated error terms, then G entails a certain set of vanishing tetrad differences, regardless of the numerical values of the linear coefficients or the distributions of the exogenous variables. The Tetrad-score of a graph relative to a sample is based on how well the set of vanishing tetrad differences entailed by the graph matches the set of tetrad differences judged to vanish in the population. If a graph entails that a given tetrad difference T vanishes, and T is judged to vanish in the population, then credit is given to the graph; if a graph entails that T vanishes, but T is judged not to vanish in the population, then that fact is scored against the graph.

Each tetrad difference is judged to be equal to zero in the population if that hypothesis passes a statistical test. Let the associated probability P(T) of a given tetrad difference T be the probability that a sample value of the tetrad difference is at least as large in absolute value as the observed value, given that the tetrad difference in the population is zero. Thus if the significance level is α, a tetrad difference T is judged to be equal to zero in the population if P(T) ≥ α.

For a given graph G, let Implied(G) be the set of tetrad differences that are entailed to be zero by G. For a given data set, let Held be the set of tetrad differences that are judged to be equal to zero in the population, and ~Held be the set of tetrad differences that are judged not to be equal to zero in the population. Once the associated probabilities of all of the tetrad differences have been calculated, the raw Tetrad-score of a graph can be calculated in the following way:

The raw Tetrad-score of a given model is affected by two parameters: the significance level, which affects the membership of Held and ~Held, and the weight, which determines the relative importance of the graph correctly implying that a tetrad difference is equal to zero versus the graph incorrectly implying that a tetrad difference is equal to zero. The default significance level is 0.05, and the default weight is 0.1. The actual Tetrad-score is formed by rescaling the raw Tetrad-score so that a model that entailed all and only the vanishing tetrad differences in the population would get a score of 100, and a model that entailed all and only the tetrad differences that are not 0 in the population would receive a score of 0.
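The text above does not spell out the raw-score formula, so the sketch below assumes one form chosen only to match the description of the two parameters: each implied tetrad difference in Held adds P(T), and each implied tetrad difference in ~Held subtracts weight × (1 − P(T)). The rescaling to the range 0 to 100 is also assumed to be linear. Neither assumption is guaranteed to match what Search, Purify, or MIMbuild compute, and the function names are hypothetical. The example inputs are the Prob and I columns of tetrads.out, keyed by line number for brevity.

```python
def judged_to_hold(probs, alpha):
    """Held: tetrad differences whose associated probability is at least alpha."""
    return {t for t, p in probs.items() if p >= alpha}

def raw_tetrad_score(implied, probs, alpha=0.05, weight=0.1):
    """ASSUMED raw score: +P(T) for implied T in Held, -weight * (1 - P(T)) for implied T in ~Held."""
    held = judged_to_hold(probs, alpha)
    return sum(probs[t] if t in held else -weight * (1.0 - probs[t]) for t in implied)

def tetrad_score(implied, probs, alpha=0.05, weight=0.1):
    """ASSUMED linear rescaling: implying exactly Held scores 100, exactly ~Held scores 0."""
    held = judged_to_hold(probs, alpha)
    not_held = set(probs) - held
    best = raw_tetrad_score(held, probs, alpha, weight)
    worst = raw_tetrad_score(not_held, probs, alpha, weight)
    if best == worst:                      # degenerate case: nothing to discriminate
        return 100.0
    raw = raw_tetrad_score(implied, probs, alpha, weight)
    return 100.0 * (raw - worst) / (best - worst)

# Prob column of tetrads.out, keyed by line number 1-15.
PROBS = {1: 0.8659, 2: 0.9667, 3: 0.8937, 4: 0.7560, 5: 0.9667, 6: 0.7742,
         7: 0.0000, 8: 0.9197, 9: 0.0000, 10: 0.0000, 11: 0.5520, 12: 0.0000,
         13: 0.0000, 14: 0.5190, 15: 0.0000}
# Lines with "0" in the I column, i.e. implied by the graph of Fig. 8.1.
IMPLIED_FIG_8_1 = {1, 2, 3, 4, 5, 6, 8, 11, 14}

print(tetrad_score(IMPLIED_FIG_8_1, PROBS))
print(tetrad_score(set(), PROBS))
```

In this example the graph of Fig. 8.1 implies exactly the tetrad differences whose Prob is at least 0.05, so under these assumptions it reaches the maximum of 100, while a graph implying none of them lands strictly between 0 and 100.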