Search

 

 

 

11.1 When to Use the Search Command

 

If you have a linear structural equation model that you think is plausible but that fails a statistical test once its coefficients are estimated, or if you suspect the model is incomplete, the Search command can suggest sensible modifications to the initial causal model. The Search command can also be used to explore alternatives to a particular causal relation or set of causal relations by starting with a model in which they are omitted. The Search command suggests adding edges to the initial causal model (in other words, freeing parameters that were fixed in the initial, unestimated structural equation model). When the modified causal models suggested by TETRAD II are used as input to a testing and estimation package such as LISREL, CALIS, or EQS, the suggested models will often fit the data better than the initial model.

The Search command should only be used in the following circumstances:

 

1. The total number of latent and measured variables occurring in the data and initial causal graph is no greater than 23.

 

2. You believe the correct model is approximately linear and normal.

 

3. Each latent variable has at least two indicators.

 

4. You have enough background information to suggest an initial causal model (or set of initial causal models) with latent variables but not enough to be confident that the model is complete.

 

5. You believe that the causal relations among the measured variables, or between latent and measured variables, are of interest in their own right (otherwise you should use the MIMbuild command described in chapter 10).

 

If you suspect that latent variables are not present in the correct model, first run the data through the Build command. If the Build command suggests a model without latent variables that fails a statistical test under the assumption of linearity, or if it suggests that latent variables are present, then run the data and an initial model through the Search command.

 

11.2 The Input and Output of the Search Command

The input to the Search command consists of three kinds of information, the first two of which are required:

 

1. A /Graph section.

 

2. A /Covariance section or a /Continousraw section.

 

3. Background knowledge that guides the search (see section 11.6).

 

The output is a list of suggested models. Each model contains the initial graph and may also contain additional edges that are compatible with the background knowledge. The estimated modified models often fit better than the estimated initial model. However, because of nonlinearity, nonnormality, or errors in the specification of the initial model, none of the models may pass a statistical test. We recommend using the models suggested by the Search command as input to a statistical estimation and testing package such as LISREL, CALIS, or EQS. TETRAD II can automatically translate a causal model into input for EQS, LISREL, or CALIS; this facility is described in chapter 14.[1]

 

11.3 A Simple Example 

 

The following session illustrates the simplest way to use the Search command. Suppose you have a battery of five questions (q1-q5) intended to measure a latent psychological trait such as the Authoritarian Personality. Each question is a statement such as "We should trust our leaders to always do the right thing," and each answer is a number from 1 to 5, where 1 represents strongly disagree and 5 represents strongly agree. (These are similar to the questions asked in Kohn, 1969.) You are fairly sure that the answers are effects of the latent trait Authoritarian Personality (A). You would like to know, however, whether there are other common causes of some of the answers, or whether answering one question somehow affects the answers given to subsequent questions (perhaps by setting a mood). The input file search1.in, shown in Fig. 11.1, contains covariances for q1 through q5 that were generated pseudo-randomly from a causal graph that contains the graph in search1.in as a subgraph.


#########   search1.in   ##########

/Covariance

 2000

q1 q2 q3 q4 q5

 1.53576

 2.69240  8.76826

 0.40447  1.24150  1.31592

 0.68891  1.92956  0.55777  1.80572

 0.85872  2.39568  0.63114  1.01730  2.16694

 

/graph

A q1

A q2

A q3

A q4

A q5

 

#########   search1.in   ##########

Fig. 11.1

 

This input file contains two parts: a covariance matrix and a graph. The graph is an initial model that contains the causal connections already known. The Search command suggests additional edges that can be added to this initial model. It never suggests removing any edges, so if you are not sure that some of the causal connections in the initial model actually exist, you should run the Search command several times, varying the initial model to cover the range of plausible initial models.
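To make the input format concrete, here is a minimal Python sketch (not TETRAD II's own parser) that reads the lower-triangular /Covariance block of Fig. 11.1, that is, the sample size, the variable names, and then one row per variable, and rescales it to a correlation matrix, the step the program reports as "Converting covariance matrix to correlation matrix." The helper names parse_covariance and to_correlation are hypothetical.

```python
import math

# Hypothetical helpers for illustration; TETRAD II's real parser is not described here.
def parse_covariance(text):
    """Parse a /Covariance body: sample size, variable names, lower-triangular rows."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    sample_size = int(rows[0][0])
    names = rows[1]
    k = len(names)
    cov = [[0.0] * k for _ in range(k)]
    for i, row in enumerate(rows[2:2 + k]):
        for j, value in enumerate(row):           # row i holds entries (i, 0) .. (i, i)
            cov[i][j] = cov[j][i] = float(value)  # fill both triangles
    return sample_size, names, cov

def to_correlation(cov):
    """Divide each covariance by the product of the two standard deviations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

SEARCH1_COVARIANCE = """
 2000
q1 q2 q3 q4 q5
 1.53576
 2.69240  8.76826
 0.40447  1.24150  1.31592
 0.68891  1.92956  0.55777  1.80572
 0.85872  2.39568  0.63114  1.01730  2.16694
"""

n, names, cov = parse_covariance(SEARCH1_COVARIANCE)
corr = to_correlation(cov)
print(n, names)
print([round(r, 3) for r in corr[0]])  # correlations of q1 with q1..q5
```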

The following is the transcript of an actual session.

 

Session 11.1:  Using the search command

 

***************************************************

For help, type "help"

Initializing Data Structures

>input

Input File: search1.in

Converting covariance matrix to correlation matrix.

 

>search

Output file: search1.out

 

After a preliminary search to find the most promising edges to add to the initial model, Search evaluates a sequence of models that elaborate the initial model with various combinations of those edges.

 

Adding edge:

      q2 -> q1

      q1 C  q2

      q1 -> q2

>exit

***************************************************

 

The relevant parts of the output file (search1.out) are shown in Fig. 11.2:

 

###############   search1.out   ############

Output file: search1.out

Graph file: search1.in

Data file: search1.in

Graph: 

    A -> q1   A -> q2   A -> q3   A -> q4   A -> q5  

   

Parameters:

 

  Sample Size: 2000

  Continuous Data

  Significance:      0.0500

  Weight:            0.1000

  Width:             0.9500

  Depth:            Unbounded

 

  Acyclic:          YES

  Lm:               YES

  Mm:               YES

  Ml:               YES

  Ll:               YES

  Singleconnection: YES

  Common:           YES

  Settime:          Unbounded

}

Suggested Elaborations to Initial Model

 

q2 -> q1

Number of edges added: 1

Tetrad-score :  97.79 

 

q1 C q2

Number of edges added: 1

Tetrad-score :  97.79 

 

q1 -> q2

Number of edges added: 1

Tetrad-score :  97.79 

###############   search1.out   ############

Fig. 11.2

 

The first section of the output simply repeats the contents of the input file and the values of all the parameters that control the search. (In this case, all such parameters took their default values; their meanings are described in the following sections.) The initial graph is printed under the heading "Graph:". In this case the initial model is

 

A -> q1   A -> q2   A -> q3   A -> q4   A -> q5

 

The output of Search is a set of lists of suggested additions to the initial model. The first list suggested by TETRAD II consists of the single edge q2 -> q1. This represents the initial model + q2 -> q1, that is, the model:

A -> q1     A -> q2     A -> q3     A -> q4     A -> q5     q2 -> q1

 

The Tetrad-score (explained in chapter 8, section 2) for this model is 97.79. A model with a Tetrad-score of 100 would imply every constraint that passes the program's statistical tests and none that fail. The score is mainly useful as a heuristic for comparing models.
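The Tetrad-score itself is defined in chapter 8 and is not reproduced here. As a rough aid, the following Python sketch computes the constraints it is built on: for four variables i, j, k, l, the three tetrad differences r_ij*r_kl - r_ik*r_jl, r_ij*r_kl - r_il*r_jk, and r_ik*r_jl - r_il*r_jk. In a single-factor model like the initial graph of Fig. 11.1, every off-diagonal correlation is a product of two loadings, so all of these differences vanish. Perturbing only the q1-q2 correlation (which, as we understand it, is the only tetrad-relevant change that q1 -> q2, q2 -> q1, or q1 C q2 makes to this model) leaves every tetrad difference that does not involve r_12 at zero. The loadings are made up for illustration.

```python
from itertools import combinations

def tetrad_differences(r, i, j, k, l):
    """The three tetrad differences for variables i, j, k, l of correlation matrix r."""
    return (r[i][j] * r[k][l] - r[i][k] * r[j][l],
            r[i][j] * r[k][l] - r[i][l] * r[j][k],
            r[i][k] * r[j][l] - r[i][l] * r[j][k])

def one_factor_correlations(loadings):
    """Implied correlations of a standardized one-factor model: r_ij = l_i * l_j."""
    k = len(loadings)
    return [[1.0 if a == b else loadings[a] * loadings[b] for b in range(k)]
            for a in range(k)]

def vanishing_pattern(r, tol=1e-9):
    """Map each quadruple of variable indices to which of its three differences vanish."""
    return {quad: tuple(abs(t) < tol for t in tetrad_differences(r, *quad))
            for quad in combinations(range(len(r)), 4)}

# Made-up loadings for A -> q1 ... A -> q5 (indices 0..4 stand for q1..q5).
r = one_factor_correlations([0.9, 0.8, 0.7, 0.6, 0.5])
assert all(all(flags) for flags in vanishing_pattern(r).values())

# Add an extra q1-q2 association and recheck which differences still vanish.
r[0][1] = r[1][0] = r[0][1] + 0.1
for quad, flags in vanishing_pattern(r).items():
    print(quad, flags)
```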

In this case three models are suggested: the initial graph + q2 -> q1, the initial graph + q1 C q2, and the initial graph + q1 -> q2, where q1 C q2 means that the error terms for q1 and q2 are correlated, or equivalently that there is an additional latent common cause of q1 and q2 (Fig. 11.3).

 

Fig. 11.3: Models Representing the q1 C q2 Suggestion

 

Although in this case q1 -> q2, q1 C q2, and q2 -> q1 were equally good modifications of the initial model, sometimes one of these kinds of modification is better than the other two. Search may also suggest adding more than one edge or correlated error to the initial model.

Note that TETRAD II suggests a number of different causal models that are extensions of the initial model. Other search techniques (such as those in LISREL or EQS) suggest only a single extension of the initial causal model, often chosen at random from among modifications that are statistically indistinguishable. We believe that when the data and the background knowledge cannot single out one best modification of the initial model, but instead leave a set of equally good alternatives, it is more appropriate and more informative to present the user with the entire set.

Search suggests only those models whose Tetrad-score is close to that of the model with the highest Tetrad-score.

 

11.4 The Difficulty of Search

Searching for the best set of edges to add to the initial causal model is difficult because the problem is not monotonic. It might be that adding edge E to the initial model does little to improve the model's capacity to explain the data compared with adding other individual edges, and the same may hold for edge F, yet adding both E and F greatly improves the explanatory power and fit of the model compared with any other pair of edges. In addition, many single-edge additions to the initial model often improve the explanatory power and fit to exactly the same extent. Thus a reliable search cannot simply find the best single-edge addition E1 to the initial model, add it, find the best single-edge addition E2 to the initial model + E1, add it, and so on. (Searches of this kind are carried out by LISREL and EQS.) Consequently, the search must examine a very large number of sets of edges that might be added to the initial causal model.
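The failure of one-edge-at-a-time search can be seen with a toy example. The Python sketch below uses a made-up scoring function (toy_score, not the Tetrad-score) in which the hypothetical edges E and F help little on their own but a great deal together, while G and H each help moderately. A forward stepwise search of the kind the text attributes to LISREL and EQS, limited to two additions, picks G and H; scoring every set of at most two edges finds E and F.

```python
from itertools import combinations

CANDIDATES = ["E", "F", "G", "H"]   # hypothetical candidate edges

def toy_score(edges):
    """Made-up score: E and F matter only together; G and H help a little on their own."""
    s = set(edges)
    score = 50.0
    score += 10 if "G" in s else 0
    score += 8 if "H" in s else 0
    if {"E", "F"} <= s:
        score += 40
    elif s & {"E", "F"}:
        score += 2
    return score

def greedy(score, candidates, depth):
    """Forward stepwise search: add the single best edge at each step."""
    chosen = []
    for _ in range(depth):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda c: score(chosen + [c]))
        if score(chosen + [best]) <= score(chosen):
            break
        chosen.append(best)
    return chosen

def exhaustive(score, candidates, depth):
    """Score every set of at most `depth` edges and return the best one."""
    best_set, best_score = [], score([])
    for k in range(1, depth + 1):
        for subset in combinations(candidates, k):
            if score(list(subset)) > best_score:
                best_set, best_score = list(subset), score(list(subset))
    return best_set

print(greedy(toy_score, CANDIDATES, 2), exhaustive(toy_score, CANDIDATES, 2))
```

The exhaustive version is correct here only because it looks at every subset, which is exactly what becomes expensive as the number of candidate edges grows.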

Unfortunately, because so many sets of edges must be examined, the time taken to complete a search grows exponentially with the number of variables. The program uses several techniques to cut off unpromising branches of the search, but even so a complete search among 23 variables is impossible for most data sets. Hence, for large numbers of variables, it is important to provide background knowledge to direct the search. For example, if a variable q1 is known to occur before variable q2, the user can direct the Search command to ignore any model that contains an edge from q2 to q1. The advantage of using background knowledge of this kind is that it speeds up the search; the disadvantage is that if the knowledge is incorrect, it will prevent the program from finding the correct answer.
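A back-of-the-envelope count shows how quickly the space grows. Assuming, for illustration only, that every ordered pair of variables is a candidate directed edge and every unordered pair a candidate correlated error (no background knowledge, ignoring edges already in the initial model and the program's pruning), the number of candidate sets of at most d additions is the sum of the binomial coefficients C(m, k) for k = 0..d, where m is the number of candidate connections.

```python
from math import comb

def candidate_connections(n_vars):
    """Directed edges for every ordered pair plus a correlated error for every unordered pair."""
    directed = n_vars * (n_vars - 1)
    common = n_vars * (n_vars - 1) // 2
    return directed + common

def edge_sets_up_to(n_vars, depth):
    """Number of sets of at most `depth` additions, counting the empty set (no additions)."""
    m = candidate_connections(n_vars)
    return sum(comb(m, k) for k in range(depth + 1))

for depth in range(1, 5):
    print(depth, edge_sets_up_to(23, depth))
```

With 23 variables there are 759 candidate connections under these assumptions, so even sets of at most two additions number 288,421; background knowledge that forbids edges shrinks m and therefore every term of the sum.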

For searches of more than a few minutes, the Search command periodically prints out estimates of how long the search will take to complete. These should be taken as no more than ballpark estimates. If the estimate is longer than you can afford to run the search, you can interrupt the search on the UNIX version of the program by pressing the control key and \ at the same time, and on the PC version by pressing the control key and the letter "g" at the same time. Then enter additional restrictions on the models to be searched, and try again. We have provided users with a number of ways of placing restrictions on the search.

 

11.5 The Settime Command


The Settime command sets the maximum amount of time that the Search command will take. Session 11.2 shows how the Search command is used together with the Settime command to limit the search to 3 minutes. The data in search2.dat (not shown here, but included on the examples disk that comes with the program) were generated from the initial model in search2.dat plus edges from x1 to x8 and from x5 to x9.

 

Session 11.2:  Using the settime command

 

*********************************************************************

>input

Input File: search2.dat

Converting covariance matrix to correlation matrix.

 

>settime

Settime (minutes)   [Unlimited]: 3

 

>search

Output file: search2.out

 

 

##

Adding edge:

      x5 -> x9

         x1 -> x8

         T1 -> x8

            x1 C  x8

         x8 C  T1

            x1 C  x8

         x1 C  x8

            x8 C  x9

               x2 C  x9

               x2 -> x9

            x9 -> x8

               x5 C  x8

 

Changing depth to: 5

 

            x5 -> x8

 

Changing depth to: 4

 

               T1 -> x9

               x9 C  T1

            x4 -> x8

The expected time at depth 1 is:  0.2     minutes

The expected time at depth 2 is:  0.5     minutes

The expected time at depth 3 is:  1.6     minutes

The expected time at depth 4 is:  2.9     minutes

 

Changing depth to: 3

 

            x2 C  x9

            x2 -> x9

         x4 -> x8

 

Changing depth to: 2

 

         x9 -> x8

         x5 -> x8

         x3 -> x8

      x1 -> x8

         T1 -> x9

 .

 .

**********************************************************

 

When the user sets a maximum time, the search automatically adjusts the maximum number of edges it considers adding to the initial model (the depth) so that the estimated completion time does not exceed the allotted time. In Session 11.2 the program first lowers the depth to 5 and then, as the search still runs too slowly, successively to 4, 3, and finally 2. In this case the model that generated the data had only two more edges than the initial graph, so even though the search did not finish normally, it still succeeded (search2.out, in Fig. 11.4). Had the generating model contained more than two additional edges, however, the search might have failed to find the correct elaboration of the initial model. To reset the search time to unlimited, enter a value of -1.

 

###############   search2.out   ##############

Settime:           3.0000  minutes.

 

Search aborted because time limits exceeded.

Search aborted.

Suggested Elaborations to Initial Model

 

x1 -> x8

x5 -> x9

Number of edges added: 2

Tetrad-score :  96.51 

 

###############   search2.out   ##############

Fig. 11.4
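The depth adjustment can be pictured with a simplified, one-shot Python sketch, assuming hypothetical per-depth time estimates: pick the largest depth whose estimated time fits within the Settime budget. The real program re-estimates while it runs (Session 11.2 lowers the depth several times), so choose_depth is only an illustration of the idea, not TETRAD II's procedure.

```python
def choose_depth(expected_minutes, budget_minutes):
    """Pick the largest depth whose estimated search time fits within the budget.

    expected_minutes[d - 1] is an estimate of the time needed to search to depth d.
    A budget of -1 means unlimited (as with Settime), reported here as None.
    Returns 0 when even depth 1 is estimated to exceed the budget.
    """
    if budget_minutes == -1:
        return None
    depth = 0
    for d, minutes in enumerate(expected_minutes, start=1):
        if minutes <= budget_minutes:
            depth = d
    return depth

# Made-up estimates for depths 1..5, in minutes.
estimates = [0.2, 0.9, 2.5, 6.0, 15.0]
print(choose_depth(estimates, 3))
```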

 

11.6 Background Knowledge

Table 11.1 names the various parameters that control the search, their default values, the range of values they can take, and a brief explanation of their effects. A more complete explanation of the effect each parameter has on the search is given below. Each of these commands can also be used interactively. The interactive use of these commands is explained in chapter 4.

 

Name              Default     Range              Explanation
Acyclic           Yes         Yes, No            No cyclic directed paths
Addtemporal       Empty       List of vertices   Temporal order of vertices
Depth             Unlimited   -1..100            Maximum number of edges to be added
Forbidcommon      Empty       List of edges      Eliminate specified common causes
Forbiddirect      Empty       List of edges      Eliminate specified edges
Common            Yes         Yes, No            Allow common causes
Ll                Yes         Yes, No            Allow latent-latent edges
Lm                Yes         Yes, No            Allow latent-measured edges
Ml                Yes         Yes, No            Allow measured-latent edges
Mm                Yes         Yes, No            Allow measured-measured edges
Settime           Unlimited   -1..1000           Maximum amount of time for search (minutes)
Singleconnection  Yes         Yes, No            At most one connection between any pair of variables
Width             .95         0..1.0             How much an edge has to improve the score to be considered in Search

Table 11.1: Search Parameters

 

11.6.1 Temporal Information

 

The Addtemporal and Removetemporal commands are used to store temporal information about the variables. Suppose x67 and y67 were measured in 1967, x72 and y72 were measured in 1972, x84 was measured in 1984, and the temporal relationship of z1 to the other variables is not known. No model that suggests an edge from a later variable to an earlier variable should be allowed. The Addtemporal command eliminates such models from consideration by the Search command, as follows:

 

/Knowledge

addtemporal

1 x67 y67

3 x84

2 x72 y72

 

Fig. 11.5

 

The syntax of the Addtemporal and Removetemporal commands is explained in Chapter 4.
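The effect of the temporal tiers in Fig. 11.5 can be sketched as a simple filter on candidate edges. This is a minimal Python illustration under two stated assumptions that the text does not settle: edges between variables in the same tier are allowed, and correlated errors are not restricted by time order. The helper temporally_allowed is hypothetical.

```python
# Tiers from Fig. 11.5: 1 = 1967, 2 = 1972, 3 = 1984. z1 has no known time, so it is left out.
TIERS = {"x67": 1, "y67": 1, "x72": 2, "y72": 2, "x84": 3}

def temporally_allowed(cause, effect, tiers=TIERS):
    """Reject a directed edge only when both tiers are known and the cause comes later.

    Assumption: edges within the same tier are allowed.
    """
    if cause in tiers and effect in tiers:
        return tiers[cause] <= tiers[effect]
    return True

for edge in [("x67", "x84"), ("x84", "x67"), ("z1", "x67"), ("x72", "y72")]:
    print(edge, temporally_allowed(*edge))
```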

 

11.6.2 Forbiddirect, Forbidcommon, Allowdirect, and Allowcommon

Each of these commands is followed by a list of lines containing information about the edges that are required or forbidden. The list of lines must be followed by a blank line to signal the end of the command.

 

The Forbiddirect command is used to specify edges that will be eliminated from the search conducted by the Search command. For example, suppose the initial graph is:

 

/Graph

T1 x1

T1 x2

T2 x3

x2 x4

x5 T2

T1 T2

Fig. 11.6

 

If background knowledge indicates that x1 cannot cause x2, and x2 cannot cause x3, these restrictions can be entered in the following way:

 

/Knowledge

Forbiddirect

x1 x2

x2 x3

Fig. 11.7

 

The first line of the command states what sort of causal connection is being forbidden (in this case, a direct edge). The syntax of these commands is explained in chapter 4.

The Forbidcommon command acts in the same way as the Forbiddirect command, except that it instructs the Search command not to consider a latent common cause of two variables. Similarly, the Allowdirect command undoes the effect of a Forbiddirect command, and the Allowcommon command undoes the effect of a Forbidcommon command.
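To show how these lists prune the search, here is a minimal Python sketch that enumerates the single connections a search could add to the graph of Fig. 11.6 while honoring the Forbiddirect list of Fig. 11.7. It assumes that a Forbiddirect line is ordered (x1 x2 forbids only x1 -> x2) and that a Forbidcommon line is unordered; candidate_additions is a hypothetical helper, not a TETRAD II function.

```python
from itertools import combinations, permutations

# Initial graph of Fig. 11.6 and the Forbiddirect list of Fig. 11.7.
VARIABLES = ["T1", "T2", "x1", "x2", "x3", "x4", "x5"]
INITIAL = {("T1", "x1"), ("T1", "x2"), ("T2", "x3"),
           ("x2", "x4"), ("x5", "T2"), ("T1", "T2")}
FORBID_DIRECT = {("x1", "x2"), ("x2", "x3")}   # ordered: x1 cannot cause x2, x2 cannot cause x3
FORBID_COMMON = set()                          # pairs here are treated as unordered

def candidate_additions(variables, initial, forbid_direct, forbid_common):
    """List the single connections ('->' or 'C') that remain available to the search."""
    forbidden_pairs = {frozenset(pair) for pair in forbid_common}
    direct = [("->", a, b) for a, b in permutations(variables, 2)
              if (a, b) not in initial and (a, b) not in forbid_direct]
    common = [("C", a, b) for a, b in combinations(variables, 2)
              if frozenset((a, b)) not in forbidden_pairs]
    return direct + common

cands = candidate_additions(VARIABLES, INITIAL, FORBID_DIRECT, FORBID_COMMON)
print(len(cands))
```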

 

11.6.3 Acyclic

When Acyclic is set to Yes, the search will not consider any combination of edges that creates a cyclic directed path, that is, a directed path that begins and ends with the same vertex. The calculation of which tetrad equations are implied to vanish for any parameterization of a model is guaranteed to be correct only for acyclic models. The following example illustrates how to use the Acyclic setting. Suppose the initial graph is from the input file shown in Fig. 11.8:

 

/graph

T x1

T x2

T x3

T x4

Fig. 11.8

 

If Acyclic is Yes, then the combination of edges x1 -> x2, x2 -> x3, x3 -> x1 would not be added to the initial model, because it would create a directed path that begins and ends with x1. However, none of the individual edges x1 -> x2, x2 -> x3, or x3 -> x1 would be eliminated from consideration; only the combination of edges is ruled out.
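The check described above amounts to cycle detection on the initial graph plus a proposed set of edges. The following Python sketch uses a standard depth-first search (a vertex reached again while it is still on the current path closes a cycle); has_directed_cycle is an illustrative helper, not TETRAD II's code, and the graph is the one in Fig. 11.8.

```python
def has_directed_cycle(edges):
    """Depth-first search for a directed cycle in a list of (cause, effect) edges."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, []).append(effect)
        children.setdefault(effect, [])
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {v: UNVISITED for v in children}

    def visit(v):
        state[v] = ON_PATH
        for w in children[v]:
            if state[w] == ON_PATH:      # back to a vertex on the current path: a cycle
                return True
            if state[w] == UNVISITED and visit(w):
                return True
        state[v] = DONE
        return False

    return any(state[v] == UNVISITED and visit(v) for v in list(children))

INITIAL = [("T", "x1"), ("T", "x2"), ("T", "x3"), ("T", "x4")]   # Fig. 11.8
print(has_directed_cycle(INITIAL + [("x1", "x2")]))
print(has_directed_cycle(INITIAL + [("x1", "x2"), ("x2", "x3"), ("x3", "x1")]))
```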

The default value of Acyclic is Yes, that is, the search is restricted to adding edges that do not create a cyclic directed path. Fig. 11.9 shows how to set Acyclic to no:

 

/Knowledge

Acyclic  no

Fig. 11.9

 

The words Yes and No that are values for Acyclic can be abbreviated by y and n respectively.

 

11.6.4 Ll, Lm, Ml, Mm

 

There are two kinds of variables in an initial graph: latent variables, whose names begin with an uppercase letter, and measured variables, whose names begin with a lowercase letter. In the graph of Fig. 11.6, for example, T1 and T2 are latent variables, and x1, x2, x3, x4, and x5 are measured variables. Thus there are four possible kinds of edges in a graph: ll (latent-latent) edges from latent variables to latent variables (for example T1 -> T2), lm (latent-measured) edges from latent variables to measured variables (for example T1 -> x1), ml (measured-latent) edges from measured variables to latent variables (for example x5 -> T2), and mm (measured-measured) edges from measured variables to measured variables (for example x2 -> x4). When the measured variables are answers to questions and the latent variables are psychological traits, as in the example of section 11.3, it is obvious that the measured variables do not cause the latent variables. This information can be entered into TETRAD II by setting ml (measured-latent) to No, as in Fig. 11.10:

 

/Knowledge

ml  No

Fig. 11.10

 

Similarly, to forbid the search from considering latent-latent edges set ll to No, to forbid the search from considering latent-measured edges set lm to No, and to forbid the search from considering measured-measured edges set mm to No. All of these parameters can be set from within the TETRAD II program or as part of the input file.
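The naming convention above is enough to classify any edge. This minimal Python sketch maps an edge to ll, lm, ml, or mm from the case of the first letter of each name and then applies Ll/Lm/Ml/Mm settings like those of Fig. 11.10; edge_kind and kind_allowed are hypothetical helpers.

```python
def is_latent(name):
    """Convention from the text: latent names start uppercase, measured names lowercase."""
    return name[0].isupper()

def edge_kind(cause, effect):
    """Return 'll', 'lm', 'ml', or 'mm' for the edge cause -> effect."""
    return ("l" if is_latent(cause) else "m") + ("l" if is_latent(effect) else "m")

SETTINGS = {"ll": True, "lm": True, "ml": False, "mm": True}   # ml set to No, as in Fig. 11.10

def kind_allowed(cause, effect, settings=SETTINGS):
    return settings[edge_kind(cause, effect)]

for edge in [("T1", "T2"), ("T1", "x1"), ("x5", "T2"), ("x2", "x4")]:
    print(edge, edge_kind(*edge), kind_allowed(*edge))
```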

 

11.6.5 Singleconnection

There are three kinds of causal connection between any pair of variables x and y: x -> y, y -> x, and x C y. If you wish to consider only models in which at most one of these connections occurs for each pair, set Singleconnection to Yes. If Singleconnection is Yes, then for any pair of variables x and y the search will consider models with x -> y, y -> x, or x C y, but it will not consider any model with both x -> y and x C y, both y -> x and x C y, or both x -> y and y -> x. The default value of Singleconnection is Yes. The knowledge file in Fig. 11.11 shows how to reset it to No.

 

/Knowledge

singleconnection  No

Fig. 11.11
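With Singleconnection set to Yes, a candidate model is rejected whenever some pair of variables carries two connections. A minimal Python sketch of that test, assuming each connection is written as a (kind, a, b) triple with kind "->" or "C"; violates_single_connection is a hypothetical helper.

```python
from collections import Counter

def violates_single_connection(connections):
    """True if some pair of variables is joined by more than one connection.

    connections: iterable of (kind, a, b) with kind "->" (a causes b) or "C"
    (correlated errors, i.e. a latent common cause of a and b).
    """
    per_pair = Counter(frozenset((a, b)) for _, a, b in connections)
    return any(count > 1 for count in per_pair.values())

print(violates_single_connection([("->", "x", "y"), ("C", "x", "y")]))
print(violates_single_connection([("->", "x", "y"), ("C", "y", "z")]))
```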

 

11.6.6 Common

If common is set to No, then the search will not consider any models that contain common causes.

 

/Knowledge

common  No

Fig. 11.12

 

11.6.7 Depth

If background knowledge still has not sped up the search enough to make it feasible, a simple further step is to set an upper limit on the number of edges that the search will add to the initial graph. This is done by setting the Depth parameter. The default value of Depth is unlimited, so the search will continue adding edges to the initial model until adding edges no longer improves the Tetrad-score. Setting Depth to 4, for example, limits the search to adding at most 4 edges or common causes to the initial model.

To set the Depth parameter, place the desired integer value next to the name of the parameter, separated by one or more spaces or tabs. (To reset the Depth parameter to unlimited, set it to -1.) If a parameter is given a real value instead of an integer, the value is rounded to the nearest integer. The following example shows how to set the Depth parameter in an input file:

 

/Knowledge

Depth   2

Fig. 11.13
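The parameter lines used in the /Knowledge figures of this chapter follow one pattern: a name, whitespace, and a value. As an illustration of the rules stated above (Yes/No may be abbreviated y/n, real values for Depth are rounded, -1 means unlimited), here is a hypothetical Python parser; it is a sketch of those rules, not TETRAD II's actual input reader, and its grouping of parameters into Boolean, integer, and real is an assumption based on Table 11.1.

```python
import math

BOOLEAN = {"acyclic", "common", "singleconnection", "ll", "lm", "ml", "mm"}
INTEGER = {"depth"}
REAL = {"settime", "width", "sig"}
UNLIMITED_WITH_MINUS_ONE = {"depth", "settime"}

def parse_parameter(line):
    """Parse one 'Name value' line of a /Knowledge section (hypothetical helper)."""
    name, value = line.split(None, 1)   # one or more spaces or tabs separate the two
    key, value = name.lower(), value.strip()
    if key in BOOLEAN:
        word = value.lower()
        if word in ("yes", "y"):
            return key, True
        if word in ("no", "n"):
            return key, False
        raise ValueError(f"{name} expects Yes or No, got {value!r}")
    if key in INTEGER:
        number = math.floor(float(value) + 0.5)   # round a real value to the nearest integer
    elif key in REAL:
        number = float(value)
    else:
        raise ValueError(f"unknown parameter {name!r}")
    if key in UNLIMITED_WITH_MINUS_ONE and number == -1:
        return key, None                          # -1 resets the limit to unlimited
    return key, number

for line in ["Depth   2", "Acyclic  no", "ml  No", "Width .90", "sig .01"]:
    print(parse_parameter(line))
```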

 

 

11.6.8 Width

If, in the course of the search, TETRAD II has added a set of edges E to the initial model, it then goes through the list of remaining edges that can be added to the initial model + E and ranks them by how much they increase the Tetrad-score of the initial model + E. If an edge F decreases the Tetrad-score, or fails to increase it very much compared with the edge that improves it the most, the program eliminates the initial model + E + F, and any extension of that model, from consideration. The Width parameter controls how poorly an edge can do at improving the Tetrad-score, relative to the best edge, before it is eliminated. The default value of Width is 0.95. This means that if the best single-edge extension of the initial model + E has a Tetrad-score of X, then any single-edge extension of the initial model + E whose Tetrad-score is not greater than 0.95 * X is eliminated from consideration. The larger the Width parameter, the faster the search.

 

/Knowledge

Width .90

Fig. 11.14
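The pruning rule just described can be written as a short filter. In this Python sketch the scores are made up, and prune_by_width is a hypothetical helper that keeps an extension only if it does not lower the current score and its score exceeds Width times the best extension's score.

```python
def prune_by_width(current_score, extension_scores, width=0.95):
    """Keep the single-edge extensions that survive the Width cut.

    current_score: Tetrad-score of initial model + E.
    extension_scores: dict mapping a candidate edge F to the score of initial model + E + F.
    """
    best = max(extension_scores.values())
    return {edge: score for edge, score in extension_scores.items()
            if score >= current_score and score > width * best}

scores = {"x1 -> x8": 95.0, "x5 -> x9": 92.0, "x1 C x8": 89.0, "x3 -> x8": 78.0}  # made-up
print(sorted(prune_by_width(80.0, scores)))              # default Width .95
print(sorted(prune_by_width(80.0, scores, width=0.90)))  # Width .90, as in Fig. 11.14
```

Lowering Width from .95 to .90 lets x1 C x8 survive, which is why a smaller Width makes the search slower but less aggressive in discarding alternatives.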

 

11.6.9 Significance Level

 

The effect of increasing the significance level is that TETRAD II tends to decrease the number of edges that it suggests adding to the initial model. We have not experimented extensively with different values of the significance level, and in general we leave it at 0.05. Fig. 11.15 shows how to reset the significance level.

 

/Knowledge

sig .01

Fig. 11.15

 

11.7 Reliability of the Search Procedure

 

The search space of models is generally so large that it cannot be completely searched in a reasonable amount of time, and so Search uses heuristics to search only among the most promising models. Hence the output is not guaranteed to be correct. The procedures use tests for vanishing tetrad differences that assume joint normality. The following simulation study is described briefly here and in more detail in Spirtes, Scheines, and Glymour (1990), and Spirtes, Glymour, and Scheines (1993).

For each of the following nine different recursive linear structural equation models we generated 20 samples of size 200 and 20 samples of size 2,000.


For each model, TETRAD II, LISREL, and EQS were each given as the initial model the corresponding model shown in the figures, but without the edges in boldface, and were each asked to find the missing edges. (The version of TETRAD II used in the study differed in a number of insignificant ways from the current version.) LISREL and EQS search for missing edges using slightly different forms of modification indices (see Bentler, 1989; Joreskog & Sorbom, 1993b). We scored the results of each program as follows. For each data set and initial model, TETRAD II produces a set of best alternative elaborations. In some cases that set consists of a single model; typically it consists of two or three alternatives. EQS and LISREL VI, when run in their automatic search mode, produce as output a single model elaborating the initial model.[2] The output of each program is scored "correct" when it contains the true model. But it is important to see how the programs err when their output is not correct, so we have provided a more detailed classification of the kinds of error. We classified the output of TETRAD II as follows (where a model is in TETRAD II's top group if and only if it is tied for the highest Tetrad-score and no model with the same Tetrad-score has fewer edges):

 

 

Correct  - the true model is in TETRAD II's top group.

Width - the average number of alternatives in TETRAD II's top group.

 

Errors:

Overfit - the TETRAD II top group does not contain the true model but contains a model that is an elaboration of the true model.

Underfit - the TETRAD II top group does not contain the true model but does contain a model of which the true model is an elaboration.

Other - none of the previous categories apply to the output.
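The definition of the top group combines the highest score with the simplicity principle discussed later in this section. A minimal Python sketch, with top_group as a hypothetical helper and a small tolerance for comparing scores; the first three scores echo Fig. 11.2, the other two models are invented to show the tie-breaking.

```python
def top_group(models, tol=1e-9):
    """models: list of (added_edges, tetrad_score) pairs.

    Keep the models tied for the highest score, then among those only the ones
    with the fewest added edges.
    """
    best = max(score for _, score in models)
    tied = [edges for edges, score in models if abs(score - best) <= tol]
    fewest = min(len(edges) for edges in tied)
    return [edges for edges in tied if len(edges) == fewest]

candidates = [
    ({"q2 -> q1"}, 97.79),
    ({"q1 C q2"}, 97.79),
    ({"q1 -> q2"}, 97.79),
    ({"q1 -> q2", "q3 -> q4"}, 97.79),   # hypothetical: same score with a redundant edge
    ({"q3 -> q4"}, 90.00),               # hypothetical: lower score
]
print(top_group(candidates))
```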

 

We have scored the output of the LISREL VI and EQS programs as follows:

 

Correct - the true model is recommended by the program.

 

Errors:

In the TETRAD II Top Group - the recommended model is not correct, but is among the best alternatives suggested by the TETRAD II program for the same data.

Overfit - the recommended model is an elaboration of the true model.

Underfit - the true model is an elaboration of the recommended model.

Right Variable Pairs - the recommended model is not in any of the previous categories, but it does connect the same pairs of variables as were connected in the omitted parts of the true model.

Other - none of the previous categories apply to the output.
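Because every candidate contains the initial model, each scoring category can be decided by comparing only the added connections. This Python sketch applies the LISREL VI / EQS categories above in the order listed; classify is a hypothetical helper, and it assumes each correlated error is written with a fixed variable order so that equal connections compare equal. The true additions used below are the two edges by which the search2.dat generating model extends its initial model.

```python
def connected_pairs(additions):
    """Unordered variable pairs joined by a set of (kind, a, b) additions."""
    return {frozenset((a, b)) for _, a, b in additions}

def classify(recommended, true_additions, tetrad_top_group=()):
    """Score a single recommended model by the connections it adds to the initial model."""
    rec, true = set(recommended), set(true_additions)
    if rec == true:
        return "Correct"
    if any(rec == set(model) for model in tetrad_top_group):
        return "In the TETRAD II Top Group"
    if rec > true:
        return "Overfit"       # recommended model is an elaboration of the true model
    if rec < true:
        return "Underfit"      # true model is an elaboration of the recommended model
    if connected_pairs(rec) == connected_pairs(true):
        return "Right Variable Pairs"
    return "Other"

TRUE = {("->", "x1", "x8"), ("->", "x5", "x9")}
print(classify({("->", "x8", "x1"), ("->", "x9", "x5")}, TRUE))
```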

 

Width, n=2000

Case         1     2     3     4     5     6     7     8     9
LISREL VI    1     1     1     1     1     1     1     1     1
EQS          1     1     1     1     1     1     1     1     1
TETRAD       4     2.1   2     1     1.1   3     7.1   11.3  2.9

Width, n=200

Case         1     2     3     4     5     6     7     8     9
LISREL VI    1     1     1     1     1     1     1     1     1
EQS          1     1     1     1     1     1     1     1     1
TETRAD       1.9   3.5   1.5   1     1     3.2   5.9   8.4   3

 

For a sample size of 2,000, TETRAD II's set included the correct respecification in 95% of the cases; LISREL VI found the right model 18.8% of the time, and EQS 13.3% of the time. For a sample size of 200, TETRAD II's set included the correct respecification 52.2% of the time, whereas LISREL VI corrected the misspecification 15.0% of the time and EQS 10.0% of the time.

 

Fig. 11.16

 

 A more detailed characterization of the errors is given in Fig. 11.17. We also found that when the answers produced by EQS or LISREL agreed with TETRAD II's answer, both were more reliable than when they disagreed.

 

Fig. 11.17

 

The TETRAD II procedure cannot find the correct model if there are a large number of vanishing tetrad differences that are not linearly implied by the true model but hold because of coincidental values of the free parameters. Our study indicates that this is unusual, at least given the uniform distribution we placed on the linear coefficients of the models that generated our data, but it certainly does occur. We expect the same results for any other "natural" distribution on the parameters. Further, the search is not guaranteed to find all of the models that have the highest Tetrad-score. In many cases, however, depending on the size of the model, the amount of background knowledge, the structure of the model, and the sample size, the search space is so large that a search guaranteed to find the models with the highest Tetrad-score is not practical. One way the procedure limits the search is by applying a simplicity principle: models with fewer edges are preferred over models with more edges and the same Tetrad-score. This is a substantive assumption that may be false. The simplicity assumption is not needed for some small models, but in many problems with more variables there may be a large number of models that have maximal scores but contain many redundant edges that do not contribute to the score. Without the simplicity assumption, this space of models is often difficult to search, and if it is searched, so many models may be tied for the highest score that the output is uninformative. If a model with "redundant" edges is correct, our procedure will not find it. Typically such structures are underidentified, so they could not be found by LISREL VI or EQS either.

Finally, there exist many latent variable models that cannot be distinguished by the vanishing tetrad differences they imply, but are nonetheless in principle statistically distinguishable. The LISREL or EQS procedures might succeed in discovering such structures when the TETRAD II procedures fail.

 

11.8 An Empirical Example

Bollen (1980) studied whether a number of measures of political democracy were indicators of a common feature of societies. Using measures of press freedom (pf), freedom of group opposition (fg), government sanctions (gs), fairness of elections (fe), executive selection (es), and legislature selection (ls), he considered the linear factor model in Fig. 11.18, in which each measured variable has its own error term.

 

 

Fig. 11.18: Bollen's Initial Model

 

Using LISREL, Bollen then estimated the model, and considered variants in which other factors confound the measured variables. The best model he found is:

 

Fig. 11.19: Bollen's Respecified Model

 

The additional arrows indicate correlations among the error terms produced by unmeasured common causes. The model easily passes the EQS likelihood ratio test. We ran Search on Bollen's data and initial unidimensional factor model in Session 11.3.

 

Session 11.3: Running search on Bollen's model

 

******************************

>input

Input File: bollen.dat

 

>search

Output file: bollen2.out

 

Adding edge:

      fg C  gs

         ls -> fg

            ls -> es

         fg C  ls

            es C  ls

            es -> ls

      gs -> fg

         ls -> fg

         fg C  ls

            es C  ls

            es -> ls

      fg -> gs

         pf -> ls

         ls -> pf

         pf C  ls

>exit

************************

 

The output (bollen2.out) in fact contains several models with the same Tetrad-score, the first of which is exactly Bollen's best respecified model.

 

##############   bollen2.out   #########

Suggested Elaborations to Initial Model

fg C gs

fg C ls

es C ls

Number of edges added: 3

Tetrad-score :  92.93 

 

fg C gs

fg C ls

es -> ls

Number of edges added: 3

Tetrad-score :  92.93 

 

gs -> fg

fg C ls

es C ls

Number of edges added: 3

Tetrad-score :  92.93 

 

gs -> fg

fg C ls

es -> ls

Number of edges added: 3

Tetrad-score :  92.93 

##############   bollen2.out   #########

 



[1] At very large sample sizes, even a causal model that is very close to the true one can fail these tests because of slight departures from linearity or other statistical assumptions. There is also a selection bias in testing a model on the same data that were used to select it. For these reasons, we suggest interpreting the probability of a model's χ² statistic as a measure of fit.

[2] Very similar results were obtained with LISREL VII.