Package edu.cmu.tetrad.search
Class MarkovCheck
java.lang.Object
edu.cmu.tetrad.search.MarkovCheck
- All Implemented Interfaces:
EffectiveSampleSizeSettable
Checks whether a graph is Markov given a data set. First, a list of m-separation predictions is made for each pair
of variables in the graph given the parents of one of the variables. One list (for the Markov check) holds the
predictions where m-separation holds, and another list (for the dependency check) holds those where it does not. The
predictions are then tested against the data set using the independence test. For the Markov test, since the p-values
of an independence test should be Uniform under the null hypothesis, these p-values are tested for Uniformity using
the Kolmogorov-Smirnov test. Also, a fraction of dependent judgments is returned, which should equal the alpha level
of the independence test if the p-values are Uniform under the null hypothesis. For the Faithfulness test, the p-values
are tested for Uniformity using the Kolmogorov-Smirnov test; these judgments should be dependent. Also, a fraction of dependent
judgments is returned.
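The claim that the fraction of dependent judgments should match the alpha level can be illustrated with a small simulation. The sketch below does not use Tetrad; the class NullRejectionRateSketch and its methods are hypothetical names, and it assumes a judgment counts as "dependent" when its p-value is strictly below alpha.

```java
import java.util.Random;

final class NullRejectionRateSketch {
    /** Fraction of p-values strictly below alpha (assumed meaning of a "dependent" judgment). */
    static double fractionDependent(double[] pValues, double alpha) {
        if (pValues.length == 0) return Double.NaN;
        int dependent = 0;
        for (double p : pValues) {
            if (p < alpha) dependent++;
        }
        return (double) dependent / pValues.length;
    }

    /** Simulates p-values of a calibrated test under the null hypothesis: Uniform(0, 1) draws. */
    static double[] simulateNullPValues(int n, long seed) {
        Random random = new Random(seed);
        double[] pValues = new double[n];
        for (int i = 0; i < n; i++) {
            pValues[i] = random.nextDouble();
        }
        return pValues;
    }

    public static void main(String[] args) {
        double[] pValues = simulateNullPValues(10_000, 42L);
        System.out.printf("fraction dependent at alpha = 0.05: %.4f%n",
                fractionDependent(pValues, 0.05));
    }
}
```

With Uniform p-values the printed fraction should land close to 0.05; a fraction well above alpha on the Markov list suggests that some predicted independencies do not hold in the data.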
Knowledge may be supplied to the Markov check. This knowledge is used to specify independence and conditioning ranges. For facts of the form X _||_ Y | Z, X and Y should be in the last tier of the knowledge, and Z should be in previous tiers. Additional forbidden or required edges are not allowed.
- Author:
- josephramsey
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
static final class: Stores the set of m-separation facts and the set of m-connection facts for a graph, for the global check.
static final record: A single record for the results of the Markov check.
-
Constructor Summary
Constructors
Constructor, Description
MarkovCheck(Graph graph, IndependenceTest independenceTest, ConditioningSetType setType): Constructor.
-
Method Summary
Modifier and Type, Method, Description
void addObserver(ModelObserver observer): Adds a ModelObserver to the list of observers.
checkAgainstAndersonDarlingTest(List<Double> pValues): Tests a list of p-values against the Anderson-Darling Test.
checkIndependenceForTargetNode: Retrieves the list of local independence facts for a given node.
void clear(): Clears the results stored in the `resultsIndep` and `resultsDep` lists.
void generateResults(boolean clear): Generates all results, for both the Markov and dependency checks, for each node in the graph given the parents of that node.
void generateResults(boolean indep, boolean clear): Generates results based on the specified independence and clearing conditions.
getAllSubsetsIndependenceFacts: Returns the set of independence facts used in the Markov check, for d-separation and d-connection separately.
double getAndersonDarlingA2(boolean indep): Returns the Anderson-Darling A^2 statistic for the given list of results.
double getAndersonDarlingA2(List<IndependenceResult> visiblePairs): Calculates the Anderson-Darling A^2 value for a list of independence results.
double getAndersonDarlingA2Star(boolean indep): Returns the Anderson-Darling A^2* statistic for the given list of results.
double getAndersonDarlingP(boolean indep): Returns the Anderson-Darling p-value for the given list of results.
double getAndersonDarlingPValue(List<IndependenceResult> visiblePairs): Calculates the Anderson-Darling p-value for a given list of independence results.
List<List<Node>> getAndersonDarlingTestAcceptsRejectsNodesForAllNodes(IndependenceTest independenceTest, Graph graph, Double threshold, Double shuffleThreshold): Calculates the Anderson-Darling test and classifies nodes as accepted or rejected based on the given threshold.
List<List<Node>> getAndersonDarlingTestAcceptsRejectsNodesForAllNodesPlotData(IndependenceTest independenceTest, Graph estimatedCpdag, Graph trueGraph, Double threshold, Double shuffleThreshold, Double lowRecallBound): Gets the accepted and rejected nodes for all nodes from the Anderson-Darling test and generates the plot data for confusion statistics.
List<List<Node>> getAndersonDarlingTestAcceptsRejectsNodesForAllNodesPlotData2(IndependenceTest independenceTest, Graph estimatedCpdag, Graph trueGraph, Double threshold, Double shuffleThreshold, Double lowRecallBound): Gets the accepted and rejected nodes for all nodes from the Anderson-Darling test and generates the plot data for confusion statistics.
double getBinomialPValue(boolean indep): Returns the Binomial p-value for the given list of results.
double getBinomialPValue(List<IndependenceResult> visiblePairs): Calculates the binomial p-value based on the list of visible pairs.
getConditioningNodes: Returns the nodes that are possible Z1,...,Zn for X _||_ Y | Z1,...,Zn.
double getFractionDependent(boolean indep): Returns the fraction of dependent judgments for the given list of results.
double getFractionDependent(List<IndependenceResult> results): Calculates the fraction of dependent results.
getIndependenceNodes: Returns the nodes that are possible X and Y for X _||_ Y | Z1,...,Zn.
getIndependenceTest: Returns the independence test being used.
getKnowledge: Returns the knowledge object for the Markov checker.
double getKsPValue(boolean indep): Returns the Kolmogorov-Smirnov p-value for the given list of results.
double getKsPValue(List<IndependenceResult> visiblePairs): Calculates the Kolmogorov-Smirnov (KS) p-value for a list of independence test results.
List<Double> getLocalPValues(IndependenceTest independenceTest, List<IndependenceFact> facts): Calculates the local p-values for a given independence test and a list of independence facts.
List<List<Double>> getLocalPValues(IndependenceTest independenceTest, List<IndependenceFact> facts, Double shuffleThreshold): Gets the local p-values using the provided shuffle threshold.
getMarkovCheckRecord: Generates the results for the given set of independence facts as a single record.
getMarkovCheckRecordString: Returns the Markov check record as a string.
int getNumTests(boolean indep): Returns the number of tests for the given list of results.
void getPrecisionAndRecallOnMarkovBlanketGraph(Node x, Graph estimatedGraph, Graph trueGraph): Calculates the precision and recall on the Markov Blanket graph for a given node.
void getPrecisionAndRecallOnMarkovBlanketGraph2(Node x, Graph estimatedGraph, Graph trueGraph): Calculates the precision and recall using LocalGraphConfusion (which calculates the combination of Adjacency and ArrowHead) on the Markov Blanket graph for a given node.
List<Double> getPrecisionAndRecallOnMarkovBlanketGraphPlotData(Node x, Graph estimatedGraph, Graph trueGraph): Calculates the precision and recall on the Markov blanket graph and returns them as plot data.
List<Double> getPrecisionAndRecallOnMarkovBlanketGraphPlotData2(Node x, Graph estimatedGraph, Graph trueGraph): Calculates the precision and recall of a target node's Markov Blanket in the given estimated graph.
getPValues(List<IndependenceResult> results): Returns the list of p-values for the given list of results.
getResults(boolean indep): After the generateResults method has been called, this method returns the results for the Markov or dependency check, depending on the value of the indep parameter.
getSetType: Returns the type of conditioning sets to use in the Markov check.
getVariable(String name): Returns the variable with the given name.
getVariables: Returns the variables of the independence test.
boolean isMpdag(): Checks whether the given graph is a CPDAG (Completed Partially Directed Acyclic Graph).
void notifyObservers(): Notifies all registered ModelObservers by invoking their update() method.
void removeObserver(ModelObserver observer): Removes the specified observer from the list of observers.
void setEffectiveSampleSize(int sampleSize): Sets the sample size if the sample size of the data or covariance matrix is not the sample size that should be used by the class.
void setFindSmallestSubset(boolean findSmallestSubset): Sets the flag indicating whether a smallest subset of each conditioning set yielding independence should be reported.
void setIndependenceTest: Sets the independence test to be used for determining independence between variables.
void setKnowledge(Knowledge knowledge): Sets the knowledge object for the Markov checker.
void setMaxLength(int maxLength): Sets the maximum path length for relevant paths.
void setParallelized(boolean parallelized): True if the checks should be parallelized.
void setPercentResample(double percentResample): Sets the percentage of all samples to use when resampling for each conditional independence test.
void setSetType(ConditioningSetType setType): Sets the type of conditioning sets to use in the Markov check.
-
Constructor Details
-
MarkovCheck
Constructor. Takes a graph and an independence test over the variables of the graph.
- Parameters:
graph
- The graph.
independenceTest
- The test over the variables of the graph.
setType
- The type of conditioning sets to use in the Markov check.
-
-
Method Details
-
getAllSubsetsIndependenceFacts
Returns the set of independence facts used in the Markov check, for d-separation and d-connection separately.
- Returns:
- The set of independence facts used in the Markov check, for d-separation and d-connection separately.
-
checkIndependenceForTargetNode
Retrieves the list of local independence facts for a given node.
- Parameters:
x
- The node for which to retrieve the local independence facts.
- Returns:
- The list of local independence facts for the given node.
-
getLocalPValues
public List<Double> getLocalPValues(IndependenceTest independenceTest, List<IndependenceFact> facts)
Calculates the local p-values for a given independence test and a list of independence facts.
- Parameters:
independenceTest
- The independence test used for calculating the p-values.
facts
- The list of independence facts.
- Returns:
- The list of local p-values.
-
getLocalPValues
public List<List<Double>> getLocalPValues(IndependenceTest independenceTest, List<IndependenceFact> facts, Double shuffleThreshold)
Gets the local p-values using the provided shuffle threshold.
- Parameters:
independenceTest
- The independence test used for calculating the p-values.
facts
- The list of independence facts.
shuffleThreshold
- The threshold value for shuffling the data.
- Returns:
- The list of local p-values.
-
checkAgainstAndersonDarlingTest
-
getAndersonDarlingTestAcceptsRejectsNodesForAllNodes
public List<List<Node>> getAndersonDarlingTestAcceptsRejectsNodesForAllNodes(IndependenceTest independenceTest, Graph graph, Double threshold, Double shuffleThreshold)
Calculates the Anderson-Darling test and classifies nodes as accepted or rejected based on the given threshold.
- Parameters:
independenceTest
- The independence test to be used for calculating p-values.
graph
- The graph containing the nodes for testing.
threshold
- The threshold value for classifying nodes.
shuffleThreshold
- The threshold value for shuffling the data.
- Returns:
- A list containing two lists: the first list contains the accepted nodes and the second list contains the rejected nodes.
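The shape of the returned value (a list holding the accepted nodes followed by the rejected nodes) can be sketched without Tetrad as follows. The class AcceptRejectSketch is hypothetical, node names stand in for Node objects, and the rule that a node is accepted when its p-value exceeds the threshold is an assumption made for illustration, not a statement about the library's exact comparison.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AcceptRejectSketch {
    /**
     * Splits node names into [accepted, rejected] by comparing a per-node p-value
     * with a threshold. Assumption: p-value > threshold means "accepted".
     */
    static List<List<String>> classify(Map<String, Double> pValueByNode, double threshold) {
        List<String> accepted = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, Double> entry : pValueByNode.entrySet()) {
            if (entry.getValue() > threshold) {
                accepted.add(entry.getKey());
            } else {
                rejected.add(entry.getKey());
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(accepted); // index 0: accepted nodes
        result.add(rejected); // index 1: rejected nodes
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> pValues = new LinkedHashMap<>();
        pValues.put("X1", 0.40);
        pValues.put("X2", 0.01);
        pValues.put("X3", 0.07);
        System.out.println(classify(pValues, 0.05));
    }
}
```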
-
getAndersonDarlingTestAcceptsRejectsNodesForAllNodesPlotData
public List<List<Node>> getAndersonDarlingTestAcceptsRejectsNodesForAllNodesPlotData(IndependenceTest independenceTest, Graph estimatedCpdag, Graph trueGraph, Double threshold, Double shuffleThreshold, Double lowRecallBound)
Gets the accepted and rejected nodes for all nodes from the Anderson-Darling test and generates the plot data for confusion statistics. Confusion statistics are calculated using Adjacency (AdjacencyPrecision, AdjacencyRecall) and Arrowhead (ArrowheadPrecision, ArrowheadRecall).
- Parameters:
independenceTest
- The independence test to be used for calculating p-values.
estimatedCpdag
- The estimated CPDAG.
trueGraph
- The true graph.
threshold
- The threshold value for classifying nodes.
shuffleThreshold
- The threshold value for shuffling the data.
lowRecallBound
- The bound value for recording low recall.
- Returns:
- A list containing two lists: the first list contains the accepted nodes and the second list contains the rejected nodes.
-
getAndersonDarlingTestAcceptsRejectsNodesForAllNodesPlotData2
public List<List<Node>> getAndersonDarlingTestAcceptsRejectsNodesForAllNodesPlotData2(IndependenceTest independenceTest, Graph estimatedCpdag, Graph trueGraph, Double threshold, Double shuffleThreshold, Double lowRecallBound)
Gets the accepted and rejected nodes for all nodes from the Anderson-Darling test and generates the plot data for confusion statistics. Confusion statistics are calculated using Local Graph Precision and Recall (LocalGraphPrecision, LocalGraphRecall).
- Parameters:
independenceTest
- The independence test to be used for calculating p-values.
estimatedCpdag
- The estimated CPDAG.
trueGraph
- The true graph.
threshold
- The threshold value for classifying nodes.
shuffleThreshold
- The threshold value for shuffling the data. The default can be set to 0.5.
lowRecallBound
- The bound value for recording low recall.
- Returns:
- A list containing two lists: the first list contains the accepted nodes and the second list contains the rejected nodes.
-
getPrecisionAndRecallOnMarkovBlanketGraph
public void getPrecisionAndRecallOnMarkovBlanketGraph(Node x, Graph estimatedGraph, Graph trueGraph)
Calculates the precision and recall on the Markov Blanket graph for a given node. Prints the statistics to the console.
- Parameters:
x
- The target node.
estimatedGraph
- The estimated graph.
trueGraph
- The true graph.
-
getPrecisionAndRecallOnMarkovBlanketGraphPlotData
public List<Double> getPrecisionAndRecallOnMarkovBlanketGraphPlotData(Node x, Graph estimatedGraph, Graph trueGraph)
Calculates the precision and recall on the Markov blanket graph and returns them as plot data.
- Parameters:
x
- the target node
estimatedGraph
- the estimated graph
trueGraph
- the true graph
- Returns:
- a list of doubles representing the precision and recall values: [adjacency precision, adjacency recall, arrowhead precision, arrowhead recall]
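For orientation, precision and recall over adjacencies can be computed as set overlaps. The sketch below is not Tetrad's implementation: PrecisionRecallSketch, edge, and precisionRecall are hypothetical names, and adjacencies are encoded as "A-B" strings with the two node names in sorted order. The same computation would apply to arrowheads given a suitable encoding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class PrecisionRecallSketch {
    /** Encodes an undirected adjacency with the node names in sorted order. */
    static String edge(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "-" + b : b + "-" + a;
    }

    /** Returns {precision, recall} of the estimated set measured against the true set. */
    static double[] precisionRecall(Set<String> estimated, Set<String> truth) {
        Set<String> truePositives = new HashSet<>(estimated);
        truePositives.retainAll(truth);
        double tp = truePositives.size();
        // Precision: share of estimated items that are correct.
        double precision = estimated.isEmpty() ? Double.NaN : tp / estimated.size();
        // Recall: share of true items that were recovered.
        double recall = truth.isEmpty() ? Double.NaN : tp / truth.size();
        return new double[]{precision, recall};
    }

    public static void main(String[] args) {
        Set<String> estimated = new HashSet<>(Arrays.asList(edge("B", "A"), edge("B", "C"), edge("C", "D")));
        Set<String> truth = new HashSet<>(Arrays.asList(edge("A", "B"), edge("B", "C"), edge("B", "D"), edge("D", "E")));
        double[] pr = precisionRecall(estimated, truth);
        System.out.printf("precision = %.3f, recall = %.3f%n", pr[0], pr[1]);
    }
}
```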
-
getPrecisionAndRecallOnMarkovBlanketGraph2
public void getPrecisionAndRecallOnMarkovBlanketGraph2(Node x, Graph estimatedGraph, Graph trueGraph)
Calculates the precision and recall using LocalGraphConfusion (which calculates the combination of Adjacency and ArrowHead) on the Markov Blanket graph for a given node. Prints the statistics to the console.
- Parameters:
x
- The target node.
estimatedGraph
- The estimated graph.
trueGraph
- The true graph.
-
getPrecisionAndRecallOnMarkovBlanketGraphPlotData2
public List<Double> getPrecisionAndRecallOnMarkovBlanketGraphPlotData2(Node x, Graph estimatedGraph, Graph trueGraph)
Calculates the precision and recall of a target node's Markov Blanket in the given estimated graph.
- Parameters:
x
- the target node for which the precision and recall are calculated
estimatedGraph
- the estimated graph
trueGraph
- the true graph
- Returns:
- a list of two doubles representing the precision and recall, respectively
-
getVariables
-
clear
public void clear()
Clears the results stored in the `resultsIndep` and `resultsDep` lists.
-
generateResults
public void generateResults(boolean clear)
Generates all results, for both the Markov and dependency checks, for each node in the graph given the parents of that node. These results are stored in the resultsIndep and resultsDep lists. This should be called before any of the result methods. Note that only results for X _||_ Y | Z1,...,Zn are generated, where X and Y are in the independenceNodes list and Z1,...,Zn are in the conditioningNodes list.
- Parameters:
clear
- True if the results should be cleared before generating new results; otherwise, the new results are appended to the existing results.
-
generateResults
public void generateResults(boolean indep, boolean clear)
Generates results based on the specified independence and clearing conditions. The method assesses separation sets and independence facts using the current graph structure and the specified separation set type. Based on the results, it calculates statistical values and updates relevant member variables.
- Parameters:
indep
- A boolean indicating whether to generate results for independence (true) or dependence (false).
clear
- A boolean indicating whether to clear existing results before generating new ones (true) or to retain them (false).
-
getSetType
Returns the type of conditioning sets to use in the Markov check.
- Returns:
- The type of conditioning sets to use in the Markov check.
-
setSetType
Sets the type of conditioning sets to use in the Markov check.
- Parameters:
setType
- The type of conditioning sets to use in the Markov check.
-
setParallelized
public void setParallelized(boolean parallelized)
Sets whether the checks should be parallelized. (Not always a good idea.)
- Parameters:
parallelized
- True if the checks should be parallelized.
-
getResults
After the generateResults method has been called, this method returns the results for the Markov or dependency check, depending on the value of the indep parameter.
- Parameters:
indep
- True for the Markov results, false for the dependency results.
- Returns:
- The results for the Markov or dependency check.
-
getPValues
Returns the list of p-values for the given list of results.
- Parameters:
results
- The results.
- Returns:
- Their p-values.
-
getFractionDependent
public double getFractionDependent(boolean indep)
Returns the fraction of dependent judgments for the given list of results.
- Parameters:
indep
- True for the Markov results, false for the dependency results.
- Returns:
- The fraction of dependent judgments for this condition.
-
getFractionDependent
Calculates the fraction of dependent results.
- Parameters:
results
- the list of IndependenceResult objects
- Returns:
- the fraction of dependent results as a double value
-
getKsPValue
public double getKsPValue(boolean indep)
Returns the Kolmogorov-Smirnov p-value for the given list of results.
- Parameters:
indep
- True for the Markov results, false for the dependency results.
- Returns:
- The Kolmogorov-Smirnov p-value for this condition.
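As a point of reference, a one-sample Kolmogorov-Smirnov test of p-values against Uniform(0, 1) can be sketched in plain Java. This is not the computation Tetrad performs internally, which may rely on a different (for example, exact) method. KsUniformSketch is a hypothetical name, and the p-value uses the asymptotic Kolmogorov series with a commonly used small-sample correction factor, both of which are assumptions of this sketch.

```java
import java.util.Arrays;

final class KsUniformSketch {
    /** One-sample KS statistic D of the sample against the Uniform(0, 1) CDF F(u) = u. */
    static double statistic(double[] pValues) {
        double[] u = pValues.clone();
        Arrays.sort(u);
        int n = u.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double above = (i + 1.0) / n - u[i];  // empirical CDF at u[i] minus F(u[i])
            double below = u[i] - (double) i / n; // F(u[i]) minus empirical CDF just before u[i]
            d = Math.max(d, Math.max(above, below));
        }
        return d;
    }

    /** Approximate p-value from the asymptotic Kolmogorov series (an assumption of this sketch). */
    static double pValue(double[] pValues) {
        int n = pValues.length;
        if (n == 0) return Double.NaN;
        double sqrtN = Math.sqrt(n);
        double lambda = (sqrtN + 0.12 + 0.11 / sqrtN) * statistic(pValues);
        if (lambda < 0.2) return 1.0; // the series is numerically indistinguishable from 1 here
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= 100; k++) {
            sum += sign * Math.exp(-2.0 * k * k * lambda * lambda);
            sign = -sign;
        }
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    public static void main(String[] args) {
        double[] clustered = new double[50];
        Arrays.fill(clustered, 0.01); // strongly non-uniform p-values
        System.out.println("D = " + statistic(clustered) + ", p = " + pValue(clustered));
    }
}
```

A large KS p-value on the Markov list is consistent with Uniform p-values; a very small one indicates that the p-values are not Uniform.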
-
getAndersonDarlingA2
public double getAndersonDarlingA2(boolean indep)
Returns the Anderson-Darling A^2 statistic for the given list of results.
- Parameters:
indep
- True if for implied independencies, false if for implied dependencies.
- Returns:
- The Anderson-Darling A^2 statistic for the given list of results.
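For a fully specified Uniform(0, 1) null, the standard Anderson-Darling statistic on sorted values u_(1) <= ... <= u_(n) is A^2 = -n - (1/n) * sum over i of (2i - 1) * [ln u_(i) + ln(1 - u_(n+1-i))]. The sketch below implements that textbook formula. AndersonDarlingUniformSketch is a hypothetical name, the clamping of values away from 0 and 1 is an assumption added to keep the logarithms finite, and no claim is made that Tetrad computes A^2 or A^2* in exactly this way.

```java
import java.util.Arrays;

final class AndersonDarlingUniformSketch {
    /** Anderson-Darling A^2 of the sample against Uniform(0, 1). */
    static double aSquared(double[] pValues) {
        int n = pValues.length;
        if (n == 0) return Double.NaN;
        double[] u = pValues.clone();
        Arrays.sort(u);
        double eps = 1e-12; // assumption: clamp so that log(0) never occurs
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double lower = clamp(u[i], eps);         // u_(i) with 1-based index i + 1
            double upper = clamp(u[n - 1 - i], eps); // u_(n + 1 - i)
            sum += (2.0 * (i + 1) - 1.0) * (Math.log(lower) + Math.log(1.0 - upper));
        }
        return -n - sum / n;
    }

    private static double clamp(double x, double eps) {
        return Math.min(1.0 - eps, Math.max(eps, x));
    }

    public static void main(String[] args) {
        System.out.println(aSquared(new double[]{0.25, 0.75}));
    }
}
```

Larger values of A^2 indicate a stronger departure from Uniformity, with extra weight on the tails compared with the Kolmogorov-Smirnov statistic.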
-
getAndersonDarlingA2Star
public double getAndersonDarlingA2Star(boolean indep)
Returns the Anderson-Darling A^2* statistic for the given list of results.
- Parameters:
indep
- True if for implied independencies, false if for implied dependencies.
- Returns:
- The Anderson-Darling A^2* statistic for the given list of results.
-
getAndersonDarlingP
public double getAndersonDarlingP(boolean indep)
Returns the Anderson-Darling p-value for the given list of results.
- Parameters:
indep
- True if for implied independencies, false if for implied dependencies.
- Returns:
- The Anderson-Darling p-value for the given list of results.
-
getBinomialPValue
public double getBinomialPValue(boolean indep)
Returns the Binomial p-value for the given list of results.
- Parameters:
indep
- True if for implied independencies, false if for implied dependencies.
- Returns:
- The Binomial p-value for the given list of results.
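One way to read a binomial p-value in this setting: if each of n tests rejects with probability alpha under the null, the number of dependent judgments is Binomial(n, alpha), and the p-value is a tail probability of the observed count. The sketch below computes the upper tail P(K >= k). BinomialTailSketch is a hypothetical name, and the one-sided upper tail is an assumption; the library may use a different tail or a two-sided variant.

```java
final class BinomialTailSketch {
    /** P(K >= k) for K ~ Binomial(n, alpha), with 0 < alpha < 1. Intended for moderate n. */
    static double upperTail(int n, int k, double alpha) {
        if (alpha <= 0.0 || alpha >= 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1)");
        }
        if (k <= 0) return 1.0;
        if (k > n) return 0.0;
        double pmf = Math.pow(1.0 - alpha, n); // P(K = 0)
        double ratio = alpha / (1.0 - alpha);
        double tail = 0.0;
        for (int i = 0; i <= n; i++) {
            if (i >= k) tail += pmf;
            pmf = pmf * (n - i) / (i + 1) * ratio; // P(K = i + 1) derived from P(K = i)
        }
        return Math.min(1.0, tail);
    }

    public static void main(String[] args) {
        // Probability of at least 5 rejections among 20 tests at alpha = 0.05.
        System.out.println(upperTail(20, 5, 0.05));
    }
}
```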
-
getNumTests
public int getNumTests(boolean indep)
Returns the number of tests for the given list of results.
- Parameters:
indep
- True if for implied independencies, false if for implied dependencies.
- Returns:
- The number of tests for the given list of results.
-
getVariable
-
getIndependenceTest
Returns the independence test being used.
- Returns:
- This test.
-
setIndependenceTest
Sets the independence test to be used for determining independence between variables.
- Parameters:
test
- the independence test to be set
- Throws:
IllegalArgumentException
- if the test parameter is null
-
setPercentResample
public void setPercentResample(double percentResample)
Sets the percentage of all samples to use when resampling for each conditional independence test.
- Parameters:
percentResample
- The percentage of all samples to use when resampling for each conditional independence test.
-
getKnowledge
Returns the knowledge object for the Markov checker. This knowledge object should contain the tier knowledge for the Markov checker. The last tier contains the possible X and Y for X _||_ Y | Z1,...,Zn, and the previous tiers contain the possible Z1,...,Zn for X _||_ Y | Z1,...,Zn. Additional forbidden or required edges are ignored.
- Returns:
- The knowledge object.
-
setKnowledge
Sets the knowledge object for the Markov checker. The knowledge object should contain the tier knowledge for the Markov checker. The last tier contains the possible X and Y for X _||_ Y | Z1,...,Zn, and the previous tiers contain the possible Z1,...,Zn for X _||_ Y | Z1,...,Zn. Additional forbidden or required edges are ignored.
- Parameters:
knowledge
- The knowledge object.
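The tier convention described here can be pictured with plain lists: the last tier supplies the candidates for X and Y, and all earlier tiers supply the candidates for Z1,...,Zn. The sketch below does not use Tetrad's Knowledge class; TierPartitionSketch and its methods are hypothetical names, and tiers are represented as lists of variable names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TierPartitionSketch {
    /** Variables in the last tier: the candidates for X and Y. */
    static List<String> independenceNodes(List<List<String>> tiers) {
        List<String> result = new ArrayList<>();
        if (!tiers.isEmpty()) {
            result.addAll(tiers.get(tiers.size() - 1));
        }
        return result;
    }

    /** Variables in all earlier tiers: the candidates for Z1,...,Zn. */
    static List<String> conditioningNodes(List<List<String>> tiers) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tiers.size() - 1; i++) {
            result.addAll(tiers.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> tiers = new ArrayList<>();
        tiers.add(Arrays.asList("A", "B"));
        tiers.add(Arrays.asList("C"));
        tiers.add(Arrays.asList("X", "Y"));
        System.out.println("X, Y candidates: " + independenceNodes(tiers));
        System.out.println("Z candidates: " + conditioningNodes(tiers));
    }
}
```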
-
getMarkovCheckRecord
Generates the results for the given set of independence facts as a single record.
- Returns:
- The Markov check record.
- Throws:
InterruptedException
- if the thread is interrupted
-
getMarkovCheckRecordString
Returns the Markov check record as a string.
- Returns:
- The Markov check record as a string.
- Throws:
InterruptedException
- if the thread is interrupted
-
getIndependenceNodes
-
getConditioningNodes
-
isMpdag
public boolean isMpdag()
Checks whether the given graph is a CPDAG (Completed Partially Directed Acyclic Graph).
- Returns:
- true if the graph is a CPDAG, false otherwise
-
getKsPValue
Calculates the Kolmogorov-Smirnov (KS) p-value for a list of independence test results.
- Parameters:
visiblePairs
- a list of IndependenceResult objects representing the observed values and expected values for a series of tests
- Returns:
- the KS p-value calculated using the list of independence test results
-
getBinomialPValue
Calculates the binomial p-value based on the list of visible pairs.
- Parameters:
visiblePairs
- a list of IndependenceResult representing the visible pairs.
- Returns:
- the binomial p-value.
-
getAndersonDarlingA2
Calculates the Anderson-Darling A^2 value for a list of independence results.
- Parameters:
visiblePairs
- the list of independence results
- Returns:
- the Anderson-Darling A^2 value
-
getAndersonDarlingPValue
Calculates the Anderson-Darling p-value for a given list of independence results.
- Parameters:
visiblePairs
- the list of independence results
- Returns:
- the Anderson-Darling p-value
-
addObserver
Adds a ModelObserver to the list of observers.
- Parameters:
observer
- the ModelObserver to be added
-
removeObserver
Removes the specified observer from the list of observers.
- Parameters:
observer
- the observer to be removed
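The addObserver and removeObserver methods, together with notifyObservers, follow the usual observer pattern. A minimal stand-alone sketch is shown below. ObserverSketch and its nested Observer interface are hypothetical stand-ins for MarkovCheck and ModelObserver, and details such as ignoring duplicate registrations are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class ObserverSketch {
    /** Hypothetical stand-in for ModelObserver: a callback with a single update() method. */
    interface Observer {
        void update();
    }

    private final List<Observer> observers = new ArrayList<>();

    /** Registers an observer; duplicate registrations are ignored (an assumption). */
    void addObserver(Observer observer) {
        if (!observers.contains(observer)) {
            observers.add(observer);
        }
    }

    /** Unregisters an observer; unknown observers are silently ignored. */
    void removeObserver(Observer observer) {
        observers.remove(observer);
    }

    /** Calls update() on a snapshot so observers may unregister themselves during the call. */
    void notifyObservers() {
        for (Observer observer : new ArrayList<>(observers)) {
            observer.update();
        }
    }

    public static void main(String[] args) {
        ObserverSketch model = new ObserverSketch();
        Observer printer = () -> System.out.println("results changed");
        model.addObserver(printer);
        model.notifyObservers();
        model.removeObserver(printer);
        model.notifyObservers(); // nothing is printed after removal
    }
}
```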
-
notifyObservers
public void notifyObservers()
Notifies all registered ModelObservers by invoking their update() method.
-
setFindSmallestSubset
public void setFindSmallestSubset(boolean findSmallestSubset)
Sets the flag indicating whether a smallest subset of each conditioning set yielding independence should be reported.
- Parameters:
findSmallestSubset
- true if a smallest subset of each conditioning set yielding independence should be reported, false otherwise
-
setEffectiveSampleSize
public void setEffectiveSampleSize(int sampleSize)
Description copied from interface: EffectiveSampleSizeSettable
Sets the sample size if the sample size of the data or covariance matrix is not the sample size that should be used by the class.
- Specified by:
setEffectiveSampleSize in interface EffectiveSampleSizeSettable
- Parameters:
sampleSize
- The sample size to use.
-
setMaxLength
public void setMaxLength(int maxLength)
Sets the maximum path length for relevant paths.
- Parameters:
maxLength
- the maximum path length for relevant paths
-