Class ConditionalCorrelationIndependence

java.lang.Object
edu.cmu.tetrad.search.test.ConditionalCorrelationIndependence
All Implemented Interfaces:
RowsSettable

public final class ConditionalCorrelationIndependence extends Object implements RowsSettable
Checks conditional independence of variables in a continuous data set using Daudin's method. See

Ramsey, J. D. (2014). A scalable conditional independence test for nonlinear, non-Gaussian data. arXiv preprint arXiv:1401.5031.

This is corrected using Lemma 2, condition 4 of

Zhang, K., Peters, J., Janzing, D., and Schölkopf, B. (2012). Kernel-based conditional independence test and application in causal discovery. arXiv preprint arXiv:1202.3775.

Both build on the original paper by Daudin:

Daudin, J. J. (1980). Partial association measures and an application to qualitative regression. Biometrika, 67(3), 581-590.

Updated 2024-11-24 josephramsey

Author:
josephramsey
  • Constructor Summary

    Constructors
    Constructor
    Description
    ConditionalCorrelationIndependence(DataSet dataSet, int basisType, double basisScale, int numFunctions)
    Initializes a new instance of the ConditionalCorrelationIndependence class using the provided DataSet.
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    getNumFunctions()
    Retrieves the number of functions used in the ConditionalCorrelationIndependence analysis.
    double
    getPValue(double score)
    Calculates the p-value for a given score using the cumulative distribution function (CDF) of a standard normal distribution.
    List<Integer>
    getRows()
    Retrieves the list of row indices currently set for the analysis.
    double
    getScalingFactor()
    Retrieves the kernel scaling factor.
    double
    isIndependent(Node x, Node y, Set<Node> _z)
    Determines whether two given nodes are independent given a set of conditioning nodes, and calculates a score.
    double
    permutationTest(Node x, Node y, Set<Node> z, int numPermutations)
    Performs a permutation test to empirically determine the distribution of p-values under the null hypothesis.
    void
    setNumFunctions(int numFunctions)
    Sets the number of functions used in the ConditionalCorrelationIndependence analysis.
    void
    setRows(List<Integer> rows)
    Sets the list of row indices.
    void
    setScalingFactor(double scalingFactor)
    Sets the bandwidth adjustment value for the ConditionalCorrelationIndependence analysis.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ConditionalCorrelationIndependence

      public ConditionalCorrelationIndependence(DataSet dataSet, int basisType, double basisScale, int numFunctions)
      Initializes a new instance of the ConditionalCorrelationIndependence class using the provided DataSet.
      Parameters:
      dataSet - The dataset to be used for the analysis. This dataset must not be null and will be standardized.
      basisType - The type of basis function to be used in the analysis. This value must be a positive integer.
      basisScale - The scaling factor used to adjust the bandwidth for the analysis, or 0.0 if the data should be standardized.
      numFunctions - The number of functions to be used in the analysis. This value must be a positive integer.
      Throws:
      NullPointerException - if the provided dataset is null.
  • Method Details

    • isIndependent

      public double isIndependent(Node x, Node y, Set<Node> _z)
      Determines whether two given nodes are independent given a set of conditioning nodes, and calculates a score.
      Parameters:
      x - The first node.
      y - The second node.
      _z - The set of conditioning nodes.
      Returns:
      The score representing the level of independence between nodes x and y given the conditioning set _z. Returns Double.NaN if the score cannot be computed or is not a number.
    • getPValue

      public double getPValue(double score)
      Calculates the p-value for a given score using the cumulative distribution function (CDF) of a standard normal distribution.
      Parameters:
      score - The score for which the p-value needs to be calculated. This score is typically a test statistic resulting from some statistical test.
      Returns:
      The p-value corresponding to the given score, indicating the probability of obtaining a value at least as extreme as the observed score under the null hypothesis.
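
To illustrate the conversion described above, here is a minimal sketch that maps a z-type score to a p-value through the standard normal CDF. It assumes a two-sided test, which this page does not state. It approximates the error function with the Abramowitz and Stegun formula 7.1.26, because the Java standard library has no erf. The class and method names are hypothetical and are not part of Tetrad.

```java
final class NormalPValueSketch {

    // Abramowitz-Stegun 7.1.26 approximation of erf(x); absolute error is about 1.5e-7.
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        // Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5.
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // Standard normal CDF expressed through erf.
    static double normalCdf(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    // ASSUMPTION: two-sided p-value; the documented method may use a different convention.
    static double twoSidedPValue(double score) {
        return 2.0 * (1.0 - normalCdf(Math.abs(score)));
    }

    public static void main(String[] args) {
        System.out.println(twoSidedPValue(1.96)); // close to 0.05
    }
}
```

Under this convention, a score near zero gives a p-value near 1 and large absolute scores give p-values near 0.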
    • getRows

      public List<Integer> getRows()
      Retrieves the list of row indices currently set for the analysis. If no rows are set, returns a list of all row indices.
      Specified by:
      getRows in interface RowsSettable
      Returns:
      A list of row indices.
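
The fallback described above (return every row index when no subset has been set) could be sketched as follows. This is an illustration only, not the class's actual code. The helper name `rowsOrAll` and the `numRows` parameter are hypothetical, and treating `null` as "no rows set" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class RowsDefaultSketch {

    // Returns the explicitly set rows, or the indices 0..numRows-1 when none were set.
    // ASSUMPTION: "no rows set" is represented by null.
    static List<Integer> rowsOrAll(List<Integer> rows, int numRows) {
        if (rows != null) {
            return rows;
        }
        List<Integer> all = new ArrayList<>(numRows);
        for (int i = 0; i < numRows; i++) {
            all.add(i);
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(rowsOrAll(null, 3));
    }
}
```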
    • setRows

      public void setRows(List<Integer> rows)
      Sets the list of row indices.
      Specified by:
      setRows in interface RowsSettable
      Parameters:
      rows - The list of row indices to set.
    • getNumFunctions

      public int getNumFunctions()
      Retrieves the number of functions used in the ConditionalCorrelationIndependence analysis.
      Returns:
      The number of functions used in the analysis.
    • setNumFunctions

      public void setNumFunctions(int numFunctions)
      Sets the number of functions used in the ConditionalCorrelationIndependence analysis.
      Parameters:
      numFunctions - the number of functions to set. This value must be a positive integer.
    • getScalingFactor

      public double getScalingFactor()
      Retrieves the kernel scaling factor.
      Returns:
      The scaling factor used in the analysis.
    • setScalingFactor

      public void setScalingFactor(double scalingFactor)
      Sets the bandwidth adjustment value for the ConditionalCorrelationIndependence analysis.

      Default is 2.

      Parameters:
      scalingFactor - The new bandwidth adjustment factor to be used. This value adjusts the bandwidth calculation for conditional independence tests and impacts the sensitivity of the kernel-based analysis.
    • permutationTest

      public double permutationTest(Node x, Node y, Set<Node> z, int numPermutations)
      Performs a permutation test to empirically determine the distribution of p-values under the null hypothesis.
      Parameters:
      x - The first node.
      y - The second node.
      z - The set of conditioning nodes.
      numPermutations - The number of permutations to perform.
      Returns:
      The mean p-value for the given number of permutations.
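
For readers unfamiliar with permutation testing, the general idea can be sketched with a simplified, unconditional example. This is not the implementation behind `permutationTest`: it ignores the conditioning set, uses the absolute Pearson correlation as a stand-in statistic, and returns the textbook permutation p-value (the share of shuffles at least as extreme as the observed statistic) rather than a mean p-value. All names below are hypothetical.

```java
import java.util.Random;

final class PermutationTestSketch {

    // Pearson correlation of two equal-length arrays.
    static double correlation(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double sab = 0.0, saa = 0.0, sbb = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA, db = b[i] - meanB;
            sab += da * db;
            saa += da * da;
            sbb += db * db;
        }
        return sab / Math.sqrt(saa * sbb);
    }

    // Shuffles y repeatedly to break any x-y dependence, then counts how often the
    // shuffled statistic is at least as large as the observed one.
    static double permutationPValue(double[] x, double[] y, int numPermutations, long seed) {
        double observed = Math.abs(correlation(x, y));
        double[] shuffled = y.clone();
        Random random = new Random(seed);
        int atLeastAsExtreme = 0;
        for (int p = 0; p < numPermutations; p++) {
            // Fisher-Yates shuffle of the working copy.
            for (int i = shuffled.length - 1; i > 0; i--) {
                int j = random.nextInt(i + 1);
                double tmp = shuffled[i];
                shuffled[i] = shuffled[j];
                shuffled[j] = tmp;
            }
            if (Math.abs(correlation(x, shuffled)) >= observed - 1e-12) {
                atLeastAsExtreme++;
            }
        }
        // The +1 terms count the observed arrangement itself, so the result is never zero.
        return (atLeastAsExtreme + 1.0) / (numPermutations + 1.0);
    }

    public static void main(String[] args) {
        double[] x = new double[20];
        double[] y = new double[20];
        for (int i = 0; i < 20; i++) {
            x[i] = i;
            y[i] = i;
        }
        System.out.println(permutationPValue(x, y, 999, 42L));
    }
}
```

With strongly dependent inputs, as in `main`, almost no shuffle matches the observed correlation, so the returned p-value is small. A larger number of permutations gives a finer-grained estimate at a higher computational cost.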