Class Kci

java.lang.Object
edu.cmu.tetrad.search.test.Kci
All Implemented Interfaces:
RawMarginalIndependenceTest, IndependenceTest

public class Kci extends Object implements IndependenceTest, RawMarginalIndependenceTest
The Kci class implements the Kernel-based Conditional Independence (KCI) test for statistical independence between variables. It supports several kernel types (e.g., Gaussian, Polynomial, Linear) and computes p-values with either a Gamma approximation or a permutation test. The class relies on kernel matrices and bandwidth selection heuristics to keep the test computation efficient.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static enum
    Kci.KernelType
    Enum representing the type of kernel function used in kernel-based computations.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    Random
    rng
    RNG for permutations; can be null (seeded later).
  • Constructor Summary

    Constructors
    Constructor
    Description
    Kci(DataSet dataSet)
    Constructs a Kci instance with the given DataSet.
    Kci(org.ejml.simple.SimpleMatrix dataVxN, Map<Node,Integer> varToRow, org.ejml.simple.SimpleMatrix hHint, List<Integer> rows)
    Constructs a Kci instance using specified data, variable-to-row mapping, an optional hint matrix, and a list of row indices.
  • Method Summary

    Modifier and Type
    Method
    Description
    IndependenceResult
    checkIndependence(Node x, Node y, Set<Node> z)
    Tests the conditional independence of two given variables (x and y) with respect to a set of conditioning variables (z) using the KCI (Kernel-based Conditional Independence) method.
    double
    computePValue(double[] x, double[] y)
    Computes the p-value for testing the independence of two variables represented by the input arrays.
    double
    computePValueFromCenteredKernels(org.ejml.simple.SimpleMatrix centeredKx, org.ejml.simple.SimpleMatrix centeredKy)
    Computes the p-value from two centered kernel matrices using statistical methods.
    double
    getAlpha()
    Retrieves the alpha threshold, the significance level against which p-values are compared.
    DataModel
    getData()
    Retrieves the data model associated with the current instance.
    double
    getEpsilon()
    Retrieves the epsilon value.
    Kci.KernelType
    getKernelType()
    Retrieves the kernel type.
    int
    getNumPermutations()
    Retrieves the number of permutations to be used in permutation tests.
    double
    getPolyCoef0()
    Retrieves the constant offset term (coef0) of the polynomial kernel.
    int
    getPolyDegree()
    Retrieves the degree of the polynomial.
    double
    getPolyGamma()
    Retrieves the gamma parameter for the polynomial kernel.
    double
    getScalingFactor()
    Retrieves the scaling factor for the Gaussian bandwidth heuristic.
    List<Node>
    getVariables()
    Retrieves the list of variables associated with the current instance.
    boolean
    isApproximate()
    Returns whether p-values are computed with the Gamma approximation or with a permutation test.
    double
    isIndependenceConditional(Node x, Node y, List<Node> z, double alpha)
    Tests for conditional independence between two variables given a set of conditioning variables.
    boolean
    isVerbose()
    Indicates whether verbose mode is enabled.
    void
    setAlpha(double alpha)
    Sets the alpha threshold, the significance level against which p-values are compared.
    void
    setApproximate(boolean approximate)
    Sets whether p-values are computed with the Gamma approximation or with a permutation test.
    void
    setEpsilon(double epsilon)
    Sets the epsilon value.
    void
    setKernelType(Kci.KernelType kernelType)
    Sets the kernel type.
    void
    setNumPermutations(int numPermutations)
    Sets the number of permutations to be used in permutation tests.
    void
    setPolyCoef0(double polyCoef0)
    Sets the constant offset term (coef0) of the polynomial kernel.
    void
    setPolyDegree(int polyDegree)
    Sets the degree of the polynomial.
    void
    setPolyGamma(double polyGamma)
    Sets the gamma parameter for the polynomial kernel.
    void
    setScalingFactor(double scalingFactor)
    Sets the scaling factor for the Gaussian bandwidth heuristic.
    void
    setVerbose(boolean verbose)
    Sets the verbose mode for the current instance.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface edu.cmu.tetrad.search.RawMarginalIndependenceTest

    computePValue
  • Field Details

    • rng

      public Random rng
      RNG for permutations; can be null (seeded later).
  • Constructor Details

    • Kci

      public Kci(DataSet dataSet)
      Constructs a Kci instance with the given DataSet.
      Parameters:
      dataSet - the dataset containing the data to be analyzed. It is used to initialize the data matrix, variable list, and other attributes.
    • Kci

      public Kci(org.ejml.simple.SimpleMatrix dataVxN, Map<Node,Integer> varToRow, org.ejml.simple.SimpleMatrix hHint, List<Integer> rows)
      Constructs a Kci instance using specified data, variable-to-row mapping, an optional hint matrix, and a list of row indices. This constructor initializes the internal fields required for kernel-based independence testing.
      Parameters:
      dataVxN - a SimpleMatrix representing the data matrix where rows correspond to variables and columns correspond to observations.
      varToRow - a map from Node instances to integer indices, specifying the row mapping for variables.
      hHint - a SimpleMatrix used as a hint for the kernel computation, often representing precomputed or auxiliary data; can be null if not applicable.
      rows - a list of integers representing the indices of rows to be used in the computation.
  • Method Details

    • getAlpha

      public double getAlpha()
      Retrieves the alpha threshold, the significance level against which p-values are compared.
      Specified by:
      getAlpha in interface IndependenceTest
      Returns:
      the value of alpha as a double.
    • setAlpha

      public void setAlpha(double alpha)
      Sets the alpha threshold, the significance level against which p-values are compared.
      Specified by:
      setAlpha in interface IndependenceTest
      Parameters:
      alpha - the value of alpha to set, represented as a double.
    • checkIndependence

      public IndependenceResult checkIndependence(Node x, Node y, Set<Node> z) throws InterruptedException
      Tests the conditional independence of two given variables (x and y) with respect to a set of conditioning variables (z) using the KCI (Kernel-based Conditional Independence) method. This method evaluates whether x and y are independent given z by calculating a p-value and comparing it against the alpha threshold.
      Specified by:
      checkIndependence in interface IndependenceTest
      Parameters:
      x - the first variable to be tested for independence, represented as a Node.
      y - the second variable to be tested for independence, represented as a Node.
      z - the set of conditioning variables, represented as a Set of Node objects.
      Returns:
      an IndependenceResult object containing the results of the independence test, including the independence fact, the p-value, and additional statistical details.
      Throws:
      InterruptedException - if the thread executing the method is interrupted during execution.
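
      The comparison against alpha described above can be sketched in plain Java. This is a minimal illustration, assuming the conventional rule that independence is retained when the p-value exceeds alpha; the class and method names are hypothetical and are not part of Kci.

```java
// Hypothetical helper, not part of Tetrad: illustrates the p-value vs. alpha decision.
final class AlphaDecisionSketch {

    /**
     * Returns true when independence is NOT rejected at level alpha.
     * Assumption: the rule is "p-value > alpha means judged independent".
     */
    static boolean judgedIndependent(double pValue, double alpha) {
        if (Double.isNaN(pValue) || pValue < 0.0 || pValue > 1.0) {
            throw new IllegalArgumentException("pValue must lie in [0, 1]");
        }
        if (!(alpha > 0.0 && alpha < 1.0)) {
            throw new IllegalArgumentException("alpha must lie in (0, 1)");
        }
        return pValue > alpha;
    }
}
```

      With alpha = 0.05, a p-value of 0.20 would leave the independence hypothesis standing, while a p-value of 0.01 would reject it.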
    • getVariables

      public List<Node> getVariables()
      Retrieves the list of variables associated with the current instance. This method returns a new list containing the variables, ensuring that modifications to the returned list do not affect the original list.
      Specified by:
      getVariables in interface IndependenceTest
      Returns:
      a List of Node objects representing the variables.
    • getData

      public DataModel getData()
      Retrieves the data model associated with the current instance.
      Specified by:
      getData in interface IndependenceTest
      Returns:
      the DataModel object representing the dataset being analyzed.
    • isVerbose

      public boolean isVerbose()
      Indicates whether verbose mode is enabled.
      Specified by:
      isVerbose in interface IndependenceTest
      Returns:
      true if verbose mode is enabled, false otherwise
    • setVerbose

      public void setVerbose(boolean verbose)
      Sets the verbose mode for the current instance.
      Specified by:
      setVerbose in interface IndependenceTest
      Parameters:
      verbose - true to enable verbose mode, false to disable it.
    • isIndependenceConditional

      public double isIndependenceConditional(Node x, Node y, List<Node> z, double alpha)
      Tests for conditional independence between two variables given a set of conditioning variables. This method computes a test statistic and its corresponding p-value using either an approximate method or a permutation-based method depending on the configuration.
      Parameters:
      x - The first variable to test for independence.
      y - The second variable to test for independence.
      z - The list of conditioning variables.
      alpha - The significance level used for the independence test.
      Returns:
      The p-value of the conditional independence test. A p-value below alpha is evidence against the conditional independence of x and y given z.
      Throws:
      NullPointerException - If x or y is null.
    • computePValue

      public double computePValue(double[] x, double[] y)
      Computes the p-value for testing the independence of two variables represented by the input arrays. The method utilizes a kernel-based conditional independence test (KCI) provided by BFIT.
      Specified by:
      computePValue in interface RawMarginalIndependenceTest
      Parameters:
      x - the first array of observed values representing one variable. It must not be null and should contain at least three elements.
      y - the second array of observed values representing another variable. It must not be null, should contain at least three elements, and have the same length as the first array.
      Returns:
      the computed p-value as a double. A result closer to 0 suggests stronger evidence against the null hypothesis of independence, while a value close to 1 supports independence. If the input arrays are invalid or if an error occurs, the method returns 1.0.
    • computePValueFromCenteredKernels

      public double computePValueFromCenteredKernels(org.ejml.simple.SimpleMatrix centeredKx, org.ejml.simple.SimpleMatrix centeredKy)
      Computes the p-value from two centered kernel matrices. Depending on whether the approximate setting is enabled, it calculates the p-value using a Gamma approximation or a permutation test.
      Parameters:
      centeredKx - A centered kernel matrix (n x n) representing one dataset.
      centeredKy - A centered kernel matrix (n x n) representing another dataset.
      Returns:
      The computed p-value for the null hypothesis that the two datasets are independent.
      Throws:
      IllegalArgumentException - If the provided matrices are not square and of the same dimensions (n x n).
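
      For readers unfamiliar with centered kernel matrices, the sketch below shows one common way to center a kernel matrix and to form a trace statistic from two centered matrices. It uses plain double[][] arrays instead of SimpleMatrix so that it runs on its own; the class name, method names, and the choice of trace(Kx Ky) as the statistic are assumptions for illustration and may differ from what Kci computes internally.

```java
// Hypothetical helper, not part of Tetrad: centering and a trace statistic on plain arrays.
final class CenteredKernelSketch {

    /** Double-centers a square kernel matrix: Kc = H K H, with H = I - (1/n) * ones. */
    static double[][] center(double[][] k) {
        int n = requireSquare(k);
        double[] rowMean = new double[n];
        double[] colMean = new double[n];
        double grandMean = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowMean[i] += k[i][j] / n;
                colMean[j] += k[i][j] / n;
                grandMean += k[i][j] / ((double) n * n);
            }
        }
        double[][] centered = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                centered[i][j] = k[i][j] - rowMean[i] - colMean[j] + grandMean;
            }
        }
        return centered;
    }

    /** Computes trace(A * B) without forming the product; rejects mismatched shapes. */
    static double traceOfProduct(double[][] a, double[][] b) {
        int n = requireSquare(a);
        if (requireSquare(b) != n) {
            throw new IllegalArgumentException("Matrices must have the same dimensions");
        }
        double trace = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                trace += a[i][j] * b[j][i];
            }
        }
        return trace;
    }

    /** Returns n if the matrix is a non-empty n x n array, otherwise throws. */
    private static int requireSquare(double[][] m) {
        int n = m.length;
        if (n == 0) {
            throw new IllegalArgumentException("Matrix must not be empty");
        }
        for (double[] row : m) {
            if (row.length != n) {
                throw new IllegalArgumentException("Matrix must be square");
            }
        }
        return n;
    }
}
```

      After centering, every row and column sums to zero, which is a quick sanity check before passing matrices to a method that expects centered input.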
    • getPolyDegree

      public int getPolyDegree()
      Retrieves the degree of the polynomial.
      Returns:
      the degree of the polynomial as an integer
    • setPolyDegree

      public void setPolyDegree(int polyDegree)
      Sets the degree of the polynomial.
      Parameters:
      polyDegree - the degree of the polynomial to be set
    • getPolyCoef0

      public double getPolyCoef0()
      Retrieves the constant offset term (coef0) of the polynomial kernel.
      Returns:
      the constant offset term of the polynomial kernel
    • setPolyCoef0

      public void setPolyCoef0(double polyCoef0)
      Sets the constant offset term (coef0) of the polynomial kernel.
      Parameters:
      polyCoef0 - the constant offset term to set
    • getPolyGamma

      public double getPolyGamma()
      Retrieves the gamma parameter for the polynomial kernel.
      Returns:
      the gamma parameter for the polynomial kernel
    • setPolyGamma

      public void setPolyGamma(double polyGamma)
      Sets the gamma parameter for the polynomial kernel.
      Parameters:
      polyGamma - the gamma parameter to set
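
      The three polynomial parameters (polyDegree, polyCoef0, polyGamma) are easiest to read together. The sketch below assumes the conventional polynomial kernel form k(a, b) = (gamma * a * b + coef0)^degree for scalar inputs; the page does not state the exact formula Kci uses, so treat the class and this parameterization as hypothetical.

```java
// Hypothetical helper, not part of Tetrad: the conventional polynomial kernel on scalars.
final class PolynomialKernelSketch {

    /** Evaluates (gamma * a * b + coef0)^degree. */
    static double polynomialKernel(double a, double b, double gamma, double coef0, int degree) {
        if (degree < 1) {
            throw new IllegalArgumentException("degree must be at least 1");
        }
        return Math.pow(gamma * a * b + coef0, degree);
    }
}
```

      For example, with gamma = 1, coef0 = 1, and degree = 2, the inputs a = 2 and b = 3 give (6 + 1)^2 = 49. Setting degree = 1 and coef0 = 0 reduces the expression to a scaled linear kernel.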
    • getKernelType

      public Kci.KernelType getKernelType()
      Retrieves the kernel type.
      Returns:
      the kernel type
    • setKernelType

      public void setKernelType(Kci.KernelType kernelType)
      Sets the kernel type.
      Parameters:
      kernelType - the kernel type to set
    • getEpsilon

      public double getEpsilon()
      Retrieves the epsilon value.
      Returns:
      the epsilon value
    • setEpsilon

      public void setEpsilon(double epsilon)
      Sets the epsilon value.
      Parameters:
      epsilon - the epsilon value to set
    • getScalingFactor

      public double getScalingFactor()
      Retrieves the scaling factor for the Gaussian bandwidth heuristic.
      Returns:
      the scaling factor
    • setScalingFactor

      public void setScalingFactor(double scalingFactor)
      Sets the scaling factor for the Gaussian bandwidth heuristic. The scaling factor is used to modify the bandwidth by scaling it multiplicatively (sigma *= scalingFactor).
      Parameters:
      scalingFactor - the scaling factor to set; a multiplier for the Gaussian bandwidth heuristic.
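
      The multiplicative scaling (sigma *= scalingFactor) can be illustrated with a small standalone sketch. It assumes the median pairwise distance as the base bandwidth heuristic, which is a common choice for Gaussian kernels; the page does not say which heuristic Kci uses, and the class and method names here are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Tetrad: median-distance bandwidth with a scaling factor.
final class BandwidthSketch {

    /** Returns median(|x_i - x_j|, i < j) multiplied by scalingFactor. */
    static double scaledMedianBandwidth(double[] x, double scalingFactor) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("Need at least two observations");
        }
        double[] distances = new double[n * (n - 1) / 2];
        int idx = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                distances[idx++] = Math.abs(x[i] - x[j]);
            }
        }
        Arrays.sort(distances);
        int m = distances.length;
        double median = (m % 2 == 1)
                ? distances[m / 2]
                : 0.5 * (distances[m / 2 - 1] + distances[m / 2]);
        double sigma = median;
        sigma *= scalingFactor; // the multiplicative adjustment described above
        return sigma;
    }

    /** Gaussian kernel exp(-(a - b)^2 / (2 * sigma^2)) for scalar inputs. */
    static double gaussianKernel(double a, double b, double sigma) {
        if (!(sigma > 0.0)) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        double diff = a - b;
        return Math.exp(-(diff * diff) / (2.0 * sigma * sigma));
    }
}
```

      A scaling factor below 1 narrows the kernel and makes the test more sensitive to local structure; a factor above 1 widens it and smooths over fine detail.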
    • isApproximate

      public boolean isApproximate()
      Returns whether p-values are computed with the Gamma approximation or with a permutation test.
      Returns:
      true if the Gamma approximation is used, false if the permutation test is used
    • setApproximate

      public void setApproximate(boolean approximate)
      Sets whether p-values are computed with the Gamma approximation or with a permutation test.
      Parameters:
      approximate - true to use the Gamma approximation, false to use the permutation test
    • getNumPermutations

      public int getNumPermutations()
      Retrieves the number of permutations to be used in permutation tests.
      Returns:
      the number of permutations to be used in permutation tests
    • setNumPermutations

      public void setNumPermutations(int numPermutations)
      Sets the number of permutations to be used in permutation tests.
      Parameters:
      numPermutations - the number of random permutations used to approximate the null distribution of the test statistic.
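
      To show how the number of permutations affects the result, the sketch below computes a permutation p-value from two centered kernel matrices held in plain arrays. It assumes the statistic is the sum of elementwise products of the two matrices and uses the common (1 + count) / (1 + B) estimator; both are illustrative choices, and the class is hypothetical rather than a copy of the Kci internals.

```java
import java.util.Random;

// Hypothetical helper, not part of Tetrad: a permutation p-value on plain arrays.
final class PermutationSketch {

    /** Sum over i, j of kx[i][j] * ky[perm[i]][perm[j]]. */
    static double statistic(double[][] kx, double[][] ky, int[] perm) {
        double sum = 0.0;
        for (int i = 0; i < kx.length; i++) {
            for (int j = 0; j < kx.length; j++) {
                sum += kx[i][j] * ky[perm[i]][perm[j]];
            }
        }
        return sum;
    }

    /** Permutation p-value: (1 + #{permuted >= observed}) / (1 + numPermutations). */
    static double pValue(double[][] kx, double[][] ky, int numPermutations, Random rng) {
        int n = kx.length;
        if (n == 0 || ky.length != n) {
            throw new IllegalArgumentException("Matrices must be non-empty and the same size");
        }
        for (int i = 0; i < n; i++) {
            if (kx[i].length != n || ky[i].length != n) {
                throw new IllegalArgumentException("Matrices must be square");
            }
        }
        if (numPermutations < 0) {
            throw new IllegalArgumentException("numPermutations must be non-negative");
        }
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = i;
        }
        double observed = statistic(kx, ky, perm);
        int atLeastAsLarge = 0;
        for (int b = 0; b < numPermutations; b++) {
            // Fisher-Yates shuffle of the sample indices applied to ky.
            for (int i = n - 1; i > 0; i--) {
                int j = rng.nextInt(i + 1);
                int tmp = perm[i];
                perm[i] = perm[j];
                perm[j] = tmp;
            }
            if (statistic(kx, ky, perm) >= observed) {
                atLeastAsLarge++;
            }
        }
        return (1.0 + atLeastAsLarge) / (1.0 + numPermutations);
    }
}
```

      Under this estimator the smallest attainable p-value is 1 / (numPermutations + 1), so the permutation count needs to be large enough for the chosen alpha to be reachable. Passing a seeded Random makes the result reproducible.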