Class RankTests

java.lang.Object
edu.cmu.tetrad.util.RankTests

public class RankTests extends Object
The RankTests class provides a suite of methods and utilities for performing rank estimation and hypothesis testing in Canonical Correlation Analysis (CCA) and Regularized Canonical Correlation Analysis (RCCA). This includes computation of p-values, matrix operations, singular value decomposition, and rank estimation with various methods and regularization approaches.

The class also incorporates caching mechanisms for efficiency and includes mathematical utilities that are foundational to the CCA and RCCA computations.

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final class 
    RankTests.RccaEntry
    Represents an entry in the RCCA (Regularized Canonical Correlation Analysis) data structure.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static double
    RIDGE
    A small constant value added as a ridge term during regularization to improve numerical stability.
  • Method Summary

    Modifier and Type
    Method
    Description
    static int[]
    diff(int[] A, int[] B)
    Computes the difference between two arrays, returning an array of elements that are present in the first array but not in the second.
    static int
    estimateWilksRank(org.ejml.simple.SimpleMatrix Scond, int[] xIdxLocal, int[] yIdxLocal, int n, double alpha)
    Estimates the regularized canonical correlation analysis (rCCA) rank by sequentially testing the rank using Wilks' Lambda statistic.
    static int
    estimateWilksRankConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] VminusC, int[] Z, int n, double alpha)
    Estimates the Wilks rank between the variable sets C and VminusC, conditioned on the variables Z, using the given covariance matrix and parameters.
    static int
    estimateWilksRankFast(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, int n, double alpha)
    Estimates the rank of a matrix using the Wilks test and a Bartlett χ² approximation.
    static RankTests.RccaEntry
    getRccaEntry(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, double regLambda)
    Retrieves or computes an RCCA (Regularized Canonical Correlation Analysis) entry for the given parameters.
    static RankTests.RccaEntry
    getRccaEntryConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] D, int[] Z, double ridge)
    Computes the RCCA entry for (C, D) after partialing out Z: S_|Z = S - S_{.,Z} * inv(S_{Z,Z} + ridge*I) * S_{Z,.}. RCCA is then run on the (C, D) blocks of S_|Z, with the same ridge regularization on R_cc and R_dd that getRccaEntry(...) uses.
    static double
    pValueIndepConditioned(org.ejml.simple.SimpleMatrix S, int[] X, int[] Y, int[] Z, int n)
    Computes the p-value for H0: the canonical rank between X and Y given Z is 0 (that is, X ⟂ Y | Z), using Wilks/Bartlett on partial CCA.
    static double
    rankLeByWilks(org.ejml.simple.SimpleMatrix Scond, int[] xLoc, int[] yLoc, int n, int r)
    Tests the hypothesis that the rank is less than or equal to a specified value r using a Wilks' lambda test, and returns the p-value.
    static int[]
    toArray(List<Integer> Z)
    Converts a List of Integer objects into an array of primitive int values.
    int[]
    union(int[] A, int b)
    Computes the union of the elements from the given array and a single integer value.
    static int[]
    union(int[] A, int[] B)
    Computes the union of two integer arrays and returns the result as an array.
    static int[]
    union(List<Integer> A, int b)
    Computes the union of a list of integers and a single integer.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • RIDGE

      public static double RIDGE
      A small constant value added as a ridge term during regularization to improve numerical stability. This helps prevent issues such as singular matrices or poor conditioning in mathematical computations.
  • Method Details

    • estimateWilksRank

      public static int estimateWilksRank(org.ejml.simple.SimpleMatrix Scond, int[] xIdxLocal, int[] yIdxLocal, int n, double alpha)
      Estimates the regularized canonical correlation analysis (rCCA) rank by sequentially testing the rank using Wilks' Lambda statistic.
      Parameters:
      Scond - A matrix representing the conditioned covariance or correlation structure of the input data.
      xIdxLocal - An array of indices corresponding to the local x-variables involved in the calculation.
      yIdxLocal - An array of indices corresponding to the local y-variables involved in the calculation.
      n - The total number of observations in the dataset.
      alpha - The significance level for the rank testing, a value strictly between 0 and 1 (e.g., 0.05).
      Returns:
      The estimated rank for the rCCA, which is the number of canonical correlations deemed statistically significant, constrained by the dimensions of the input data.
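The sequential testing idea can be sketched as below. This is a minimal illustration, not the Tetrad implementation: it assumes the procedure tests H0: rank ≤ r for r = 0, 1, ... and stops at the first r that is not rejected at level alpha. The class SequentialRankSketch and the pValueForRankLe callback are hypothetical names introduced only for this example.

```java
import java.util.function.IntToDoubleFunction;

final class SequentialRankSketch {
    /**
     * Returns the smallest r in [0, maxRank) whose null hypothesis "rank <= r"
     * is not rejected at level alpha, or maxRank if every test rejects.
     * pValueForRankLe is a hypothetical stand-in for a Wilks-type test.
     */
    static int estimateRank(IntToDoubleFunction pValueForRankLe, int maxRank, double alpha) {
        for (int r = 0; r < maxRank; r++) {
            double p = pValueForRankLe.applyAsDouble(r);
            if (p > alpha) {
                return r; // first hypothesis that cannot be rejected
            }
        }
        return maxRank; // bounded by the block dimensions, e.g. min(p, q)
    }

    public static void main(String[] args) {
        double[] pValues = {0.001, 0.02, 0.40}; // made-up p-values for r = 0, 1, 2
        System.out.println(estimateRank(r -> pValues[r], 3, 0.05)); // prints 2
    }
}
```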
    • estimateWilksRankFast

      public static int estimateWilksRankFast(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, int n, double alpha)
      Estimates the rank of a matrix using the Wilks test and a Bartlett χ² approximation. This method employs an optimization for fast computation.
      Parameters:
      S - Covariance or scatter matrix (SimpleMatrix) of size (p + q) x (p + q).
      xIdx - Indices for the x variables, representing the first group of variables.
      yIdx - Indices for the y variables, representing the second group of variables.
      n - Sample size used for the computation and statistical testing.
      alpha - Significance level for hypothesis testing (e.g., 0.05 for 5%).
      Returns:
      Estimated rank of the matrix, computed based on the Wilks test criteria.
    • rankLeByWilks

      public static double rankLeByWilks(org.ejml.simple.SimpleMatrix Scond, int[] xLoc, int[] yLoc, int n, int r)
      Tests the hypothesis that the rank is less than or equal to a specified value r using a Wilks' lambda test, and returns the p-value. This method performs hypothesis testing on the rank condition of a block matrix.
      Parameters:
      Scond - The conditioned covariance matrix or a similar input matrix.
      xLoc - An array of integers representing the indices of the x-block variables.
      yLoc - An array of integers representing the indices of the y-block variables.
      n - The number of observations or sample size.
      r - The rank condition to test (non-negative integer).
      Returns:
      the p-value for the null hypothesis that the rank is less than or equal to r
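For orientation, the textbook Bartlett χ² approximation to Wilks' lambda for H0: rank ≤ r can be sketched as follows. The scale factor n - 1 - (p + q + 1)/2 and the degrees of freedom (p - r)(q - r) are the standard textbook form; whether RankTests uses exactly this scaling is an assumption. BartlettWilksSketch is a hypothetical class, and the final χ² tail probability is left out because the Java standard library has no χ² distribution.

```java
final class BartlettWilksSketch {
    /**
     * Textbook Bartlett statistic for H0: rank <= r.
     * rho holds canonical correlations in descending order; p and q are the
     * block sizes and n is the sample size. Only rho[r], rho[r+1], ... contribute.
     */
    static double statistic(double[] rho, int p, int q, int n, int r) {
        double logLambda = 0.0;
        for (int i = r; i < rho.length; i++) {
            logLambda += Math.log(1.0 - rho[i] * rho[i]);
        }
        double scale = n - 1 - (p + q + 1) / 2.0;
        return -scale * logLambda;
    }

    /** Degrees of freedom of the chi-square reference distribution. */
    static int degreesOfFreedom(int p, int q, int r) {
        return (p - r) * (q - r);
    }
}
```

The p-value would then be the upper tail of a χ² distribution with degreesOfFreedom(p, q, r) degrees of freedom, evaluated at the statistic.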
    • estimateWilksRankConditioned

      public static int estimateWilksRankConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] VminusC, int[] Z, int n, double alpha)
      Estimates the Wilks rank between the variable sets C and VminusC, conditioned on the variables Z, using the given covariance matrix and parameters.
      Parameters:
      S - the covariance matrix representing the relationships between all variables
      C - an array of indices representing the variables in set C
      VminusC - an array of indices representing the variables outside of set C
      Z - an array of indices representing the variables in set Z on which to condition
      n - the sample size used to calculate the covariance matrix S
      alpha - the significance level for testing
      Returns:
      the estimated Wilks rank between the variables in C and VminusC, conditioned on Z
    • diff

      public static int[] diff(int[] A, int[] B)
      Computes the difference between two arrays, returning an array of elements that are present in the first array but not in the second.
      Parameters:
      A - the first array of integers
      B - the second array of integers
      Returns:
      an array of integers containing elements from the first array that are not present in the second array
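A minimal sketch of such a difference, written independently of Tetrad. It assumes the order of A is preserved and that repeated values in A are kept; the documentation above does not say how the real method handles either case. DiffSketch is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class DiffSketch {
    /** Returns the elements of a that do not occur in b, in the order they appear in a. */
    static int[] diff(int[] a, int[] b) {
        Set<Integer> exclude = new HashSet<>();
        for (int v : b) {
            exclude.add(v);
        }
        return Arrays.stream(a).filter(v -> !exclude.contains(v)).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(diff(new int[]{1, 2, 3, 4}, new int[]{2, 4}))); // prints [1, 3]
    }
}
```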
    • union

      public static int[] union(int[] A, int[] B)
      Computes the union of two integer arrays and returns the result as an array.
      Parameters:
      A - the first array of integers
      B - the second array of integers
      Returns:
      an array containing the union of the elements from both input arrays
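A union of two index arrays might look like the sketch below. It assumes duplicates are removed and that elements come out in first-seen order; the ordering of the real method's result is not documented here. UnionSketch is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class UnionSketch {
    /** Returns the distinct elements of a followed by those of b, in first-seen order. */
    static int[] union(int[] a, int[] b) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int v : a) {
            seen.add(v);
        }
        for (int v : b) {
            seen.add(v);
        }
        return seen.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(union(new int[]{1, 2}, new int[]{2, 3}))); // prints [1, 2, 3]
    }
}
```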
    • union

      public static int[] union(List<Integer> A, int b)
      Computes the union of a list of integers and a single integer. The union operation adds the integer to the set of elements in the list, ensuring no duplicates.
      Parameters:
      A - the list of integers to be included in the union
      b - the integer to be added to the union
      Returns:
      an array representing the union of the input list and the single integer
    • toArray

      public static int[] toArray(List<Integer> Z)
      Converts a List of Integer objects into an array of primitive int values.
      Parameters:
      Z - the List of Integer objects to be converted into an int array
      Returns:
      an array of int containing the values from the input List in the same order
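The conversion amounts to unboxing each element in list order. A minimal stand-alone sketch (ToArraySketch is a hypothetical class, not part of Tetrad):

```java
import java.util.Arrays;
import java.util.List;

final class ToArraySketch {
    /** Copies the list into a primitive int array, preserving order. */
    static int[] toArray(List<Integer> z) {
        int[] out = new int[z.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = z.get(i); // auto-unboxing; a null element would throw NullPointerException
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toArray(List.of(3, 1, 2)))); // prints [3, 1, 2]
    }
}
```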
    • pValueIndepConditioned

      public static double pValueIndepConditioned(org.ejml.simple.SimpleMatrix S, int[] X, int[] Y, int[] Z, int n)
      Computes the p-value for H0: the canonical rank between X and Y given Z is 0 (that is, X ⟂ Y | Z), using Wilks/Bartlett on partial CCA.
      Parameters:
      S - The covariance matrix of all variables.
      X - An array of indices representing the first subset of variables.
      Y - An array of indices representing the second subset of variables.
      Z - An array of indices representing the conditioning set of variables.
      n - The number of samples used in calculating the covariance matrix.
      Returns:
      The p-value representing the probability of observing the computed test statistic under the null hypothesis of conditional independence. Returns 1.0 if the size of X or Y is zero after exclusion of Z, or if degrees of freedom (df) are less than or equal to zero.
    • getRccaEntry

      public static RankTests.RccaEntry getRccaEntry(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, double regLambda)
      Retrieves or computes an RCCA (Regularized Canonical Correlation Analysis) entry for the given parameters. If the entry is cached, it retrieves the result from the cache. Otherwise, it computes the result based on the provided inputs.
      Parameters:
      S - a SimpleMatrix representing the data matrix
      xIdx - an array of indices corresponding to the X variables
      yIdx - an array of indices corresponding to the Y variables
      regLambda - a regularization parameter value
      Returns:
      an RccaEntry containing canonical correlation results including singular values and suffix logs for the given inputs, or null if the computation fails
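The retrieve-or-compute behaviour can be illustrated with a small memoization sketch. Everything here is hypothetical: the key format, the RccaCacheSketch class, and the use of a double[] in place of RccaEntry are assumptions made for illustration, and a real cache would also need to distinguish between different matrices S.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class RccaCacheSketch {
    private final Map<String, double[]> cache = new ConcurrentHashMap<>();
    int computations = 0; // counts cache misses, for demonstration only

    /** Builds a cache key from the index sets and the regularization value (assumed scheme). */
    static String key(int[] xIdx, int[] yIdx, double regLambda) {
        return Arrays.toString(xIdx) + "|" + Arrays.toString(yIdx) + "|" + regLambda;
    }

    /** Returns the cached result for the key if present; otherwise computes and stores it. */
    double[] getOrCompute(int[] xIdx, int[] yIdx, double regLambda, Supplier<double[]> compute) {
        return cache.computeIfAbsent(key(xIdx, yIdx, regLambda), k -> {
            computations++;
            return compute.get();
        });
    }
}
```

Calling getOrCompute twice with equal index arrays and the same regLambda runs the supplier only once; the second call returns the stored array.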
    • getRccaEntryConditioned

      public static RankTests.RccaEntry getRccaEntryConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] D, int[] Z, double ridge)
      Computes the RCCA entry for (C, D) after partialing out Z: S_|Z = S - S_{.,Z} * inv(S_{Z,Z} + ridge*I) * S_{Z,.}. RCCA is then run on the (C, D) blocks of S_|Z, with the same ridge regularization on R_cc and R_dd that getRccaEntry(...) uses.
      Parameters:
      S - correlation/covariance over observed variables
      C - left index set
      D - right index set
      Z - conditioning index set
      ridge - small diagonal added to R_cc and R_dd (and to S_ZZ before inverting)
      Returns:
      RccaEntry whose suffixLogs has suf[0] == 0 and suf[r] = sum_{i=1..r} log(1 - rho_i^2) in the order of descending canonical correlations
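The two formulas above can be made concrete with a small sketch using plain arrays. It is not the Tetrad code: partialOut handles only the special case of a single conditioning variable z, where inv(S_{Z,Z} + ridge*I) reduces to a scalar division, and suffixLogs builds the cumulative sums exactly as described in the Returns text. PartialRccaSketch and both method names are hypothetical.

```java
final class PartialRccaSketch {
    /**
     * Single-variable case of S_|Z = S - S_{.,Z} * inv(S_{Z,Z} + ridge*I) * S_{Z,.}:
     * with Z = {z}, the inverse is just 1 / (S[z][z] + ridge).
     */
    static double[][] partialOut(double[][] s, int z, double ridge) {
        int d = s.length;
        double pivot = s[z][z] + ridge;
        double[][] out = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                out[i][j] = s[i][j] - s[i][z] * s[z][j] / pivot;
            }
        }
        return out;
    }

    /**
     * suf[0] = 0 and suf[r] = sum_{i=1..r} log(1 - rho_i^2),
     * with rho given in descending order.
     */
    static double[] suffixLogs(double[] rho) {
        double[] suf = new double[rho.length + 1];
        for (int r = 1; r <= rho.length; r++) {
            suf[r] = suf[r - 1] + Math.log(1.0 - rho[r - 1] * rho[r - 1]);
        }
        return suf;
    }
}
```

With these cumulative sums, the log of a product of (1 - rho_i^2) terms over any index range is a difference of two entries, which is presumably why the entry stores them.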
    • union

      public int[] union(int[] A, int b)
      Computes the union of the elements from the given array and a single integer value. The union is returned as an array of unique integers.
      Parameters:
      A - an array of integers whose elements will contribute to the union set
      b - a single integer that will also be included in the union set
      Returns:
      an array of integers containing the union of the input array and the single integer, with all duplicate elements removed