Package edu.cmu.tetrad.util
Class RankTests
java.lang.Object
edu.cmu.tetrad.util.RankTests
The RankTests class provides a suite of methods and utilities for performing rank estimation and hypothesis testing
in Canonical Correlation Analysis (CCA) and Regularized Canonical Correlation Analysis (RCCA). This includes
computation of p-values, matrix operations, singular value decomposition, and rank estimation with various methods
and regularization approaches.
The class also incorporates caching mechanisms for efficiency and includes mathematical utilities that are foundational to the CCA and RCCA computations.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final class | RankTests.RccaEntry | Represents an entry in the RCCA (Regularized Canonical Correlation Analysis) data structure.
Field Summary
Fields
Modifier and Type | Field | Description
static double | RIDGE | A small constant value added as a ridge term during regularization to improve numerical stability.
Method Summary
Modifier and Type | Method | Description
static int[] | diff(int[] A, int[] B) | Computes the difference between two arrays, returning the elements that are present in the first array but not in the second.
static int | estimateWilksRank(org.ejml.simple.SimpleMatrix Scond, int[] xIdxLocal, int[] yIdxLocal, int n, double alpha) | Estimates the regularized canonical correlation analysis (RCCA) rank by testing candidate ranks sequentially with Wilks' lambda statistic.
static int | estimateWilksRankConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] VminusC, int[] Z, int n, double alpha) | Estimates the Wilks rank between the variable sets C and VminusC, conditioned on the variables in Z, using the given covariance matrix.
static int | estimateWilksRankFast(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, int n, double alpha) | Estimates the rank using Wilks' test with Bartlett's χ² approximation.
static RankTests.RccaEntry | getRccaEntry(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, double regLambda) | Retrieves or computes an RCCA (Regularized Canonical Correlation Analysis) entry for the given parameters.
static RankTests.RccaEntry | getRccaEntryConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] D, int[] Z, double ridge) | Computes the RCCA entry for (C, D) after partialing out Z, S_|Z = S - S_{.,Z} * inv(S_{Z,Z} + ridge*I) * S_{Z,.}, then runs RCCA on the (C, D) blocks of S_|Z with the same ridge regularization on R_cc and R_dd that getRccaEntry(...) uses.
static double | pValueIndepConditioned(org.ejml.simple.SimpleMatrix S, int[] X, int[] Y, int[] Z, int n) | Returns the p-value for H0: the rank between X and Y given Z is 0, using Wilks' lambda with Bartlett's approximation on the partial CCA.
static double | rankLeByWilks(org.ejml.simple.SimpleMatrix Scond, int[] xLoc, int[] yLoc, int n, int r) | Tests H0: rank ≤ r with Wilks' lambda and returns the p-value.
static int[] | toArray(java.util.List<Integer> Z) | Converts a List of Integer objects into an array of primitive int values.
int[] | union(int[] A, int b) | Computes the union of the elements of the given array and a single integer value.
static int[] | union(int[] A, int[] B) | Computes the union of two integer arrays and returns the result as an array.
static int[] | union(java.util.List<Integer> A, int b) | Computes the union of a list of integers and a single integer.
-
Field Details
-
RIDGE
public static double RIDGE
A small constant value added as a ridge term during regularization to improve numerical stability. It helps prevent singular or poorly conditioned matrices in the downstream computations.
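To illustrate why a ridge term helps, here is a minimal sketch that adds a constant to the diagonal of a singular matrix. The class and method names are hypothetical, and the ridge value 1e-8 is only illustrative; it is not necessarily the value of RIDGE.

```java
import java.util.Arrays;

// Hypothetical illustration of ridge regularization; not the Tetrad code.
public class RidgeSketch {
    // Returns a copy of the square matrix m with `ridge` added to every diagonal entry.
    static double[][] addRidge(double[][] m, double ridge) {
        double[][] out = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            out[i] = Arrays.copyOf(m[i], m[i].length);
            out[i][i] += ridge;
        }
        return out;
    }

    // Determinant of a 2x2 matrix, used only to show the effect of the ridge.
    static double det2(double[][] m) {
        return m[0][0] * m[1][1] - m[0][1] * m[1][0];
    }

    public static void main(String[] args) {
        double[][] singular = {{1.0, 1.0}, {1.0, 1.0}}; // perfectly collinear columns
        double ridge = 1e-8;                            // illustrative value only
        System.out.println(det2(singular));             // prints 0.0
        System.out.println(det2(addRidge(singular, ridge))); // about 2 * ridge, now positive
    }
}
```

The determinant of the ridged matrix is (1 + ε)² − 1 = 2ε + ε², so any positive ε makes the matrix invertible, at the cost of a small bias.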
-
-
Method Details
-
estimateWilksRank
public static int estimateWilksRank(org.ejml.simple.SimpleMatrix Scond, int[] xIdxLocal, int[] yIdxLocal, int n, double alpha)
Estimates the regularized canonical correlation analysis (RCCA) rank by testing candidate ranks sequentially with Wilks' lambda statistic.
Parameters:
Scond - a matrix representing the conditioned covariance or correlation structure of the input data.
xIdxLocal - indices of the local x-variables involved in the calculation.
yIdxLocal - indices of the local y-variables involved in the calculation.
n - the total number of observations in the dataset.
alpha - the significance level for the rank tests, typically between 0 and 1.
Returns:
the estimated rank, i.e., the number of canonical correlations judged statistically significant; it is at most min(xIdxLocal.length, yIdxLocal.length).
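The sequential procedure can be sketched as follows, assuming the canonical correlations have already been computed and sorted in descending order, and assuming the textbook Bartlett correction factor n − 1 − (p + q + 1)/2. The class and method names are hypothetical, and the Tetrad method may derive the statistic from Scond differently. Because the JDK has no chi-square distribution, the tail probability is computed with a standard series/continued-fraction evaluation of the regularized incomplete gamma function.

```java
// Hypothetical sketch of a sequential Wilks/Bartlett rank test; not the Tetrad code.
public class WilksRankSketch {

    // p-value for H0: rank <= r, given descending canonical correlations rho, sample size n,
    // and block sizes p and q.
    static double bartlettPValue(double[] rho, int n, int p, int q, int r) {
        int k = Math.min(p, q);
        if (r >= k) return 1.0; // the rank can never exceed min(p, q)
        double sumLog = 0.0;
        for (int i = r; i < k; i++) sumLog += Math.log(1.0 - rho[i] * rho[i]);
        double stat = -(n - 1 - (p + q + 1) / 2.0) * sumLog;
        double df = (double) (p - r) * (q - r);
        return chiSquareSurvival(stat, df);
    }

    // The estimated rank is the smallest r whose H0 (rank <= r) is not rejected at level alpha.
    static int estimateRank(double[] rho, int n, int p, int q, double alpha) {
        int k = Math.min(p, q);
        for (int r = 0; r < k; r++) {
            if (bartlettPValue(rho, n, p, q, r) > alpha) return r;
        }
        return k;
    }

    // P(X > x) for X ~ chi-square(df), i.e., the regularized upper gamma Q(df/2, x/2).
    static double chiSquareSurvival(double x, double df) {
        if (x <= 0) return 1.0;
        return upperGammaQ(df / 2.0, x / 2.0);
    }

    static double upperGammaQ(double a, double x) {
        double prefactor = Math.exp(-x + a * Math.log(x) - logGamma(a));
        if (x < a + 1.0) {
            // Series for the lower regularized gamma P(a, x); return Q = 1 - P.
            double term = 1.0 / a, sum = term, ap = a;
            for (int i = 0; i < 500; i++) {
                ap += 1.0;
                term *= x / ap;
                sum += term;
                if (Math.abs(term) < Math.abs(sum) * 1e-15) break;
            }
            return 1.0 - sum * prefactor;
        }
        // Continued fraction for Q(a, x), evaluated with the modified Lentz method.
        double tiny = 1e-300;
        double b = x + 1.0 - a, c = 1.0 / tiny, d = 1.0 / b, h = d;
        for (int i = 1; i < 500; i++) {
            double an = -i * (i - a);
            b += 2.0;
            d = an * d + b;
            if (Math.abs(d) < tiny) d = tiny;
            c = b + an / c;
            if (Math.abs(c) < tiny) c = tiny;
            d = 1.0 / d;
            double delta = d * c;
            h *= delta;
            if (Math.abs(delta - 1.0) < 1e-15) break;
        }
        return prefactor * h;
    }

    // Lanczos approximation of log Gamma(z) for z > 0.
    static double logGamma(double z) {
        double[] g = {76.18009172947146, -86.50532032941677, 24.01409824083091,
                      -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5};
        double x = z, y = z, tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double coef : g) ser += coef / ++y;
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    public static void main(String[] args) {
        double[] rho = {0.6, 0.05}; // hypothetical canonical correlations, descending
        System.out.println(estimateRank(rho, 500, 2, 3, 0.05)); // prints 1
    }
}
```

In the example, H0: rank ≤ 0 is rejected because of the large first correlation, while H0: rank ≤ 1 has a p-value of about 0.54, so the sketch stops at rank 1.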
-
estimateWilksRankFast
public static int estimateWilksRankFast(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, int n, double alpha)
Estimates the rank between the x and y variable blocks using Wilks' test with Bartlett's χ² approximation. This variant is optimized for fast computation.
Parameters:
S - covariance or scatter matrix (SimpleMatrix) of size (p + q) x (p + q).
xIdx - indices of the x variables, the first group of variables.
yIdx - indices of the y variables, the second group of variables.
n - sample size used for the computation and the statistical tests.
alpha - significance level for the hypothesis tests (e.g., 0.05 for 5%).
Returns:
the estimated rank, i.e., the number of canonical correlations between the x and y blocks judged nonzero by the Wilks test.
-
rankLeByWilks
public static double rankLeByWilks(org.ejml.simple.SimpleMatrix Scond, int[] xLoc, int[] yLoc, int n, int r)
Tests the hypothesis H0: rank ≤ r for the block structure of Scond between the x and y variables using Wilks' lambda, and returns the p-value of the test.
Parameters:
Scond - the conditioned covariance matrix or a similar input matrix.
xLoc - indices of the x-block variables.
yLoc - indices of the y-block variables.
n - the number of observations (sample size).
r - the rank bound to test (a non-negative integer).
Returns:
the p-value for H0: rank ≤ r; small values are evidence that the rank exceeds r.
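For reference, the textbook Bartlett-corrected Wilks statistic for H0: rank ≤ r is shown below, assuming p x-variables, q y-variables and sample canonical correlations ρ̂₁ ≥ … ≥ ρ̂ₖ with k = min(p, q). Whether rankLeByWilks uses exactly this correction factor is an assumption.

```latex
\Lambda_r = \prod_{i=r+1}^{k} \left(1 - \hat\rho_i^{2}\right), \qquad
T_r = -\left(n - 1 - \frac{p + q + 1}{2}\right) \ln \Lambda_r
\;\overset{\text{approx.}}{\sim}\; \chi^2_{(p - r)(q - r)} \text{ under } H_0,
\qquad
\text{p-value} = \Pr\!\left(\chi^2_{(p - r)(q - r)} \ge T_r\right).
```

Only the correlations beyond the first r enter Λ_r, so a small p-value means that at least one of ρ_{r+1}, …, ρ_k is nonzero.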
-
estimateWilksRankConditioned
public static int estimateWilksRankConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] VminusC, int[] Z, int n, double alpha)
Estimates the Wilks rank between the variable sets C and VminusC, conditioned on the variables in Z, using the given covariance matrix.
Parameters:
S - the covariance matrix over all variables
C - indices of the variables in set C
VminusC - indices of the variables outside set C
Z - indices of the variables in set Z on which to condition
n - the sample size used to compute the covariance matrix S
alpha - the significance level for the tests
Returns:
the estimated Wilks rank between C and VminusC conditioned on Z
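Conditioning on Z is typically done by partialing Z out of the covariance matrix, following the formula stated for getRccaEntryConditioned: S_|Z = S - S_{.,Z} * inv(S_{Z,Z} + ridge*I) * S_{Z,.}. The sketch below computes that partial covariance for an index set A with plain arrays instead of EJML. The class and method names are hypothetical, and the actual method may organize the computation differently.

```java
import java.util.Arrays;

// Hypothetical sketch of partialing out Z; not the Tetrad code.
public class PartialCovarianceSketch {

    // Returns S_{A,A} - S_{A,Z} (S_{Z,Z} + ridge*I)^{-1} S_{Z,A}.
    static double[][] partialOut(double[][] s, int[] a, int[] z, double ridge) {
        int m = a.length, k = z.length;
        double[][] out = new double[m][m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                out[i][j] = s[a[i]][a[j]];
        if (k == 0) return out; // nothing to condition on

        // Solve (S_{Z,Z} + ridge*I) X = S_{Z,A} for X (k x m) instead of forming the inverse.
        double[][] lhs = new double[k][k];
        double[][] rhs = new double[k][m];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) lhs[i][j] = s[z[i]][z[j]];
            lhs[i][i] += ridge; // the ridge keeps the system solvable when S_{Z,Z} is near-singular
            for (int j = 0; j < m; j++) rhs[i][j] = s[z[i]][a[j]];
        }
        double[][] x = solve(lhs, rhs);

        // Subtract S_{A,Z} X.
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                for (int t = 0; t < k; t++)
                    out[i][j] -= s[a[i]][z[t]] * x[t][j];
        return out;
    }

    // Gaussian elimination with partial pivoting; works on copies, so the inputs are unchanged.
    static double[][] solve(double[][] aIn, double[][] bIn) {
        int n = aIn.length, cols = bIn[0].length;
        double[][] a = new double[n][], b = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = Arrays.copyOf(aIn[i], n);
            b[i] = Arrays.copyOf(bIn[i], cols);
        }
        for (int p = 0; p < n; p++) {
            int max = p;
            for (int i = p + 1; i < n; i++)
                if (Math.abs(a[i][p]) > Math.abs(a[max][p])) max = i;
            double[] tmp = a[p]; a[p] = a[max]; a[max] = tmp;
            tmp = b[p]; b[p] = b[max]; b[max] = tmp;
            for (int i = p + 1; i < n; i++) {
                double f = a[i][p] / a[p][p];
                for (int j = p; j < n; j++) a[i][j] -= f * a[p][j];
                for (int j = 0; j < cols; j++) b[i][j] -= f * b[p][j];
            }
        }
        double[][] x = new double[n][cols];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = 0; j < cols; j++) {
                double sum = b[i][j];
                for (int t = i + 1; t < n; t++) sum -= a[i][t] * x[t][j];
                x[i][j] = sum / a[i][i];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // X and Y are correlated only through Z (variable 2).
        double[][] s = {{1.0, 0.25, 0.5}, {0.25, 1.0, 0.5}, {0.5, 0.5, 1.0}};
        System.out.println(Arrays.deepToString(partialOut(s, new int[]{0, 1}, new int[]{2}, 0.0)));
        // prints [[0.75, 0.0], [0.0, 0.75]]
    }
}
```

In the example the off-diagonal entry vanishes after conditioning, so a rank test on the partialed blocks would find rank 0.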
-
diff
public static int[] diff(int[] A, int[] B)
Computes the difference between two arrays, returning an array of elements that are present in the first array but not in the second.
Parameters:
A - the first array of integers
B - the second array of integers
Returns:
an array of integers containing the elements of the first array that are not present in the second array
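A minimal sketch of such a set difference, assuming that the order of A is preserved and that repeated elements of A are kept; the Javadoc does not specify either behavior, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of diff(A, B); not the Tetrad code.
public class DiffSketch {
    // Elements of a that do not occur in b, in the order they appear in a.
    static int[] diff(int[] a, int[] b) {
        Set<Integer> exclude = new HashSet<>();
        for (int v : b) exclude.add(v);
        return Arrays.stream(a).filter(v -> !exclude.contains(v)).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(diff(new int[]{3, 1, 4, 1, 5}, new int[]{1, 5})));
        // prints [3, 4]
    }
}
```

Using a HashSet for B makes each membership check constant time on average, so the whole difference runs in roughly linear time.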
-
union
public static int[] union(int[] A, int[] B)
Computes the union of two integer arrays and returns the result as an array.
Parameters:
A - the first array of integers
B - the second array of integers
Returns:
an array containing the union of the elements from both input arrays
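A minimal sketch of an array union, assuming duplicates are removed and first-occurrence order is kept (elements of A before new elements of B); the Javadoc does not state the ordering, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of union(A, B); not the Tetrad code.
public class UnionSketch {
    // Union without duplicates; LinkedHashSet keeps insertion order.
    static int[] union(int[] a, int[] b) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int v : a) seen.add(v);
        for (int v : b) seen.add(v);
        return seen.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(union(new int[]{2, 0, 2}, new int[]{0, 7})));
        // prints [2, 0, 7]
    }
}
```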
-
union
public static int[] union(java.util.List<Integer> A, int b)
Computes the union of a list of integers and a single integer. The integer is added to the elements of the list, and no duplicates are kept.
Parameters:
A - the list of integers to be included in the union
b - the integer to be added to the union
Returns:
an array representing the union of the input list and the single integer
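A minimal sketch of this list-plus-element union, assuming the list order is kept and b is appended only if it is new; the class name is hypothetical and the ordering is not specified by the Javadoc.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of union(List<Integer> A, int b); not the Tetrad code.
public class ListUnionSketch {
    static int[] union(List<Integer> a, int b) {
        Set<Integer> seen = new LinkedHashSet<>(a); // drops duplicates, keeps list order
        seen.add(b);                                // no effect if b is already present
        return seen.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(union(List.of(4, 1), 9))); // prints [4, 1, 9]
    }
}
```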
-
toArray
public static int[] toArray(java.util.List<Integer> Z)
Converts a List of Integer objects into an array of primitive int values.
Parameters:
Z - the List of Integer objects to be converted into an int array
Returns:
an array of int containing the values from the input List in the same order
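The conversion can be sketched with the standard stream API; the class name is hypothetical and this is not necessarily how RankTests implements it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of toArray(List<Integer>); not the Tetrad code.
public class ToArraySketch {
    // Unboxes each Integer in list order; a null element would throw NullPointerException.
    static int[] toArray(List<Integer> z) {
        return z.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toArray(List.of(5, 3, 8)))); // prints [5, 3, 8]
    }
}
```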
-
pValueIndepConditioned
public static double pValueIndepConditioned(org.ejml.simple.SimpleMatrix S, int[] X, int[] Y, int[] Z, int n)
Returns the p-value for H0: the rank between X and Y given Z is 0 (no nonzero partial canonical correlations), using Wilks' lambda with Bartlett's approximation on the partial CCA.
Parameters:
S - the covariance matrix of all variables.
X - indices of the first subset of variables.
Y - indices of the second subset of variables.
Z - indices of the conditioning set of variables.
n - the number of samples used to compute the covariance matrix.
Returns:
the p-value, i.e., the probability under the null hypothesis of conditional independence of a test statistic at least as large as the one observed. Returns 1.0 if X or Y is empty after the indices in Z are removed, or if the degrees of freedom (df) are less than or equal to zero.
-
getRccaEntry
public static RankTests.RccaEntry getRccaEntry(org.ejml.simple.SimpleMatrix S, int[] xIdx, int[] yIdx, double regLambda)
Retrieves or computes an RCCA (Regularized Canonical Correlation Analysis) entry for the given parameters. If the entry is cached, it is returned from the cache; otherwise it is computed from the provided inputs.
Parameters:
S - a SimpleMatrix representing the data matrix
xIdx - indices of the X variables
yIdx - indices of the Y variables
regLambda - the regularization parameter
Returns:
an RccaEntry containing the canonical correlation results, including singular values and suffix logs, for the given inputs, or null if the computation fails
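The retrieve-or-compute behavior can be sketched as a memoized lookup keyed by the index sets and the regularization value. This assumes one cache per covariance matrix and uses a hypothetical double[] payload in place of RccaEntry; the real cache key and storage are not documented here. Requires Java 16+ for records and Stream.toList().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical memoization sketch for one fixed covariance matrix; not the Tetrad code.
public class RccaCacheSketch {
    // Value-based key: records compare List fields with equals(), unlike raw int[] fields.
    record Key(List<Integer> x, List<Integer> y, double lambda) {}

    private final Map<Key, double[]> cache = new ConcurrentHashMap<>();

    double[] getOrCompute(int[] x, int[] y, double lambda, Supplier<double[]> compute) {
        Key key = new Key(Arrays.stream(x).boxed().toList(),
                          Arrays.stream(y).boxed().toList(), lambda);
        // If compute returns null (a failed computation), nothing is stored and null is returned.
        return cache.computeIfAbsent(key, k -> compute.get());
    }

    public static void main(String[] args) {
        RccaCacheSketch c = new RccaCacheSketch();
        double[] first = c.getOrCompute(new int[]{0, 1}, new int[]{2}, 1e-3, () -> new double[]{0.7});
        double[] again = c.getOrCompute(new int[]{0, 1}, new int[]{2}, 1e-3, () -> new double[]{0.0});
        System.out.println(first == again); // prints true: the second supplier never runs
    }
}
```

Because ConcurrentHashMap.computeIfAbsent records no mapping when the function returns null, failed computations are retried on the next call rather than cached.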
-
getRccaEntryConditioned
public static RankTests.RccaEntry getRccaEntryConditioned(org.ejml.simple.SimpleMatrix S, int[] C, int[] D, int[] Z, double ridge)
Computes the RCCA entry for (C, D) after partialing out Z:
S_|Z = S - S_{.,Z} * inv(S_{Z,Z} + ridge*I) * S_{Z,.}
It then runs RCCA on the (C, D) blocks of S_|Z with the same ridge regularization on R_cc and R_dd that getRccaEntry(...) uses.
Parameters:
S - correlation or covariance matrix over the observed variables
C - left index set
D - right index set
Z - conditioning index set
ridge - small value added to the diagonals of R_cc and R_dd (and of S_ZZ before inverting)
Returns:
an RccaEntry whose suffixLogs array satisfies suf[0] == 0 and suf[r] = sum_{i=1..r} log(1 - rho_i^2), with the canonical correlations rho_i in descending order
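Since suf[r] is a running sum over the largest r canonical correlations, the log of Wilks' lambda for H0: rank ≤ r can be read off in constant time as suf[k] − suf[r], where k is the number of correlations. The sketch below builds the array and uses it that way; this is one plausible use of suffixLogs under that assumption, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of building and using suffixLogs; not the Tetrad code.
public class SuffixLogsSketch {
    // suf[0] = 0 and suf[r] = sum_{i=1..r} log(1 - rho_i^2), rho sorted in descending order.
    static double[] suffixLogs(double[] rhoDescending) {
        double[] suf = new double[rhoDescending.length + 1];
        for (int i = 0; i < rhoDescending.length; i++) {
            suf[i + 1] = suf[i] + Math.log(1.0 - rhoDescending[i] * rhoDescending[i]);
        }
        return suf;
    }

    // log Wilks' lambda for H0: rank <= r, i.e. sum_{i=r+1..k} log(1 - rho_i^2) = suf[k] - suf[r].
    static double logWilksLambda(double[] suf, int r) {
        return suf[suf.length - 1] - suf[r];
    }

    public static void main(String[] args) {
        double[] suf = suffixLogs(new double[]{0.8, 0.6});
        System.out.println(Arrays.toString(suf));
        System.out.println(logWilksLambda(suf, 1)); // equals log(1 - 0.6^2) up to rounding
    }
}
```

Building the array once costs O(k), after which every rank hypothesis can be tested without re-summing the logs.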
-
union
public int[] union(int[] A, int b)
Computes the union of the elements of the given array and a single integer value. The union is returned as an array of unique integers.
Parameters:
A - an array of integers whose elements contribute to the union
b - a single integer that is also included in the union
Returns:
an array of integers containing the union of the input array and the single integer, with all duplicate elements removed
-