Package edu.cmu.tetrad.util
Class RowCorrelationEffN
java.lang.Object
edu.cmu.tetrad.util.RowCorrelationEffN
Utility class for estimating the average pairwise row correlation of a data matrix and computing the effective sample
size (Neff) based on the correlations.
The core functionality is estimating Neff as N / (1 + (N-1)*rhoHat), where rhoHat is the average correlation between rows. The estimation standardizes the columns of the input data matrix, samples a defined number of random row pairs, computes their pairwise correlations, and clamps the resulting average so that it is never negative and stays slightly below 1.
This class is designed to handle larger datasets: capping the number of sampled row pairs keeps the computation efficient, since the number of possible pairs, C(N,2), grows quadratically with the number of rows.
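The Neff formula, with the average correlation limited to [0, 1) as the documentation of estimate describes, can be illustrated with a minimal, self-contained sketch. This is not the Tetrad implementation: the class name NeffFormulaSketch, the method neff, and the upper clamp value 1 - 1e-12 are assumptions chosen for illustration (the documentation only says the value is clamped "slightly below 1").

```java
final class NeffFormulaSketch {

    // Hypothetical upper clamp; the documented behavior is only "slightly below 1".
    static final double RHO_MAX = 1.0 - 1e-12;

    /** Returns N / (1 + (N-1)*rhoHat) after limiting rhoHat to [0, RHO_MAX]. */
    static double neff(int n, double rhoHat) {
        double rho = Math.max(0.0, Math.min(rhoHat, RHO_MAX));
        return n / (1.0 + (n - 1) * rho);
    }

    public static void main(String[] args) {
        System.out.println(neff(100, 0.0));   // uncorrelated rows: Neff equals N
        System.out.println(neff(100, 0.05));  // mild correlation shrinks Neff sharply
        System.out.println(neff(100, -0.2));  // negative estimate is clamped to 0
    }
}
```

With N = 100, an average row correlation of 0.05 already reduces the effective sample size to roughly 16.8, while a negative estimate leaves it at 100.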
Nested Class Summary
Nested Classes
Modifier and Type: static final class
Class: RowCorrelationEffN.Result
Description: Represents the result of an average pairwise row correlation estimation, containing the adjusted average row correlation value, the effective sample size, and the number of row pairs used in the computation.
Constructor Summary
Constructors
Constructor: RowCorrelationEffN()
Description: Constructs a new instance of the RowCorrelationEffN class.
Method Summary
Modifier and Type: static RowCorrelationEffN.Result
Method: estimate(org.ejml.simple.SimpleMatrix X, int maxPairsToSample, int N)
Description: Estimates average pairwise row correlation (by sampling pairs) and returns Neff = N / (1 + (N-1)*rhoHat).
Constructor Details
RowCorrelationEffN
public RowCorrelationEffN()
Constructs a new instance of the RowCorrelationEffN class.
Method Details
estimate
public static RowCorrelationEffN.Result estimate(org.ejml.simple.SimpleMatrix X, int maxPairsToSample, int N)
Estimates average pairwise row correlation (by sampling pairs) and returns Neff = N / (1 + (N-1)*rhoHat). Columns are standardized first. If the sampled average correlation is < 0, we clamp it to 0 so Neff = N. If it's ≥ 1, we clamp slightly below 1 to avoid division-by-zero.
Parameters:
X - data matrix N x P (rows = samples, cols = features)
maxPairsToSample - number of random row pairs to sample (capped at C(N,2))
N - the number of rows in the data matrix
Returns:
the result of the estimation
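The steps listed for estimate (standardize columns, sample row pairs, average their correlations, clamp, compute Neff) can be sketched on plain arrays as follows. This is an illustration under stated assumptions, not the library's code: it uses double[][] instead of SimpleMatrix, takes N from the array length, uses the Pearson correlation across a row's entries as the row correlation, samples pairs with replacement, and clamps at 1 - 1e-12. The names RowCorrelationSketch, standardizeColumns, rowCorrelation, and the seed parameter are hypothetical.

```java
import java.util.Random;

final class RowCorrelationSketch {

    /** Standardizes each column in place to mean 0 and unit (population) variance. */
    static void standardizeColumns(double[][] x) {
        int n = x.length;
        if (n == 0) return;
        int p = x[0].length;
        for (int j = 0; j < p; j++) {
            double mean = 0.0;
            for (int i = 0; i < n; i++) mean += x[i][j];
            mean /= n;
            double var = 0.0;
            for (int i = 0; i < n; i++) {
                double d = x[i][j] - mean;
                var += d * d;
            }
            double sd = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                // Assumption: constant columns are mapped to 0 rather than dividing by 0.
                x[i][j] = sd > 0.0 ? (x[i][j] - mean) / sd : 0.0;
            }
        }
    }

    /** Pearson correlation between two rows, computed across their entries. */
    static double rowCorrelation(double[] a, double[] b) {
        int p = a.length;
        double ma = 0.0, mb = 0.0;
        for (int k = 0; k < p; k++) { ma += a[k]; mb += b[k]; }
        ma /= p;
        mb /= p;
        double sab = 0.0, saa = 0.0, sbb = 0.0;
        for (int k = 0; k < p; k++) {
            double da = a[k] - ma, db = b[k] - mb;
            sab += da * db;
            saa += da * da;
            sbb += db * db;
        }
        double denom = Math.sqrt(saa * sbb);
        return denom > 0.0 ? sab / denom : 0.0;
    }

    /** Returns {clamped rhoHat, Neff, number of pairs used}. The input is not modified. */
    static double[] estimate(double[][] data, int maxPairsToSample, long seed) {
        int n = data.length;
        double[][] x = new double[n][];
        for (int i = 0; i < n; i++) x[i] = data[i].clone();
        standardizeColumns(x);

        long allPairs = (long) n * (n - 1) / 2;               // C(N,2)
        int pairs = (int) Math.min(maxPairsToSample, allPairs);
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int s = 0; s < pairs; s++) {
            int i = rng.nextInt(n);
            int j = rng.nextInt(n - 1);
            if (j >= i) j++;                                   // guarantees j != i
            sum += rowCorrelation(x[i], x[j]);
        }
        double rhoHat = pairs > 0 ? sum / pairs : 0.0;
        rhoHat = Math.max(0.0, Math.min(rhoHat, 1.0 - 1e-12)); // clamp to [0, 1)
        double neff = n / (1.0 + (n - 1) * rhoHat);
        return new double[] {rhoHat, neff, pairs};
    }

    public static void main(String[] args) {
        Random rng = new Random(1L);
        double[][] data = new double[50][20];
        for (double[] row : data) {
            for (int k = 0; k < row.length; k++) row[k] = rng.nextGaussian();
        }
        double[] r = estimate(data, 500, 7L);
        System.out.println("rhoHat=" + r[0] + " Neff=" + r[1] + " pairs=" + r[2]);
    }
}
```

Because of the clamp, the returned Neff always lies between 1 and N, and the number of pairs used never exceeds C(N,2) even when maxPairsToSample is larger.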