Package edu.cmu.tetrad.search
Class ScoredClusterFinder
java.lang.Object
edu.cmu.tetrad.search.ScoredClusterFinder
Given a DataSet and a subset of candidate variables Vsub (by column index),
enumerates all clusters C ⊂ Vsub of a fixed size s and keeps those for which a BIC-style RCCA score is maximized
exactly at rank k when scored against D = Vsub \ C.
Scoring model (same spirit as BlocksBicScore):
  Fit(r) = -nEff * sum_{i=1..r} log(1 - rho_i^2)
  Pen(r) = c * [ r * (p + q - r) ] * log(n) + 2*gamma * [ r * (p + q - r) ] * log(P_pool)
where p = |C|, q = |D|, m = min(p, q, n-1), r ∈ {0..m}, and nEff = max(1, n - 1 - (p + q + 1)/2). We pick the r* that maximizes Fit(r) - Pen(r).
A cluster is accepted if r* == targetRank and, optionally, the score at r* exceeds the scores at r* - 1 and r* + 1 by the configured margins.
Thread-safe; uses parallel enumeration and lock-free collections.
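The rank selection in the scoring model above can be sketched in plain Java. This is a minimal illustration, not the class's actual implementation: the class name RccaBicSketch and its method names are hypothetical, the canonical correlations rho are assumed to be already computed and sorted in descending order, P_pool is passed in as a plain integer, (p + q + 1)/2 is assumed to be floating-point division, and ties are assumed to go to the smaller rank.

```java
// Hypothetical sketch of the Fit(r) - Pen(r) rank selection; not Tetrad code.
class RccaBicSketch {

    // Fit(r) = -nEff * sum_{i=1..r} log(1 - rho_i^2)
    static double fit(double[] rho, int r, double nEff) {
        double sum = 0.0;
        for (int i = 0; i < r; i++) {
            sum += Math.log(1.0 - rho[i] * rho[i]);
        }
        return -nEff * sum;
    }

    // Pen(r) = c * [r(p+q-r)] * log(n) + 2*gamma * [r(p+q-r)] * log(P_pool)
    static double pen(int r, int p, int q, int n, double c, double gamma, int pPool) {
        double k = r * (double) (p + q - r);
        return c * k * Math.log(n) + 2.0 * gamma * k * Math.log(pPool);
    }

    // Returns the r in {0..m} that maximizes Fit(r) - Pen(r).
    // Assumption: ties are resolved in favor of the smaller rank.
    static int bestRank(double[] rho, int p, int q, int n,
                        double c, double gamma, int pPool) {
        int m = Math.min(Math.min(p, q), n - 1);
        m = Math.min(m, rho.length); // guard: never read past the supplied correlations
        double nEff = Math.max(1.0, n - 1 - (p + q + 1) / 2.0);

        int best = 0;
        double bestScore = 0.0; // Fit(0) - Pen(0) is 0 by definition
        for (int r = 1; r <= m; r++) {
            double score = fit(rho, r, nEff) - pen(r, p, q, n, c, gamma, pPool);
            if (score > bestScore) {
                bestScore = score;
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One strong and one weak canonical correlation, p = 2, q = 3, n = 500.
        double[] rho = {0.9, 0.1};
        System.out.println(bestRank(rho, 2, 3, 500, 1.0, 0.0, 5));
    }
}
```

Because r * (p + q - r) grows with r over the range 0..m, raising c or gamma pushes the selected rank downward, which is how the two setters below influence which clusters are accepted.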
-
Nested Class Summary
Nested Classes
static final class: Result holder for one accepted cluster.
Constructor Summary
Constructors
ScoredClusterFinder(DataSet dataSet, Collection<Integer> candidateVarIndices)
    Constructs a ScoredClusterFinder instance using the provided dataset and a collection of candidate variable indices.
Method Summary
findClusters(int size, int targetRank)
    Finds all clusters of size 'size' inside Vsub whose RCCA-BIC score is maximized at rank 'targetRank' when contrasted with D = Vsub \ C.
void setEbicGamma(double gamma)
    Sets the EBIC gamma parameter to the specified value.
void setMargins(double marginKm1, double marginKp1)
    Sets the score margins required over the neighboring ranks targetRank - 1 and targetRank + 1.
void setPenaltyDiscount(double c)
    Sets the penalty discount value to the given value.
void setRidge(double ridge)
    Sets the ridge parameter to the given value.
void setVerbose(boolean verbose)
    Enables or disables verbose mode for this instance.
-
Constructor Details
-
ScoredClusterFinder
Constructs a ScoredClusterFinder instance using the provided dataset and a collection of candidate variable indices. Construction initializes a correlation matrix, validates the variable indices, and arranges the indices in a deterministic order.
Parameters:
dataSet - the dataset from which the correlation matrix is derived; must not be null
candidateVarIndices - a collection of candidate variable indices; must be non-empty and contain only valid indices within the bounds of the dataset
Throws:
IllegalArgumentException - if candidateVarIndices is empty or contains out-of-bound indices
NullPointerException - if candidateVarIndices is null
-
-
Method Details
-
setPenaltyDiscount
public void setPenaltyDiscount(double c)
Sets the penalty discount c, the multiplier on the c * [ r * (p + q - r) ] * log(n) term of Pen(r) in the scoring model. Larger values penalize higher ranks more heavily.
Parameters:
c - the penalty discount applied to the log(n) penalty term
-
setEbicGamma
public void setEbicGamma(double gamma)
Sets the EBIC gamma parameter, which weights the extended Bayesian information criterion (EBIC) term 2*gamma * [ r * (p + q - r) ] * log(P_pool) of Pen(r) in the scoring model. Larger values penalize higher ranks more heavily; a value of 0 removes the term.
Parameters:
gamma - the EBIC gamma value applied to the log(P_pool) penalty term
-
setRidge
public void setRidge(double ridge)
Sets the ridge parameter to the given value. Ridge is typically used as a regularization term in optimization or statistical methods to control overfitting and enhance numerical stability.
Parameters:
ridge - the ridge parameter value to set; must be a non-negative double
-
setMargins
public void setMargins(double marginKm1, double marginKp1)
Sets the margins by which the score at the target rank must exceed the scores at the neighboring ranks, targetRank - 1 and targetRank + 1. Negative values are clamped to 0.0.
Parameters:
marginKm1 - the required margin over the score at rank targetRank - 1; negative values are treated as 0.0
marginKp1 - the required margin over the score at rank targetRank + 1; negative values are treated as 0.0
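The acceptance rule with margins can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: MarginCheckSketch and accept are hypothetical names, the per-rank scores Fit(r) - Pen(r) for r = 0..m are assumed to be supplied in an array, the comparison is assumed to be "difference at least the margin", and a neighbor that falls outside 0..m is assumed to impose no constraint.

```java
// Hypothetical sketch of the margin-based acceptance test; not Tetrad code.
class MarginCheckSketch {

    // scores[r] holds Fit(r) - Pen(r) for r = 0..m.
    static boolean accept(double[] scores, int targetRank,
                          double marginKm1, double marginKp1) {
        // Negative margins are clamped to 0.0, as documented for setMargins.
        double lower = Math.max(0.0, marginKm1);
        double upper = Math.max(0.0, marginKp1);

        if (targetRank < 0 || targetRank >= scores.length) {
            return false;
        }

        // Find r*; assumption: ties go to the smaller rank.
        int best = 0;
        for (int r = 1; r < scores.length; r++) {
            if (scores[r] > scores[best]) {
                best = r;
            }
        }
        if (best != targetRank) {
            return false;
        }

        double s = scores[targetRank];
        // Margin over rank k-1, if that rank exists.
        if (targetRank > 0 && s - scores[targetRank - 1] < lower) {
            return false;
        }
        // Margin over rank k+1, if that rank exists.
        if (targetRank + 1 < scores.length && s - scores[targetRank + 1] < upper) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] scores = {0.0, 10.0, 8.0};
        System.out.println(accept(scores, 1, 0.0, 0.0));
        System.out.println(accept(scores, 1, 5.0, 3.0));
    }
}
```

With zero margins the test reduces to r* == targetRank; positive margins additionally require the peak at the target rank to be clearly separated from its neighbors.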
-
setVerbose
public void setVerbose(boolean verbose)
Enables or disables verbose mode for this instance. When verbose mode is enabled, additional details or outputs may be provided to aid debugging or provide more information about the process.
Parameters:
verbose - true to enable verbose mode, false to disable it
-
findClusters
Finds all clusters of size 'size' inside Vsub whose RCCA-BIC score is maximized at rank 'targetRank' when contrasted with D = Vsub \ C. Returns hits sorted by (bestScore desc, lexicographic variable order).
Parameters:
size - the number of variables in each candidate cluster
targetRank - the rank at which a cluster's score must be maximized for the cluster to be accepted
Returns:
the list of accepted clusters
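The enumeration step, listing every size-s subset C of Vsub together with its complement D = Vsub \ C, can be sketched as below. This is a sequential illustration only, whereas the class is documented as using parallel enumeration; SubsetEnumerationSketch, subsets, and complement are hypothetical names, and Vsub is assumed to be given as a sorted array of distinct column indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fixed-size subset enumeration; not Tetrad code.
class SubsetEnumerationSketch {

    // Returns all size-element subsets of vsub in lexicographic order.
    static List<int[]> subsets(int[] vsub, int size) {
        List<int[]> out = new ArrayList<>();
        if (size < 0 || size > vsub.length) {
            return out;
        }
        int[] idx = new int[size];
        for (int i = 0; i < size; i++) {
            idx[i] = i; // first combination: positions 0..size-1
        }
        while (true) {
            int[] c = new int[size];
            for (int i = 0; i < size; i++) {
                c[i] = vsub[idx[i]];
            }
            out.add(c);

            // Find the rightmost position that can still be advanced.
            int i = size - 1;
            while (i >= 0 && idx[i] == vsub.length - size + i) {
                i--;
            }
            if (i < 0) {
                break; // last combination reached
            }
            idx[i]++;
            for (int j = i + 1; j < size; j++) {
                idx[j] = idx[j - 1] + 1;
            }
        }
        return out;
    }

    // Returns D = Vsub \ C, preserving the order of vsub.
    static int[] complement(int[] vsub, int[] c) {
        int[] d = new int[vsub.length - c.length];
        int k = 0;
        for (int v : vsub) {
            boolean inC = false;
            for (int x : c) {
                if (x == v) {
                    inC = true;
                    break;
                }
            }
            if (!inC) {
                d[k++] = v;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[] vsub = {2, 5, 7, 9};
        for (int[] c : subsets(vsub, 2)) {
            int[] d = complement(vsub, c);
            System.out.println(java.util.Arrays.toString(c) + " vs " + java.util.Arrays.toString(d));
        }
    }
}
```

Each (C, D) pair produced this way would then be scored and kept only if it passes the rank test; the number of pairs is the binomial coefficient of |Vsub| and size, so large candidate sets make the search expensive.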
-