Class ScoredClusterFinder

java.lang.Object
edu.cmu.tetrad.search.ScoredClusterFinder

public final class ScoredClusterFinder extends Object
Given a DataSet and a subset of candidate variables Vsub (by column index), enumerates all clusters C ⊆ Vsub of a fixed size s and keeps those for which a BIC-style RCCA score is maximized exactly at rank k when C is scored against D = Vsub \ C.

Scoring model (same spirit as BlocksBicScore):

  Fit(r) = -nEff * sum_{i=1..r} log(1 - rho_i^2)
  Pen(r) = c * [ r * (p + q - r) ] * log(n) + 2*gamma * [ r * (p + q - r) ] * log(P_pool)

where p = |C|, q = |D|, m = min(p, q, n-1), r ∈ {0..m}, and nEff = max(1, n - 1 - (p + q + 1)/2). We pick the r* that maximizes Fit(r) - Pen(r).
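The rank selection above can be sketched in plain Java. This is a minimal illustration of the formulas only, not the class's actual implementation: the class name RccaBicSketch, its method names, the tie-breaking rule (smallest rank wins), and the guard against a short rho array are all assumptions made for the example. The values rho_i are supplied by the caller and P_pool is taken as given.

```java
final class RccaBicSketch {

    /**
     * Returns score[r] = Fit(r) - Pen(r) for r = 0..m, using the formulas above.
     * rho holds rho_1, rho_2, ... in the order they enter the sum.
     */
    static double[] scores(double[] rho, int n, int p, int q,
                           double c, double gamma, int pPool) {
        int m = Math.min(Math.min(p, q), n - 1);
        // Hypothetical guard: never read past the rho values actually supplied.
        m = Math.max(0, Math.min(m, rho.length));
        double nEff = Math.max(1.0, n - 1 - (p + q + 1) / 2.0);

        double[] score = new double[m + 1];
        double fit = 0.0; // Fit(0) is an empty sum
        for (int r = 0; r <= m; r++) {
            if (r > 0) {
                double rho2 = rho[r - 1] * rho[r - 1];
                fit += -nEff * Math.log(1.0 - rho2);
            }
            double k = (double) r * (p + q - r);
            double pen = c * k * Math.log(n) + 2.0 * gamma * k * Math.log(pPool);
            score[r] = fit - pen;
        }
        return score;
    }

    /** Returns r*, the rank with the highest score (assumed tie-break: smallest rank). */
    static int bestRank(double[] score) {
        int best = 0;
        for (int r = 1; r < score.length; r++) {
            if (score[r] > score[best]) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] s = scores(new double[]{0.9, 0.1}, 100, 2, 3, 1.0, 0.0, 5);
        System.out.println("r* = " + bestRank(s));
    }
}
```

With one strong and one weak correlation, the fit gained by adding the second rank is smaller than the added penalty, so the lower rank is preferred.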

A cluster is accepted if r* == targetRank and, optionally, if its score at r* exceeds the scores at r*-1 and r*+1 by the configured margins.

Thread-safe; uses parallel enumeration and lock-free collections.

  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final class 
    ScoredClusterFinder.ClusterHit
    Result holder for one accepted cluster.
  • Constructor Summary

    Constructors
    Constructor
    Description
    ScoredClusterFinder(DataSet dataSet, Collection<Integer> candidateVarIndices)
    Constructs a ScoredClusterFinder instance using the provided dataset and a collection of candidate variable indices.
  • Method Summary

    Modifier and Type
    Method
    Description
    List<ScoredClusterFinder.ClusterHit>
    findClusters(int size, int targetRank)
    Find all clusters of size 'size' inside Vsub whose RCCA-BIC score is maximized at rank 'targetRank' when contrasted with D = Vsub \ C.
    void
    setEbicGamma(double gamma)
    Sets the EBIC gamma parameter to the specified value.
    void
    setMargins(double marginKm1, double marginKp1)
    Sets the score margins required over the neighboring ranks k-1 and k+1.
    void
    setPenaltyDiscount(double c)
    Sets the penalty discount value to the given value.
    void
    setRidge(double ridge)
    Sets the ridge parameter to the given value.
    void
    setVerbose(boolean verbose)
    Enables or disables verbose mode for this instance.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • ScoredClusterFinder

      public ScoredClusterFinder(DataSet dataSet, Collection<Integer> candidateVarIndices)
      Constructs a ScoredClusterFinder instance using the provided dataset and a collection of candidate variable indices. The constructor initializes a correlation matrix, checks that the variable indices are valid, and arranges the indices in a deterministic order.
      Parameters:
      dataSet - the dataset from which the correlation matrix is derived; must not be null
      candidateVarIndices - a collection of candidate variable indices; must be non-empty and contain valid indices within the bounds of the dataset
      Throws:
      IllegalArgumentException - if candidateVarIndices is empty or contains out-of-bound indices
      NullPointerException - if candidateVarIndices is null
  • Method Details

    • setPenaltyDiscount

      public void setPenaltyDiscount(double c)
      Sets the penalty discount value to the given value. The penalty discount is the factor c in the scoring model's penalty, where it multiplies the BIC term [ r * (p + q - r) ] * log(n) of Pen(r). Larger values penalize higher ranks more heavily.
      Parameters:
      c - the penalty discount value to set (the factor c in Pen(r))
    • setEbicGamma

      public void setEbicGamma(double gamma)
      Sets the EBIC gamma parameter to the specified value. In the scoring model, gamma scales the extended Bayesian information criterion (EBIC) term 2*gamma * [ r * (p + q - r) ] * log(P_pool) of Pen(r). Larger values penalize higher ranks more heavily; gamma = 0 removes the term.
      Parameters:
      gamma - the EBIC gamma parameter value to set (the factor gamma in Pen(r))
    • setRidge

      public void setRidge(double ridge)
      Sets the ridge parameter to the given value. Ridge is typically used as a regularization term in optimization or statistical methods to control overfitting and enhance numerical stability.
      Parameters:
      ridge - the ridge parameter value to set; must be a non-negative double
    • setMargins

      public void setMargins(double marginKm1, double marginKp1)
      Sets the score margins required over the neighboring ranks k-1 and k+1, where k is the target rank. Negative values are clamped to 0.0.
      Parameters:
      marginKm1 - the margin required over rank k-1; negative values are clamped to 0.0
      marginKp1 - the margin required over rank k+1; negative values are clamped to 0.0
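      The acceptance rule with margins can be sketched as follows. This is a rough illustration under stated assumptions, not the library's code: it assumes a margin means that the score at r* must exceed the neighboring rank's score by at least that amount, and that a neighbor outside 0..m is skipped. The class name MarginCheckSketch and its methods are hypothetical.

```java
final class MarginCheckSketch {

    /** Clamps a negative margin to 0.0, as described for setMargins. */
    static double clamp(double margin) {
        return Math.max(0.0, margin);
    }

    /**
     * score[r] is the score at rank r for r = 0..m.
     * Accepts when the best rank equals targetRank and the best score beats
     * its neighbors by the (clamped) margins. The ">= margin" comparison and
     * the skipping of missing neighbors are assumptions of this sketch.
     */
    static boolean accept(double[] score, int targetRank,
                          double marginKm1, double marginKp1) {
        int best = 0;
        for (int r = 1; r < score.length; r++) {
            if (score[r] > score[best]) {
                best = r;
            }
        }
        if (best != targetRank) {
            return false;
        }
        double lo = clamp(marginKm1);
        double hi = clamp(marginKp1);
        if (best - 1 >= 0 && score[best] - score[best - 1] < lo) {
            return false;
        }
        if (best + 1 < score.length && score[best] - score[best + 1] < hi) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] score = {0.0, 141.0, 132.0};
        System.out.println(accept(score, 1, 0.0, 0.0));
        System.out.println(accept(score, 1, 0.0, 10.0));
    }
}
```

      With both margins at 0.0 the check reduces to r* == targetRank; raising a margin rejects clusters whose best rank wins only narrowly.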
    • setVerbose

      public void setVerbose(boolean verbose)
      Enables or disables verbose mode for this instance. When verbose mode is enabled, additional details or outputs may be provided to aid debugging or provide more information about the process.
      Parameters:
      verbose - a boolean value indicating whether verbose mode should be enabled (true) or disabled (false)
    • findClusters

      public List<ScoredClusterFinder.ClusterHit> findClusters(int size, int targetRank)
      Find all clusters of size 'size' inside Vsub whose RCCA-BIC score is maximized at rank 'targetRank' when contrasted with D = Vsub \ C. Returns hits sorted by (bestScore desc, lexicographic variable order).
      Parameters:
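      size - the number of variables in each candidate cluster C
      targetRank - the rank at which a cluster's score must be maximized for the cluster to be accepted
      Returns:
      the list of accepted clusters, sorted as described above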
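      The enumeration of candidate clusters C and their complements D = Vsub \ C can be sketched as follows. This is a sequential illustration only, assuming Vsub is held as a sorted int array; it does not reproduce the parallel enumeration the class uses, and the class name SubsetEnumerationSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SubsetEnumerationSketch {

    /** Enumerates all subsets of vsub with exactly 'size' elements, in lexicographic position order. */
    static List<int[]> subsets(int[] vsub, int size) {
        List<int[]> out = new ArrayList<>();
        if (size < 0 || size > vsub.length) {
            return out;
        }
        int[] idx = new int[size];
        for (int i = 0; i < size; i++) {
            idx[i] = i;
        }
        while (true) {
            int[] c = new int[size];
            for (int i = 0; i < size; i++) {
                c[i] = vsub[idx[i]];
            }
            out.add(c);
            // Find the rightmost position that can still be advanced.
            int i = size - 1;
            while (i >= 0 && idx[i] == vsub.length - size + i) {
                i--;
            }
            if (i < 0) {
                break;
            }
            idx[i]++;
            for (int j = i + 1; j < size; j++) {
                idx[j] = idx[j - 1] + 1;
            }
        }
        return out;
    }

    /** Returns D = Vsub \ C, preserving the order of vsub. */
    static int[] complement(int[] vsub, int[] c) {
        Set<Integer> inC = new HashSet<>();
        for (int v : c) {
            inC.add(v);
        }
        return Arrays.stream(vsub).filter(v -> !inC.contains(v)).toArray();
    }

    public static void main(String[] args) {
        int[] vsub = {2, 5, 7, 9};
        for (int[] c : subsets(vsub, 2)) {
            System.out.println(Arrays.toString(c) + " vs " + Arrays.toString(complement(vsub, c)));
        }
    }
}
```

      The number of candidates grows as the binomial coefficient of |Vsub| and size, which is the reason enumeration cost rises quickly with larger candidate sets.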