Class CausalUnmixer.Config

java.lang.Object
edu.cmu.tetrad.search.unmix.CausalUnmixer.Config
Enclosing class:
CausalUnmixer

public static class CausalUnmixer.Config extends Object
The Config class encapsulates the configuration settings used for the unmixing or clustering process within the CausalUnmixer framework. It provides parameters to control the behavior of the Expectation-Maximization (EM) algorithm, Gaussian Mixture Models (GMM), and additional related processes.

This class includes various tunable parameters, such as the number of clusters (K), parent superset configurations, regularization factors, covariance handling, annealing steps, and graph-related settings.

  • Field Summary

    Fields
    Modifier and Type | Field | Description
    double | annealStartT | Starting temperature for the simulated annealing process.
    int | annealSteps | Number of steps in the simulated annealing process.
    double | covRidgeRel | Regularization parameter for covariance estimation.
    double | covShrinkage | Regularization parameter for covariance estimation.
    int | emMaxIters | Maximum number of iterations allowed for the Expectation-Maximization (EM) algorithm.
    int | fullSigmaSafetyMargin | Safety margin for full sigma estimation.
    Integer | K | Specifies the number of clusters (K) used in the clustering process within the CausalUnmixer framework.
    int | Kmax | Specifies the maximum number of clusters (Kmax) that can be used in the clustering process within the CausalUnmixer framework.
    int | kmeansRestarts | The number of restarts to be performed by the k-means clustering algorithm.
    int | Kmin | Specifies the minimum number of clusters (Kmin) that can be used in the clustering process within the CausalUnmixer framework.
    double | pcAlpha | Represents the alpha parameter for controlling the significance level in hypothesis testing or statistical calculations.
    Pc.ColliderOrientationStyle | pcColliderStyle | Defines the orientation style for the collider in the configuration.
    Function<CausalUnmixer.Config,Function<DataSet,Graph>> | perClusterGraphFn | A function that generates a cluster-specific graph creation method.
    Function<CausalUnmixer.Config,Function<DataSet,Graph>> | pooledGraphFn | A function that generates a pooled graph model based on a given configuration and dataset.
    double | ridgeLambda | A regularization parameter used in ridge regression to prevent overfitting by adding a penalty proportional to the square of the coefficients' magnitude.
    boolean | robustScaleResiduals | Determines whether to use robust scaling for residuals in computations.
    ParentSupersetBuilder.ScoreType | supersetScore | Specifies the scoring type to be used when working with the parent superset configuration.
    int | supersetTopM | The maximum size of the superset to be considered during processing or configuration.
    boolean | useParentSuperset | Indicates whether to use the parent superset configuration during the clustering or unmixing process within the CausalUnmixer framework.
  • Constructor Summary

    Constructors
    Constructor | Description
    Config() | Default constructor for the Config class.
  • Method Summary

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • K

      public Integer K
      Specifies the number of clusters (K) used in the clustering process within the CausalUnmixer framework. This is the target number of clusters formed during modeling, and it is a critical parameter for the Gaussian Mixture Model (GMM) or any other clustering algorithm that is applied. The default value is 2.
    • Kmin

      public int Kmin
      Specifies the minimum number of clusters (Kmin) that can be used in the clustering process within the CausalUnmixer framework. This variable defines a lower bound for cluster configurations during the modeling process, ensuring that at least this number of clusters is considered when applying algorithms such as Gaussian Mixture Models (GMM) or k-means.

      The default value is set to 1, indicating that the system will always consider at least one cluster.

    • Kmax

      public int Kmax
      Specifies the maximum number of clusters (Kmax) that can be used in the clustering process within the CausalUnmixer framework. This variable defines an upper bound for cluster configurations during the modeling process, ensuring that no more than this number of clusters is considered when applying algorithms such as Gaussian Mixture Models (GMM) or k-means.
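
      The relationship between K, Kmin, and Kmax can be illustrated with a small consistency check. This is a minimal sketch under assumptions: ClusterCountSketch is a hypothetical stand-in, not the Tetrad class; the Kmax value of 4 is illustrative only; and treating a null K as "not fixed" is a guess prompted by the Integer type, not documented behavior.

```java
/** Hypothetical stand-in for the three cluster-count fields; not the Tetrad class. */
class ClusterCountSketch {
    Integer K = 2;  // documented default
    int Kmin = 1;   // documented default
    int Kmax = 4;   // illustrative value only, not a documented default

    /** Assumed rule: 1 <= Kmin <= Kmax, and K, when non-null, lies in [Kmin, Kmax]. */
    boolean isConsistent() {
        if (Kmin < 1 || Kmin > Kmax) {
            return false;
        }
        return K == null || (K >= Kmin && K <= Kmax);
    }
}
```

      A caller could run such a check once after overriding the public fields and before starting the unmixing process.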

    • useParentSuperset

      public boolean useParentSuperset
      Indicates whether to use the parent superset configuration during the clustering or unmixing process within the CausalUnmixer framework. When set to true, the framework incorporates parent supersets into the modeling process, potentially influencing cluster assignments, causal decoding, or related processes. This setting impacts the generation or utilization of grouped parent structures in the modeling pipeline.

      The default value is true.

    • supersetTopM

      public int supersetTopM
      The maximum size of the superset to be considered during processing or configuration. Represents a threshold or limit to control the scope of operations in the superset.
    • supersetScore

      public ParentSupersetBuilder.ScoreType supersetScore
      Specifies the scoring type to be used when working with the parent superset configuration. This variable determines the method of evaluation or comparison within the parent superset context. The value is initialized to ParentSupersetBuilder.ScoreType.KENDALL.
    • robustScaleResiduals

      public boolean robustScaleResiduals
      Determines whether to use robust scaling for residuals in computations. When set to true, scaling methods that are less sensitive to outliers are applied, improving stability and accuracy in the presence of anomalous data.
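
      One common way to scale residuals robustly is the median absolute deviation (MAD). The sketch below shows that technique for illustration; whether CausalUnmixer uses MAD or another robust scale is not stated here, and the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative robust scale estimate; not taken from the Tetrad source. */
class RobustScaleSketch {
    /** Median of a copy of the input, so the caller's array is left untouched. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    }

    /** MAD scaled by 1.4826 so it estimates the standard deviation for Gaussian data. */
    static double madScale(double[] residuals) {
        double center = median(residuals);
        double[] deviations = new double[residuals.length];
        for (int i = 0; i < residuals.length; i++) {
            deviations[i] = Math.abs(residuals[i] - center);
        }
        return 1.4826 * median(deviations);
    }
}
```

      Replacing one residual with an extreme value leaves this estimate largely unchanged, whereas the ordinary standard deviation would grow sharply.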
    • kmeansRestarts

      public int kmeansRestarts
      The number of restarts to be performed by the k-means clustering algorithm. A higher value increases the chances of finding a better clustering solution by running the algorithm multiple times with different initializations.
    • emMaxIters

      public int emMaxIters
      Maximum number of iterations allowed for the Expectation-Maximization (EM) algorithm. This value determines the upper limit of iterations the EM algorithm can perform in its optimization process.
    • covRidgeRel

      public double covRidgeRel
      Regularization parameter for covariance estimation. Controls the amount of shrinkage applied to the covariance matrix during estimation.
    • covShrinkage

      public double covShrinkage
      Regularization parameter for covariance estimation. Controls the amount of shrinkage applied to the covariance matrix during estimation.
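
      covRidgeRel and covShrinkage share the same description, so the sketch below only illustrates two standard covariance regularizers that the names suggest: shrinkage toward the diagonal and a ridge relative to the average variance. How CausalUnmixer actually applies either parameter is an assumption, and all names in the code are hypothetical.

```java
/** Two standard covariance regularizers; an illustration, not the Tetrad implementation. */
class CovRegularizationSketch {
    /** Returns (1 - shrink) * S + shrink * diag(S): off-diagonal entries are damped. */
    static double[][] shrinkToDiagonal(double[][] s, double shrink) {
        int d = s.length;
        double[][] out = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                out[i][j] = (i == j) ? s[i][j] : (1.0 - shrink) * s[i][j];
            }
        }
        return out;
    }

    /** Returns S + ridgeRel * (trace(S) / d) * I: the ridge scales with the data's variance. */
    static double[][] addRelativeRidge(double[][] s, double ridgeRel) {
        int d = s.length;
        double trace = 0.0;
        for (int i = 0; i < d; i++) {
            trace += s[i][i];
        }
        double[][] out = new double[d][d];
        for (int i = 0; i < d; i++) {
            out[i] = s[i].clone();
            out[i][i] += ridgeRel * trace / d;
        }
        return out;
    }
}
```

      Both operations push a near-singular covariance estimate away from singularity, which matters when a cluster has few points relative to the number of variables.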
    • annealSteps

      public int annealSteps
      Number of steps in the simulated annealing process. Determines the number of iterations in the simulated annealing algorithm.
    • annealStartT

      public double annealStartT
      Starting temperature for the simulated annealing process. Controls the initial temperature level for the simulated annealing algorithm.
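
      To make the roles of annealSteps and annealStartT concrete, the sketch below builds a temperature schedule and applies a temperature to cluster responsibilities. It rests on assumptions not confirmed by this page: that the temperature decays linearly from annealStartT to 1.0, and that annealing softens EM responsibilities by dividing log-likelihoods by the temperature. All names are hypothetical.

```java
/** Illustrative annealing helpers; the schedule shape and tempering rule are assumptions. */
class AnnealingSketch {
    /** Linear schedule from startT down to 1.0 over the given number of steps. */
    static double[] schedule(double startT, int steps) {
        if (steps < 1) {
            throw new IllegalArgumentException("steps must be >= 1");
        }
        double[] temps = new double[steps];
        for (int i = 0; i < steps; i++) {
            // A single step runs directly at temperature 1.0.
            double frac = (steps == 1) ? 1.0 : (double) i / (steps - 1);
            temps[i] = startT + (1.0 - startT) * frac;
        }
        return temps;
    }

    /** Normalized weights proportional to exp(logLik[k] / temperature). */
    static double[] temper(double[] logLik, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logLik) {
            max = Math.max(max, l / temperature);
        }
        double[] w = new double[logLik.length];
        double sum = 0.0;
        for (int k = 0; k < logLik.length; k++) {
            w[k] = Math.exp(logLik[k] / temperature - max); // subtract max for stability
            sum += w[k];
        }
        for (int k = 0; k < w.length; k++) {
            w[k] /= sum;
        }
        return w;
    }
}
```

      Under these assumptions, a temperature above 1 flattens the responsibilities early on, and the final step at temperature 1 recovers the ordinary ones.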
    • fullSigmaSafetyMargin

      public int fullSigmaSafetyMargin
      Safety margin for full sigma estimation. Ensures that the estimated covariance matrix is positive definite.
    • ridgeLambda

      public double ridgeLambda
      A regularization parameter used in ridge regression to prevent overfitting by adding a penalty proportional to the square of the coefficients' magnitude. Typically, higher values increase regularization, while lower values reduce it. This parameter helps stabilize solutions when dealing with multicollinearity or poorly conditioned problems.
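
      The effect of the penalty is easiest to see in the one-predictor case without an intercept, where the ridge estimate has the closed form beta = sum(x*y) / (sum(x*x) + lambda). The sketch below implements only that special case; the class and method names are hypothetical and it is not the regression code used by CausalUnmixer.

```java
/** Closed-form ridge estimate for a single predictor with no intercept. */
class RidgeSketch {
    static double ridgeSlope(double[] x, double[] y, double lambda) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        // The penalty lambda is added to the denominator, shrinking the slope toward zero.
        return sxy / (sxx + lambda);
    }
}
```

      With lambda = 0 this is the ordinary least-squares slope; increasing lambda shrinks the slope toward zero and keeps the denominator away from zero when the predictor has very little variation.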
    • pooledGraphFn

      public Function<CausalUnmixer.Config,Function<DataSet,Graph>> pooledGraphFn
      A function that generates a pooled graph model based on a given configuration and dataset. It maps a Config object to a second function, which in turn maps a DataSet to a Graph.

      This two-stage chain allows flexible composition of graph creation pipelines tailored to different configurations and input datasets, and it keeps the logic for deriving the pooled graph modular and reusable. It may leverage model pooling or other aggregation mechanisms.

      It is intended for cases where graph structure must be modeled from datasets under varying parameterized configurations. Depending on the configuration provided, the resulting graph may reflect clustering effects, covariance matrix adjustments, or statistical aggregation across dataset features.

    • perClusterGraphFn

      public Function<CausalUnmixer.Config,Function<DataSet,Graph>> perClusterGraphFn
      A function that generates a cluster-specific graph creation method.

      This variable is a higher-order function that takes a Config object as input and produces a function. The resulting function, in turn, takes a DataSet as input and outputs a Graph object. The purpose of perClusterGraphFn is to allow for the creation of graphs tailored to the specific characteristics of different clusters, enabling more customized and effective processing or representation of data.

      The exact behavior of the function is determined by the configuration provided in the Config object. This allows for flexibility in defining how cluster-specific graphs should be generated based on varying use cases or requirements.
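
      The two-stage shape of such a function can be sketched with java.util.function.Function. The nested Cfg, Data, and Graph classes are hypothetical stand-ins for CausalUnmixer.Config, DataSet, and Graph so that the example compiles on its own; the pcAlpha value of 0.05 is illustrative, and the "graph" is just a label rather than the result of a real search.

```java
import java.util.function.Function;

/** Shape of a Config -> (DataSet -> Graph) function, using stand-in types. */
class GraphFnSketch {
    static final class Cfg {
        double pcAlpha = 0.05; // illustrative value, not a documented default
    }

    static final class Data {
        final int rows;
        Data(int rows) { this.rows = rows; }
    }

    static final class Graph {
        final String label;
        Graph(String label) { this.label = label; }
    }

    // First stage binds the configuration; second stage maps a dataset to a graph.
    static final Function<Cfg, Function<Data, Graph>> perClusterGraphFn =
            cfg -> data -> new Graph("alpha=" + cfg.pcAlpha + ", rows=" + data.rows);

    /** Applies both stages: configure once, then build a graph for one cluster's data. */
    static Graph buildFor(Cfg cfg, Data clusterData) {
        Function<Data, Graph> configured = perClusterGraphFn.apply(cfg);
        return configured.apply(clusterData);
    }
}
```

      Because the configuration is bound in the first stage, the configured function can be reused across every cluster's dataset without passing the Config again.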

    • pcAlpha

      public double pcAlpha
      The alpha parameter that sets the significance level in hypothesis testing or statistical calculations.

      A null hypothesis is rejected when the p-value falls below this threshold, so a smaller value gives a more stringent significance level.

    • pcColliderStyle

      public Pc.ColliderOrientationStyle pcColliderStyle
      Defines the collider orientation style, that is, how collider orientations are calculated or interpreted. The default value is Pc.ColliderOrientationStyle.MAX_P.
  • Constructor Details

    • Config

      public Config()
      Default constructor for the Config class. Takes no parameters and creates a Config instance with every field set to its default value.