edu.cmu.tetrad.search.unmix.CausalUnmixer.Config

Enclosing class: CausalUnmixer

public static class CausalUnmixer.Config extends Object

The Config class encapsulates the configuration settings used for the unmixing or clustering process within the CausalUnmixer framework. It provides parameters to control the behavior of the Expectation-Maximization (EM) algorithm, Gaussian Mixture Models (GMM), and additional related processes.

This class includes various tunable parameters, such as the number of clusters (K), parent superset configurations, regularization factors, covariance handling, annealing steps, and graph-related settings.

Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| double | annealStartT | Starting temperature for the simulated annealing process. |
| int | annealSteps | Number of steps in the simulated annealing process. |
| double | covRidgeRel | Relative ridge regularization parameter for covariance estimation. |
| double | covShrinkage | Shrinkage parameter for covariance estimation. |
| int | emMaxIters | Maximum number of iterations allowed for the Expectation-Maximization (EM) algorithm. |
| int | fullSigmaSafetyMargin | Safety margin for full sigma (full covariance matrix) estimation. |
| Integer | K | Number of clusters (K) used in the clustering process within the CausalUnmixer framework. |
| int | Kmax | Maximum number of clusters (Kmax) that can be used in the clustering process within the CausalUnmixer framework. |
| int | kmeansRestarts | Number of restarts performed by the k-means clustering algorithm. |
| int | Kmin | Minimum number of clusters (Kmin) that can be used in the clustering process within the CausalUnmixer framework. |
| double | pcAlpha | Significance level (alpha) for the hypothesis tests used by the PC algorithm. |
| Pc.ColliderOrientationStyle | pcColliderStyle | Collider orientation style used by PC. |
| Function<CausalUnmixer.Config,Function<DataSet,Graph>> | perClusterGraphFn | A function that produces the graph-construction method applied to each cluster. |
| Function<CausalUnmixer.Config,Function<DataSet,Graph>> | pooledGraphFn | A function that produces a pooled graph from a given configuration and dataset. |
| double | ridgeLambda | Ridge-regression penalty that discourages overfitting by penalizing the squared magnitude of the coefficients. |
| boolean | robustScaleResiduals | Whether to use robust scaling for residuals in computations. |
| ParentSupersetBuilder.ScoreType | supersetScore | Scoring type used when building the parent superset. |
| int | supersetTopM | Maximum size of the parent superset. |
| boolean | useParentSuperset | Whether to use parent supersets during the clustering or unmixing process within the CausalUnmixer framework. |

Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| Config() | Default constructor for the Config class. |

Method Summary

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- K
  
  public Integer K
  
  Specifies the number of clusters (K) used in the clustering process within the CausalUnmixer framework. This is the target number of clusters (mixture components) formed during modeling, and it is a critical parameter for the Gaussian Mixture Model (GMM) or other clustering algorithm applied. The default value is 2.
- Kmin
  
  public int Kmin
  
  Specifies the minimum number of clusters (Kmin) that can be used in the clustering process within the CausalUnmixer framework. It is the lower bound on the number of clusters considered when applying algorithms such as Gaussian Mixture Models (GMM) or k-means.
  The default value is 1, so at least one cluster is always considered.
- Kmax
  
  public int Kmax
  
  Specifies the maximum number of clusters (Kmax) that can be used in the clustering process within the CausalUnmixer framework. It is the upper bound on the number of clusters considered when applying algorithms such as Gaussian Mixture Models (GMM) or k-means.
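
  A minimal sketch of how a bounded search over the number of clusters might look, assuming each candidate K in [Kmin, Kmax] is fitted and scored, and the best-scoring K is kept. The scoring function (for example a BIC value, lower is better) is a hypothetical placeholder supplied by the caller, and the class name is illustrative; this is not the CausalUnmixer implementation.

  ```java
  import java.util.function.IntToDoubleFunction;

  /** Hypothetical sketch: scan K over [kMin, kMax] and keep the K with the lowest score. */
  class KRangeSearch {
      static int selectK(int kMin, int kMax, IntToDoubleFunction scoreForK) {
          if (kMin < 1 || kMax < kMin) {
              throw new IllegalArgumentException("require 1 <= kMin <= kMax");
          }
          int bestK = kMin;
          double bestScore = Double.POSITIVE_INFINITY;
          for (int k = kMin; k <= kMax; k++) {
              // Assumption: lower score is better (as with BIC); the caller fits and scores a K-cluster model.
              double score = scoreForK.applyAsDouble(k);
              if (score < bestScore) {
                  bestScore = score;
                  bestK = k;
              }
          }
          return bestK;
      }
  }
  ```

  Ties keep the smaller K because only a strictly lower score replaces the current best.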
- useParentSuperset
  
  public boolean useParentSuperset
  
  Indicates whether to use parent supersets during the clustering or unmixing process within the CausalUnmixer framework. When true, the framework incorporates parent supersets (sets of candidate parents for each variable) into the modeling process, which can influence cluster assignments, causal decoding, and related steps of the modeling pipeline.
  The default value is true.
- supersetTopM
  
  public int supersetTopM
  
  The maximum size (M) of the parent superset, that is, the largest number of candidate parents retained in a superset. It bounds the scope, and therefore the cost, of operations on the superset.
- supersetScore
  
  public ParentSupersetBuilder.ScoreType supersetScore
  
  Specifies the scoring type used to evaluate and compare candidates when building the parent superset. The default is ParentSupersetBuilder.ScoreType.KENDALL.
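
  For illustration, a parent superset of size M could be formed by scoring each candidate parent of a target variable with the absolute Kendall rank correlation and keeping the M highest-scoring candidates. This is an assumed reading of supersetTopM and ScoreType.KENDALL; the class and method names are hypothetical, and the statistic shown is Kendall's tau-a (no tie correction).

  ```java
  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.List;

  class KendallSuperset {
      /** Kendall's tau-a: (concordant - discordant) / (n choose 2); tied pairs count as neither. O(n^2). */
      static double kendallTau(double[] x, double[] y) {
          int n = x.length;
          long concordant = 0, discordant = 0;
          for (int i = 0; i < n; i++) {
              for (int j = i + 1; j < n; j++) {
                  double s = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                  if (s > 0) concordant++;
                  else if (s < 0) discordant++;
              }
          }
          return (concordant - discordant) / (n * (n - 1) / 2.0);
      }

      /** Returns the indices of the m candidate columns with the largest |tau| against the target. */
      static List<Integer> topM(double[][] candidates, double[] target, int m) {
          double[] score = new double[candidates.length];
          List<Integer> idx = new ArrayList<>();
          for (int c = 0; c < candidates.length; c++) {
              score[c] = Math.abs(kendallTau(candidates[c], target));
              idx.add(c);
          }
          // Sort by descending |tau|; List.sort is stable, so ties keep their original order.
          idx.sort(Comparator.comparingDouble((Integer i) -> score[i]).reversed());
          return new ArrayList<>(idx.subList(0, Math.min(m, idx.size())));
      }
  }
  ```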
- robustScaleResiduals
  
  public boolean robustScaleResiduals
  
  Determines whether to use robust scaling for residuals in computations. When set to true, scaling methods that are less sensitive to outliers are applied, improving stability and accuracy in the presence of anomalous data.
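
  One common robust scale estimate is the median absolute deviation (MAD), multiplied by 1.4826 so that it matches the standard deviation for Gaussian data. Whether CausalUnmixer uses MAD specifically is an assumption; the hypothetical class below only shows what robust scaling of residuals can look like.

  ```java
  import java.util.Arrays;

  class RobustScale {
      static double median(double[] v) {
          double[] s = v.clone();
          Arrays.sort(s);
          int n = s.length;
          return (n % 2 == 1) ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
      }

      /** MAD scaled by 1.4826, which is consistent with the standard deviation under normality. */
      static double madScale(double[] residuals) {
          double med = median(residuals);
          double[] dev = new double[residuals.length];
          for (int i = 0; i < residuals.length; i++) dev[i] = Math.abs(residuals[i] - med);
          return 1.4826 * median(dev);
      }

      /** Centers residuals at their median and divides by the robust scale. */
      static double[] scale(double[] residuals) {
          double med = median(residuals);
          double s = madScale(residuals);
          if (s == 0) s = 1.0; // guard for a constant sample; an arbitrary choice for this sketch
          double[] out = new double[residuals.length];
          for (int i = 0; i < residuals.length; i++) out[i] = (residuals[i] - med) / s;
          return out;
      }
  }
  ```

  The single outlier in residuals such as 1, 2, 3, 4, 100 leaves the MAD-based scale at 1.4826, whereas the ordinary standard deviation would be dominated by the value 100.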
- kmeansRestarts
  
  public int kmeansRestarts
  
  The number of restarts to be performed by the k-means clustering algorithm. A higher value increases the chances of finding a better clustering solution by running the algorithm multiple times with different initializations.
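
  A minimal one-dimensional sketch of k-means with restarts: each restart draws k distinct data points as initial centers from a seeded Random, runs Lloyd iterations (assign to the nearest center, then move each center to its cluster mean), and the solution with the lowest within-cluster sum of squares is kept. The 1-D restriction, the iteration cap of 100, and the class name are simplifications for illustration, not the CausalUnmixer code.

  ```java
  import java.util.Random;

  class KMeansRestarts {
      /** Returns the lowest within-cluster sum of squares found over the given number of restarts. */
      static double bestSse(double[] x, int k, int restarts, long seed) {
          if (k < 1 || k > x.length) throw new IllegalArgumentException("need 1 <= k <= n");
          Random rng = new Random(seed);
          double best = Double.POSITIVE_INFINITY;
          for (int r = 0; r < restarts; r++) {
              double[] centers = initCenters(x, k, rng);
              int[] label = new int[x.length];
              for (int iter = 0; iter < 100; iter++) {
                  // Assignment step: nearest center.
                  for (int i = 0; i < x.length; i++) {
                      int arg = 0;
                      for (int c = 1; c < k; c++) {
                          if (Math.abs(x[i] - centers[c]) < Math.abs(x[i] - centers[arg])) arg = c;
                      }
                      label[i] = arg;
                  }
                  // Update step: mean of assigned points; empty clusters keep their center.
                  double[] sum = new double[k];
                  int[] count = new int[k];
                  for (int i = 0; i < x.length; i++) { sum[label[i]] += x[i]; count[label[i]]++; }
                  boolean moved = false;
                  for (int c = 0; c < k; c++) {
                      if (count[c] > 0) {
                          double m = sum[c] / count[c];
                          if (m != centers[c]) { centers[c] = m; moved = true; }
                      }
                  }
                  if (!moved) break;
              }
              double sse = 0;
              for (int i = 0; i < x.length; i++) {
                  double d = x[i] - centers[label[i]];
                  sse += d * d;
              }
              best = Math.min(best, sse); // keep the best restart
          }
          return best;
      }

      /** Picks k distinct points uniformly at random (partial Fisher-Yates shuffle of indices). */
      private static double[] initCenters(double[] x, int k, Random rng) {
          int[] idx = new int[x.length];
          for (int i = 0; i < x.length; i++) idx[i] = i;
          double[] centers = new double[k];
          for (int c = 0; c < k; c++) {
              int j = c + rng.nextInt(x.length - c);
              int tmp = idx[c]; idx[c] = idx[j]; idx[j] = tmp;
              centers[c] = x[idx[c]];
          }
          return centers;
      }
  }
  ```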
- emMaxIters
  
  public int emMaxIters
  
  Maximum number of iterations allowed for the Expectation-Maximization (EM) algorithm. EM stops after this many iterations even if it has not yet converged.
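
  To show where an iteration cap enters, here is a minimal EM sketch for a one-dimensional Gaussian mixture: each iteration runs an E-step (responsibilities, computed with log-sum-exp) and an M-step (weights, means, variances), and the loop stops when the log-likelihood gain falls below a tolerance or after maxIters iterations. The 1-D setting, the tolerance, the variance floor of 1e-6, the unit-variance initialization, and the class name are assumptions for illustration.

  ```java
  import java.util.Arrays;

  class Em1d {
      /** Fits a 1-D Gaussian mixture starting from initMeans; returns the fitted means. */
      static double[] fitMeans(double[] x, double[] initMeans, int maxIters, double tol) {
          int n = x.length, k = initMeans.length;
          double[] mu = initMeans.clone();
          double[] var = new double[k];
          double[] w = new double[k];
          Arrays.fill(var, 1.0);
          Arrays.fill(w, 1.0 / k);
          double[][] resp = new double[n][k];
          double prevLl = Double.NEGATIVE_INFINITY;
          for (int iter = 0; iter < maxIters; iter++) { // the cap: at most maxIters EM iterations
              // E-step: log densities, normalized with log-sum-exp for numerical stability.
              double ll = 0;
              for (int i = 0; i < n; i++) {
                  double max = Double.NEGATIVE_INFINITY;
                  for (int c = 0; c < k; c++) {
                      double d = x[i] - mu[c];
                      resp[i][c] = Math.log(w[c]) - 0.5 * Math.log(2 * Math.PI * var[c]) - 0.5 * d * d / var[c];
                      max = Math.max(max, resp[i][c]);
                  }
                  double sum = 0;
                  for (int c = 0; c < k; c++) sum += Math.exp(resp[i][c] - max);
                  double logTotal = max + Math.log(sum);
                  for (int c = 0; c < k; c++) resp[i][c] = Math.exp(resp[i][c] - logTotal);
                  ll += logTotal;
              }
              // M-step: weights, means, variances (variance floored to avoid collapse).
              for (int c = 0; c < k; c++) {
                  double nc = 0, s = 0;
                  for (int i = 0; i < n; i++) { nc += resp[i][c]; s += resp[i][c] * x[i]; }
                  if (nc < 1e-12) continue; // leave an (almost) empty component unchanged
                  mu[c] = s / nc;
                  double v = 0;
                  for (int i = 0; i < n; i++) { double d = x[i] - mu[c]; v += resp[i][c] * d * d; }
                  var[c] = Math.max(v / nc, 1e-6);
                  w[c] = nc / n;
              }
              if (ll - prevLl < tol) break; // converged: log-likelihood gain below tolerance
              prevLl = ll;
          }
          return mu;
      }
  }
  ```

  With maxIters = 0 the initial means are returned unchanged, which makes the role of the cap easy to see.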
- covRidgeRel
  
  public double covRidgeRel
  
  Relative ridge regularization parameter for covariance estimation. Controls the size of the ridge (diagonal) term added to the covariance matrix during estimation, which keeps the estimate well conditioned.
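
  A minimal sketch of a relative ridge, assuming covRidgeRel is a fraction of the average variance (the trace of Σ divided by the dimension d) that is added to the diagonal, that is, Σ' = Σ + covRidgeRel · (tr Σ / d) · I. That scaling convention and the helper name are assumptions. Adding a positive multiple of the identity turns a positive semidefinite covariance into a positive definite one.

  ```java
  class CovRidge {
      /** Returns sigma + rel * (trace(sigma) / d) * I; the input matrix is not modified. */
      static double[][] ridgeRelative(double[][] sigma, double rel) {
          int d = sigma.length;
          double trace = 0;
          for (int i = 0; i < d; i++) trace += sigma[i][i];
          double ridge = rel * trace / d; // ridge scaled to the average variance (assumption)
          double[][] out = new double[d][];
          for (int i = 0; i < d; i++) {
              out[i] = sigma[i].clone();
              out[i][i] += ridge;
          }
          return out;
      }
  }
  ```

  For the singular matrix [[1, 1], [1, 1]] and rel = 0.1 the result is [[1.1, 1], [1, 1.1]], whose determinant 0.21 is positive.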
- covShrinkage
  
  public double covShrinkage
  
  Shrinkage parameter for covariance estimation. Controls the amount of shrinkage applied to the covariance matrix during estimation.
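
  As an illustration, linear shrinkage blends the sample covariance with a simpler target: Σ_α = (1 − α) Σ + α · diag(Σ). The sketch assumes the target is the diagonal of Σ and that covShrinkage plays the role of the weight α in [0, 1]; both are assumptions about how the field is used. With this target the variances are unchanged and every covariance is multiplied by (1 − α).

  ```java
  class CovShrink {
      /** Returns (1 - alpha) * sigma + alpha * diag(sigma); alpha must lie in [0, 1]. */
      static double[][] shrinkToDiagonal(double[][] sigma, double alpha) {
          if (alpha < 0 || alpha > 1) throw new IllegalArgumentException("alpha must be in [0, 1]");
          int d = sigma.length;
          double[][] out = new double[d][d];
          for (int i = 0; i < d; i++) {
              for (int j = 0; j < d; j++) {
                  // Diagonal entries equal the target, so only off-diagonal entries shrink.
                  out[i][j] = (i == j) ? sigma[i][j] : (1 - alpha) * sigma[i][j];
              }
          }
          return out;
      }
  }
  ```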
- annealSteps
  
  public int annealSteps
  
  Number of steps in the simulated annealing process, that is, the number of iterations the simulated annealing algorithm runs.
- annealStartT
  
  public double annealStartT
  
  Starting temperature for the simulated annealing process. Higher starting temperatures make the early stages of annealing more exploratory.
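
  The two annealing fields can be pictured with a generic simulated annealing sketch: a temperature schedule that starts at annealStartT and is lowered over annealSteps steps, and the Metropolis rule that accepts a worse move with probability exp(−Δ/T). The linear schedule, the explicit end temperature, and the class name are assumptions; the actual CausalUnmixer schedule is not documented here.

  ```java
  class AnnealSchedule {
      /** Linear cooling from startT at step 0 to endT at step steps - 1 (assumed schedule). */
      static double temperature(int step, int steps, double startT, double endT) {
          if (steps <= 1) return endT;
          double frac = (double) step / (steps - 1);
          return startT + frac * (endT - startT);
      }

      /** Metropolis rule: always accept improvements; accept a worse move with prob. exp(-delta / T). */
      static double acceptProbability(double delta, double temperature) {
          if (delta <= 0) return 1.0;
          return Math.exp(-delta / temperature);
      }
  }
  ```

  A higher temperature makes the same worsening move more likely to be accepted, which is why a larger starting temperature gives more exploration early on.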
- fullSigmaSafetyMargin
  
  public int fullSigmaSafetyMargin
  
  Safety margin for full sigma (full covariance matrix) estimation, intended to ensure that the estimated covariance matrix is positive definite.
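
  One plausible reading of this margin, stated as an assumption: a mean-centered sample covariance of n points in d dimensions has rank at most n − 1, so it can only be positive definite when n ≥ d + 1. A safety margin would then require extra headroom before a full covariance is estimated for a cluster, with a simpler (for example diagonal) covariance used otherwise. The rule and names below are hypothetical.

  ```java
  class SigmaGuard {
      /**
       * Hypothetical rule: estimate a full covariance only if the cluster's effective size
       * (e.g. its summed EM responsibilities) reaches d + 1 + safetyMargin.
       */
      static boolean useFullCovariance(double effectiveN, int d, int safetyMargin) {
          return effectiveN >= d + 1 + safetyMargin;
      }
  }
  ```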
- ridgeLambda
  
  public double ridgeLambda
  
  A regularization parameter used in ridge regression to prevent overfitting by adding a penalty proportional to the square of the coefficients' magnitude. Typically, higher values increase regularization, while lower values reduce it. This parameter helps stabilize solutions when dealing with multicollinearity or poorly conditioned problems.
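
  As a reminder of what the penalty does, the sketch solves the ridge normal equations (XᵀX + λI)β = Xᵀy for a small design matrix without an intercept. How CausalUnmixer applies ridgeLambda in its own regressions (which predictors, whether an intercept is included, any scaling) is not stated here, so the class below is only a hypothetical illustration.

  ```java
  class RidgeRegression {
      /** Solves (X^T X + lambda I) beta = X^T y (no intercept) by Gaussian elimination. */
      static double[] fit(double[][] x, double[] y, double lambda) {
          int n = x.length, p = x[0].length;
          double[][] a = new double[p][p + 1]; // augmented matrix [X^T X + lambda I | X^T y]
          for (int j = 0; j < p; j++) {
              for (int k = 0; k < p; k++) {
                  double s = 0;
                  for (int i = 0; i < n; i++) s += x[i][j] * x[i][k];
                  a[j][k] = s + (j == k ? lambda : 0);
              }
              double s = 0;
              for (int i = 0; i < n; i++) s += x[i][j] * y[i];
              a[j][p] = s;
          }
          // Forward elimination; no pivoting is needed because the matrix is symmetric positive
          // definite whenever lambda > 0 (or X has full column rank).
          for (int c = 0; c < p; c++) {
              for (int r = c + 1; r < p; r++) {
                  double f = a[r][c] / a[c][c];
                  for (int k = c; k <= p; k++) a[r][k] -= f * a[c][k];
              }
          }
          // Back substitution.
          double[] beta = new double[p];
          for (int r = p - 1; r >= 0; r--) {
              double s = a[r][p];
              for (int k = r + 1; k < p; k++) s -= a[r][k] * beta[k];
              beta[r] = s / a[r][r];
          }
          return beta;
      }
  }
  ```

  With one predictor x = (1, 2, 3) and y = (2, 4, 6), λ = 0 gives β = 28 / 14 = 2, while λ = 14 gives β = 28 / (14 + 14) = 1: the penalty shrinks the coefficient toward zero.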
- pooledGraphFn
  
  public Function<CausalUnmixer.Config,Function<DataSet,Graph>> pooledGraphFn
  
  A function that produces a pooled graph from a given configuration and dataset. It maps a configuration object to a second function, which in turn maps a DataSet to a Graph.
  This curried form lets graph-construction pipelines be composed flexibly: the configuration fixes the settings, and the returned function can then be applied to any input dataset. It may leverage model pooling or other aggregation mechanisms, and it keeps the logic for deriving the pooled graph modular and reusable.
  Depending on the configuration, the resulting graph may reflect clustering effects, covariance matrix adjustments, or statistical aggregation across dataset features.
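
  The curried type Function<Config, Function<DataSet, Graph>> can be illustrated with small stand-in types. In this sketch MiniConfig, MiniData, and MiniGraph are hypothetical placeholders for CausalUnmixer.Config, DataSet, and Graph, and the "graph builder" merely connects column pairs whose absolute correlation exceeds a threshold read from the configuration; it is not a causal search. The point is that the outer function reads settings once and the returned inner function can be applied to any dataset.

  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.function.Function;

  class CurriedGraphFactory {
      /** Hypothetical stand-ins for CausalUnmixer.Config, DataSet and Graph. */
      static class MiniConfig { double threshold = 0.5; }
      static class MiniData { final double[][] cols; MiniData(double[][] cols) { this.cols = cols; } }
      static class MiniGraph { final List<int[]> edges = new ArrayList<>(); }

      /** Outer function captures the config; the inner function builds a graph from a dataset. */
      static final Function<MiniConfig, Function<MiniData, MiniGraph>> POOLED_GRAPH_FN =
          cfg -> data -> {
              MiniGraph g = new MiniGraph();
              for (int i = 0; i < data.cols.length; i++) {
                  for (int j = i + 1; j < data.cols.length; j++) {
                      if (Math.abs(corr(data.cols[i], data.cols[j])) > cfg.threshold) {
                          g.edges.add(new int[] {i, j});
                      }
                  }
              }
              return g;
          };

      /** Pearson correlation of two equal-length columns. */
      static double corr(double[] a, double[] b) {
          int n = a.length;
          double ma = 0, mb = 0;
          for (int i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
          ma /= n;
          mb /= n;
          double sab = 0, saa = 0, sbb = 0;
          for (int i = 0; i < n; i++) {
              double da = a[i] - ma, db = b[i] - mb;
              sab += da * db;
              saa += da * da;
              sbb += db * db;
          }
          return sab / Math.sqrt(saa * sbb);
      }
  }
  ```

  Typical use is two-stage: POOLED_GRAPH_FN.apply(config) yields a builder, and builder.apply(data) yields the graph.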
- perClusterGraphFn
  
  public Function<CausalUnmixer.Config,Function<DataSet,Graph>> perClusterGraphFn
  
  A function that produces the graph-construction method applied to each cluster.
  This higher-order function takes a Config and returns a function that, in turn, takes a DataSet and returns a Graph. It allows each cluster's graph to be built in a way suited to that cluster's data.
  The exact behavior is determined by the Config provided, so cluster-specific graph generation can be adapted to different use cases or requirements.
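
  A sketch of how a per-cluster factory might be applied: the configuration is turned into a dataset-to-graph function once, the rows are split by hard cluster label, and the function is applied to each cluster's rows. To stay self-contained, a dataset is represented as a double[][] of rows and the graph type is left generic; these choices, the use of hard labels, and the helper name are assumptions, not the CausalUnmixer API.

  ```java
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import java.util.TreeMap;
  import java.util.function.Function;

  class PerClusterGraphs {
      /** Splits rows by cluster label and applies the builder produced from the config to each cluster. */
      static <C, G> Map<Integer, G> buildPerCluster(
              C config,
              Function<C, Function<double[][], G>> perClusterGraphFn,
              double[][] rows,
              int[] labels) {
          Map<Integer, List<double[]>> byCluster = new TreeMap<>();
          for (int i = 0; i < rows.length; i++) {
              byCluster.computeIfAbsent(labels[i], k -> new ArrayList<>()).add(rows[i]);
          }
          Function<double[][], G> builder = perClusterGraphFn.apply(config); // built once per config
          Map<Integer, G> graphs = new TreeMap<>();
          for (Map.Entry<Integer, List<double[]>> e : byCluster.entrySet()) {
              graphs.put(e.getKey(), builder.apply(e.getValue().toArray(new double[0][])));
          }
          return graphs;
      }
  }
  ```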
- pcAlpha
  
  public double pcAlpha
  
  Significance level (alpha) for the hypothesis tests used by the PC algorithm.
  A null hypothesis is rejected when its p-value falls below this threshold; smaller values give a more stringent test.
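
  To make the role of alpha concrete, this sketch uses the Fisher Z test of zero (partial) correlation, a common conditional independence test for Gaussian data; that the configured search uses this particular test is an assumption. The statistic is z = ½ ln((1 + r)/(1 − r)) · √(n − |S| − 3), the two-sided p-value is 2(1 − Φ(|z|)), and the variables are judged independent given S when p > alpha. Because the Java standard library has no normal CDF, Φ is built from the Abramowitz and Stegun 7.1.26 approximation to erf (absolute error below about 1.5 × 10⁻⁷). Class and method names are hypothetical.

  ```java
  class FisherZ {
      /** erf(x) via Abramowitz and Stegun 7.1.26, extended to negative x by odd symmetry. */
      static double erf(double x) {
          double ax = Math.abs(x);
          double t = 1.0 / (1.0 + 0.3275911 * ax);
          double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                  + t * (-1.453152027 + t * 1.061405429))));
          double y = 1.0 - poly * Math.exp(-ax * ax);
          return x >= 0 ? y : -y;
      }

      /** Standard normal CDF. */
      static double normalCdf(double z) {
          return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
      }

      /** Two-sided p-value for H0: (partial) correlation r is zero, with n samples and |S| = condSetSize. */
      static double pValue(double r, int n, int condSetSize) {
          double z = 0.5 * Math.log((1 + r) / (1 - r)) * Math.sqrt(n - condSetSize - 3);
          return 2.0 * (1.0 - normalCdf(Math.abs(z)));
      }

      /** Independence is accepted when the test fails to reject at level alpha. */
      static boolean independent(double r, int n, int condSetSize, double alpha) {
          return pValue(r, n, condSetSize) > alpha;
      }
  }
  ```

  Lowering alpha makes rejection harder, so more pairs are judged independent and the resulting graph tends to be sparser.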
- pcColliderStyle
  
  public Pc.ColliderOrientationStyle pcColliderStyle
  
  Defines the collider orientation style used by PC, that is, the rule by which colliders are identified and oriented. The default is Pc.ColliderOrientationStyle.MAX_P.
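
  The idea behind a max-p collider rule can be sketched for an unshielded triple X, Y, Z (X and Z both adjacent to Y but not to each other): among candidate conditioning sets drawn from the adjacents of X or Z, consider those for which X and Z test as independent (p > alpha), take the one with the largest p-value, and orient X → Y ← Z only if Y is not in that set. In this hypothetical sketch the p-values are passed in precomputed; it illustrates the rule under those assumptions and is not Tetrad's Pc implementation.

  ```java
  import java.util.Map;
  import java.util.Set;

  class MaxPCollider {
      /**
       * pValues maps each candidate conditioning set to the p-value of the test of X _||_ Z given
       * that set. Returns true (orient X -> Y <- Z) if the independent set with the largest
       * p-value does not contain y; returns false if no set is judged independent.
       */
      static boolean isCollider(String y, Map<Set<String>, Double> pValues, double alpha) {
          Set<String> best = null;
          double bestP = -1;
          for (Map.Entry<Set<String>, Double> e : pValues.entrySet()) {
              double p = e.getValue();
              if (p > alpha && p > bestP) {
                  bestP = p;
                  best = e.getKey();
              }
          }
          return best != null && !best.contains(y);
      }
  }
  ```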
Constructor Details
- Config
  
  public Config()
  
  Default constructor for the Config class. Creates a Config instance with every field set to its default value; it takes no parameters.
