Class CausalUnmixer.Config
Enclosing class:
CausalUnmixer
The Config class encapsulates the configuration settings used for the unmixing (clustering) process within the CausalUnmixer framework. It provides parameters that control the behavior of the Expectation-Maximization (EM) algorithm, Gaussian Mixture Models (GMM), and related processes.
Its tunable parameters include the number of clusters (K) and its bounds, parent superset settings, regularization factors, covariance handling, simulated annealing settings, and graph-related settings.
Field Summary
Fields
annealStartT (double): Starting temperature for the simulated annealing process.
annealSteps (int): Number of steps in the simulated annealing process.
covRidgeRel (double): Relative ridge regularization parameter for covariance estimation.
covShrinkage (double): Shrinkage regularization parameter for covariance estimation.
emMaxIters (int): Maximum number of iterations allowed for the Expectation-Maximization (EM) algorithm.
fullSigmaSafetyMargin (int): Safety margin for full covariance (sigma) estimation.
K: Specifies the number of clusters (K) used in the clustering process within the CausalUnmixer framework.
Kmax (int): Specifies the maximum number of clusters (Kmax) that can be used in the clustering process within the CausalUnmixer framework.
kmeansRestarts (int): The number of restarts performed by the k-means clustering algorithm.
Kmin (int): Specifies the minimum number of clusters (Kmin) that can be used in the clustering process within the CausalUnmixer framework.
pcAlpha (double): The significance level (alpha) used in hypothesis tests or other statistical calculations.
pcColliderStyle (Pc.ColliderOrientationStyle): Defines the collider orientation style.
perClusterGraphFn: A function that generates a cluster-specific graph creation method.
pooledGraphFn: A function that generates a pooled graph from a given configuration and dataset.
ridgeLambda (double): A ridge regression regularization parameter that prevents overfitting by adding a penalty proportional to the squared magnitude of the coefficients.
robustScaleResiduals (boolean): Determines whether residuals are scaled robustly.
supersetScore (ParentSupersetBuilder.ScoreType): Specifies the scoring type used with the parent superset configuration.
supersetTopM (int): The maximum size of the superset considered during processing.
useParentSuperset (boolean): Indicates whether the parent superset configuration is used during the clustering or unmixing process within the CausalUnmixer framework.
Constructor Summary
Constructors
Config(): Default constructor for the Config class.
Field Details
K
Specifies the number of clusters (K) used in the clustering process within the CausalUnmixer framework. This is the target number of clusters formed during modeling, and it is a critical parameter for Gaussian Mixture Models (GMM) and other clustering algorithms. The default value is 2.
Kmin
public int Kmin
Specifies the minimum number of clusters (Kmin) that can be used in the clustering process within the CausalUnmixer framework. It sets a lower bound on cluster configurations during modeling. This ensures that at least this many clusters are considered when algorithms such as Gaussian Mixture Models (GMM) or k-means are applied. The default value is 1, so at least one cluster is always considered.
Kmax
public int Kmax
Specifies the maximum number of clusters (Kmax) that can be used in the clustering process within the CausalUnmixer framework. It sets an upper bound on cluster configurations during modeling, so no more than this many clusters are considered when algorithms such as Gaussian Mixture Models (GMM) or k-means are applied.
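Kmin and Kmax define a search range for the number of clusters. This page does not say how CausalUnmixer chooses K within that range. The following is a minimal sketch that assumes candidates are compared by the Bayesian information criterion (BIC) of a one-dimensional Gaussian mixture. The class KSelectionSketch and its log-likelihood callback are hypothetical stand-ins for a real model fit.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch: pick the number of clusters in [kMin, kMax] by BIC.
 * The caller supplies the maximized log-likelihood for each k (for example,
 * from fitting a k-component mixture); this class only does the comparison.
 */
final class KSelectionSketch {
    private KSelectionSketch() {}

    /** Free parameters of a 1-D Gaussian mixture: k means, k variances, k - 1 mixing weights. */
    static int gmm1dParamCount(int k) {
        return 3 * k - 1;
    }

    /** BIC = -2 * logLik + numParams * ln(n); lower values are preferred. */
    static double bic(double logLik, int numParams, int n) {
        return -2.0 * logLik + numParams * Math.log(n);
    }

    /** Returns the k in [kMin, kMax] with the smallest BIC (the first such k on ties). */
    static int selectK(int kMin, int kMax, int n, IntToDoubleFunction logLikForK) {
        if (kMin < 1 || kMax < kMin) {
            throw new IllegalArgumentException("require 1 <= kMin <= kMax");
        }
        int bestK = kMin;
        double bestBic = Double.POSITIVE_INFINITY;
        for (int k = kMin; k <= kMax; k++) {
            double score = bic(logLikForK.applyAsDouble(k), gmm1dParamCount(k), n);
            if (score < bestBic) {
                bestBic = score;
                bestK = k;
            }
        }
        return bestK;
    }
}
```

As an illustration, take n = 100 and illustrative log-likelihoods of -300, -250, and -248 for k = 1, 2, and 3. The BIC values are about 609.2, 523.0, and 532.8, so the parameter penalty makes k = 2 the choice even though k = 3 fits slightly better.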
useParentSuperset
public boolean useParentSuperset
Indicates whether the parent superset configuration is used during the clustering or unmixing process within the CausalUnmixer framework. When set to true, the framework incorporates parent supersets into the modeling process. This can influence cluster assignments, causal decoding, and related steps, and it affects how grouped parent structures are generated and used in the modeling pipeline. The default value is true.
supersetTopM
public int supersetTopM
The maximum size of the superset considered during processing. It caps the number of candidates kept in the superset, which limits the scope of subsequent operations on it.
supersetScore
Specifies the scoring type used with the parent superset configuration. It determines how candidates are evaluated or compared within the parent superset context. The default value is ParentSupersetBuilder.ScoreType.KENDALL.
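Together, supersetTopM and a KENDALL score type suggest a rank-based screen of candidate parents. The sketch below is minimal and rests on two assumptions: each candidate is scored by the absolute Kendall rank correlation (tau-a) with the target, and the supersetTopM highest-scoring candidates are kept. ParentScreenSketch and topM are hypothetical names, not the ParentSupersetBuilder API.

```java
import java.util.Arrays;

/** Hypothetical sketch of a Kendall-tau parent screen; not the ParentSupersetBuilder API. */
final class ParentScreenSketch {
    private ParentScreenSketch() {}

    /** Kendall's tau-a: (concordant - discordant) / (n(n-1)/2); tied pairs count as neither. */
    static double kendallTau(double[] x, double[] y) {
        int n = x.length;
        if (n != y.length || n < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        long score = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // +1 for a concordant pair, -1 for a discordant pair, 0 for a tie
                score += (long) (Math.signum(x[j] - x[i]) * Math.signum(y[j] - y[i]));
            }
        }
        return score / (n * (n - 1) / 2.0);
    }

    /** Indices of the (at most) m candidates with the largest |tau| against the target, best first. */
    static int[] topM(double[][] candidates, double[] target, int m) {
        Integer[] idx = new Integer[candidates.length];
        double[] absTau = new double[candidates.length];
        for (int c = 0; c < candidates.length; c++) {
            idx[c] = c;
            absTau[c] = Math.abs(kendallTau(candidates[c], target));
        }
        // Arrays.sort on objects is stable, so tied candidates keep their original order.
        Arrays.sort(idx, (a, b) -> Double.compare(absTau[b], absTau[a]));
        int keep = Math.min(m, idx.length);
        int[] out = new int[keep];
        for (int i = 0; i < keep; i++) {
            out[i] = idx[i];
        }
        return out;
    }
}
```

Kendall's tau depends only on the ordering of the values, so a screen of this kind is insensitive to monotone transformations and to outliers in magnitude.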
robustScaleResiduals
public boolean robustScaleResiduals
Determines whether residuals are scaled robustly. When set to true, scaling methods that are less sensitive to outliers are applied, which improves stability and accuracy in the presence of anomalous data.
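This page does not name the robust scale that is used. The following is a minimal sketch of one common choice, assumed here: center residuals at the median and divide by the median absolute deviation (MAD). The MAD is rescaled by 1.4826 so that it estimates the standard deviation for normal data. RobustScaleSketch is a hypothetical class.

```java
import java.util.Arrays;

/** Hypothetical sketch: robust (median/MAD) standardization of residuals. */
final class RobustScaleSketch {
    private RobustScaleSketch() {}

    /** Consistency constant that makes the MAD estimate sigma for normal data. */
    static final double MAD_TO_SIGMA = 1.4826;

    static double median(double[] v) {
        if (v.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        double[] s = v.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    }

    /** MAD-based scale estimate; falls back to 1.0 when the MAD is zero to avoid division by zero. */
    static double robustScale(double[] r) {
        double med = median(r);
        double[] dev = new double[r.length];
        for (int i = 0; i < r.length; i++) {
            dev[i] = Math.abs(r[i] - med);
        }
        double mad = MAD_TO_SIGMA * median(dev);
        return mad > 0 ? mad : 1.0;
    }

    /** Returns (r - median) / robustScale for each residual. */
    static double[] standardize(double[] r) {
        double med = median(r);
        double scale = robustScale(r);
        double[] out = new double[r.length];
        for (int i = 0; i < r.length; i++) {
            out[i] = (r[i] - med) / scale;
        }
        return out;
    }
}
```

Consider the residuals {1, 2, 3, 4, 100}. Under this sketch, the outlier 100 gets a standardized value of roughly 65. Mean and (population) standard-deviation scaling gives it roughly 2, because the outlier inflates its own scale estimate.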
kmeansRestarts
public int kmeansRestarts
The number of restarts performed by the k-means clustering algorithm. Each restart runs the algorithm from a different initialization. A higher value therefore increases the chance of finding a better clustering solution, at the cost of additional computation.
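The following is a minimal sketch of what a restart count means. It assumes standard Lloyd-style k-means on one-dimensional data, with centroids initialized from randomly chosen points. Each restart is an independent run, and the run with the lowest within-cluster sum of squares (inertia) is kept. KMeansRestartsSketch is a hypothetical class, not part of CausalUnmixer.

```java
import java.util.Random;

/** Hypothetical sketch: 1-D k-means (Lloyd's algorithm) with random restarts. */
final class KMeansRestartsSketch {
    private KMeansRestartsSketch() {}

    /** Centroids of one run plus its within-cluster sum of squares (inertia). */
    static final class Result {
        final double[] centroids;
        final double inertia;

        Result(double[] centroids, double inertia) {
            this.centroids = centroids;
            this.inertia = inertia;
        }
    }

    static Result fit(double[] x, int k, int restarts, int maxIters, long seed) {
        Random rng = new Random(seed);
        Result best = null;
        for (int r = 0; r < Math.max(1, restarts); r++) {
            Result run = lloyd(x, k, maxIters, rng);
            if (best == null || run.inertia < best.inertia) {
                best = run; // keep the restart with the lowest inertia
            }
        }
        return best;
    }

    private static Result lloyd(double[] x, int k, int maxIters, Random rng) {
        double[] c = new double[k];
        for (int j = 0; j < k; j++) {
            c[j] = x[rng.nextInt(x.length)]; // initialize from randomly chosen data points
        }
        int[] assign = new int[x.length];
        for (int it = 0; it < maxIters; it++) {
            for (int i = 0; i < x.length; i++) {
                assign[i] = nearest(c, x[i]); // assignment step
            }
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int i = 0; i < x.length; i++) {
                sum[assign[i]] += x[i];
                count[assign[i]]++;
            }
            boolean changed = false;
            for (int j = 0; j < k; j++) {
                if (count[j] > 0) { // update step; an empty cluster keeps its old centroid
                    double m = sum[j] / count[j];
                    changed |= m != c[j];
                    c[j] = m;
                }
            }
            if (!changed) {
                break; // assignments are stable
            }
        }
        double inertia = 0.0;
        for (double xi : x) {
            double d = xi - c[nearest(c, xi)];
            inertia += d * d;
        }
        return new Result(c, inertia);
    }

    private static int nearest(double[] c, double v) {
        int best = 0;
        for (int j = 1; j < c.length; j++) {
            if (Math.abs(v - c[j]) < Math.abs(v - c[best])) {
                best = j;
            }
        }
        return best;
    }
}
```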
emMaxIters
public int emMaxIters
Maximum number of iterations allowed for the Expectation-Maximization (EM) algorithm. This value sets the upper limit on the number of iterations the EM algorithm can perform during optimization.
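This is a minimal sketch of where an iteration cap sits in EM. It assumes a one-dimensional Gaussian mixture and an additional log-likelihood tolerance (tol). The tolerance is an assumption of this sketch, not a documented Config field. EmSketch is hypothetical, and CausalUnmixer's EM operates on its own model.

```java
import java.util.Arrays;

/** Hypothetical sketch: EM for a 1-D Gaussian mixture with an iteration cap playing the role of emMaxIters. */
final class EmSketch {
    private EmSketch() {}

    /** Fitted parameters, the log-likelihood from the last E-step, and the iterations actually run. */
    static final class Fit {
        final double[] weights, means, variances;
        final double logLik;
        final int iterations;

        Fit(double[] weights, double[] means, double[] variances, double logLik, int iterations) {
            this.weights = weights;
            this.means = means;
            this.variances = variances;
            this.logLik = logLik;
            this.iterations = iterations;
        }
    }

    /** Runs EM from the given initial means; stops after maxIters or when the log-likelihood gain drops below tol. */
    static Fit fit(double[] x, double[] initMeans, int maxIters, double tol) {
        int n = x.length, k = initMeans.length;
        double[] w = new double[k], mu = initMeans.clone(), sigma2 = new double[k];
        double mean = Arrays.stream(x).average().orElse(0.0);
        double pooled = Arrays.stream(x).map(v -> (v - mean) * (v - mean)).sum() / n;
        Arrays.fill(w, 1.0 / k);
        Arrays.fill(sigma2, Math.max(pooled, 1e-6)); // every component starts at the pooled variance
        double[][] resp = new double[n][k];
        double[] logp = new double[k];
        double prevLl = Double.NEGATIVE_INFINITY, ll = prevLl;
        int it = 0;
        while (it < maxIters) {
            it++;
            // E-step: responsibilities computed with log-sum-exp to avoid underflow.
            ll = 0.0;
            for (int i = 0; i < n; i++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int j = 0; j < k; j++) {
                    double d = x[i] - mu[j];
                    logp[j] = Math.log(w[j]) - 0.5 * Math.log(2 * Math.PI * sigma2[j]) - d * d / (2 * sigma2[j]);
                    max = Math.max(max, logp[j]);
                }
                double sum = 0.0;
                for (int j = 0; j < k; j++) sum += Math.exp(logp[j] - max);
                double logTotal = max + Math.log(sum);
                for (int j = 0; j < k; j++) resp[i][j] = Math.exp(logp[j] - logTotal);
                ll += logTotal;
            }
            // M-step: weighted means, floored variances, and mixing weights.
            for (int j = 0; j < k; j++) {
                double nj = 0.0, sx = 0.0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sx += resp[i][j] * x[i];
                }
                if (nj < 1e-12) continue; // no support: leave this component unchanged
                mu[j] = sx / nj;
                double sv = 0.0;
                for (int i = 0; i < n; i++) {
                    double d = x[i] - mu[j];
                    sv += resp[i][j] * d * d;
                }
                sigma2[j] = Math.max(sv / nj, 1e-6);
                w[j] = nj / n;
            }
            if (ll - prevLl < tol) break; // converged before reaching the cap
            prevLl = ll;
        }
        return new Fit(w, mu, sigma2, ll, it);
    }
}
```

The cap guarantees termination even when the tolerance is never reached. With maxIters = 1, exactly one E-step and one M-step are performed.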
covRidgeRel
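public double covRidgeRel
Relative ridge regularization parameter for covariance estimation. Judging by its name, it sets the size of a ridge term added to the diagonal of the covariance matrix, scaled relative to the matrix itself, which keeps the estimate well conditioned.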
covShrinkage
public double covShrinkage
Regularization parameter for covariance estimation. Controls the amount of shrinkage applied to the covariance matrix during estimation.
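Both fields are described only as covariance regularizers. This minimal sketch makes two assumptions. First, covShrinkage is a linear shrinkage weight s toward the diagonal. Second, covRidgeRel is a factor r on a ridge proportional to the average variance, giving (1 - s) S + s diag(S) + r (tr S / p) I. The actual formulas in CausalUnmixer may differ, and CovRegularizationSketch is hypothetical.

```java
/** Hypothetical sketch of covariance regularization; the formulas are assumptions, not CausalUnmixer's. */
final class CovRegularizationSketch {
    private CovRegularizationSketch() {}

    /**
     * Returns (1 - shrinkage) * S + shrinkage * diag(S) + ridgeRel * (trace(S) / p) * I.
     * Off-diagonal entries shrink toward zero, while the diagonal keeps S's variances
     * plus a ridge scaled by the average variance, which improves conditioning.
     */
    static double[][] regularize(double[][] s, double shrinkage, double ridgeRel) {
        int p = s.length;
        double trace = 0.0;
        for (int i = 0; i < p; i++) {
            trace += s[i][i];
        }
        double ridge = ridgeRel * trace / p;
        double[][] out = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                out[i][j] = (i == j) ? s[i][i] + ridge : (1.0 - shrinkage) * s[i][j];
            }
        }
        return out;
    }
}
```

Making the ridge relative to the trace keeps its effect comparable across variables measured on different scales. A fixed absolute ridge would be negligible for large variances and dominant for small ones.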
annealSteps
public int annealSteps
Number of steps (iterations) in the simulated annealing process.
annealStartT
public double annealStartT
Starting temperature for the simulated annealing process. A higher starting temperature makes the algorithm more willing to accept worse moves early in the search.
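The following is a minimal sketch of how annealSteps and annealStartT typically interact. It assumes a Metropolis acceptance rule and a linear cooling schedule from annealStartT down to zero over annealSteps. This page does not state what CausalUnmixer anneals or which schedule it uses. The objective, the neighbor move, and AnnealSketch are illustrative.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch: simulated annealing over one real variable. */
final class AnnealSketch {
    private AnnealSketch() {}

    /** Minimizes f starting at x0; returns the best point accepted during the run. */
    static double anneal(DoubleUnaryOperator f, double x0, int steps, double startT, double stepSize, long seed) {
        Random rng = new Random(seed);
        double x = x0;
        double fx = f.applyAsDouble(x);
        double best = x;
        double fBest = fx;
        for (int t = 0; t < steps; t++) {
            double temp = startT * (1.0 - (double) t / steps); // linear cooling toward 0
            double cand = x + stepSize * (2.0 * rng.nextDouble() - 1.0); // uniform neighbor move
            double fc = f.applyAsDouble(cand);
            double delta = fc - fx;
            // Metropolis rule: always accept improvements; accept worse moves with prob exp(-delta / T).
            if (delta <= 0 || (temp > 0 && rng.nextDouble() < Math.exp(-delta / temp))) {
                x = cand;
                fx = fc;
                if (fx < fBest) {
                    best = x;
                    fBest = fx;
                }
            }
        }
        return best;
    }
}
```

The starting temperature controls how freely the search escapes local minima at the beginning. The step count controls how gradually the temperature falls, and therefore how long the search spends refining near the end.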
fullSigmaSafetyMargin
public int fullSigmaSafetyMargin
Safety margin for full covariance (sigma) estimation, intended to ensure that the estimated covariance matrix is positive definite.
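This page does not spell out how the margin is applied. Because it is an integer, one plausible reading is a sample-size guard; this is an assumption, not confirmed by the source. Under that reading, a full p x p covariance matrix is estimated for a cluster only when the cluster has at least p + 1 points plus the margin. The p + 1 minimum is what a sample covariance with an estimated mean needs to be nonsingular. FullSigmaGuardSketch is hypothetical.

```java
/** Hypothetical sketch of a sample-size guard for full covariance estimation. */
final class FullSigmaGuardSketch {
    private FullSigmaGuardSketch() {}

    /**
     * A sample covariance of p variables (mean estimated) can be nonsingular only if n >= p + 1.
     * The safety margin demands extra points on top of that minimum before a full matrix is used;
     * otherwise a caller might fall back to a diagonal or otherwise regularized estimate.
     */
    static boolean useFullCovariance(int clusterSize, int numVariables, int safetyMargin) {
        return clusterSize >= numVariables + 1 + safetyMargin;
    }
}
```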
ridgeLambda
public double ridgeLambda
A ridge regression regularization parameter that prevents overfitting by adding a penalty proportional to the squared magnitude of the coefficients. Higher values increase regularization, and lower values reduce it. The penalty also stabilizes solutions in the presence of multicollinearity or poorly conditioned problems.
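The ridge estimate solves (X^T X + lambda I) beta = X^T y, where lambda plays the role of ridgeLambda. This minimal sketch forms those normal equations and solves them by Gaussian elimination. It assumes no intercept and unstandardized predictors, since this page does not say how CausalUnmixer applies the penalty. RidgeSketch is hypothetical.

```java
/** Hypothetical sketch: ridge regression via the normal equations (X^T X + lambda I) beta = X^T y. */
final class RidgeSketch {
    private RidgeSketch() {}

    static double[] fit(double[][] x, double[] y, double lambda) {
        int n = x.length;
        int p = x[0].length;
        double[][] a = new double[p][p + 1]; // augmented matrix [X^T X + lambda I | X^T y]
        for (int r = 0; r < p; r++) {
            for (int c = 0; c < p; c++) {
                double sxx = 0.0;
                for (int i = 0; i < n; i++) {
                    sxx += x[i][r] * x[i][c];
                }
                a[r][c] = sxx + (r == c ? lambda : 0.0); // the penalty only touches the diagonal
            }
            double sxy = 0.0;
            for (int i = 0; i < n; i++) {
                sxy += x[i][r] * y[i];
            }
            a[r][p] = sxy;
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int piv = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
            }
            double[] tmp = a[col];
            a[col] = a[piv];
            a[piv] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("singular system; try a larger lambda");
            }
            for (int r = col + 1; r < p; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int c = r + 1; c < p; c++) {
                s -= a[r][c] * beta[c];
            }
            beta[r] = s / a[r][r];
        }
        return beta;
    }
}
```

As an example, take X = [[1, 0], [0, 1], [1, 1]] and y = [1, 2, 3]. Raising lambda from 0 to 1 shrinks the coefficients from (1, 2) to (0.875, 1.375).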
pooledGraphFn
A function that generates a pooled graph from a given configuration and dataset. It is a higher-order function: it maps a Config object to a second function, which maps a DataSet to a Graph. This chaining lets graph creation pipelines be composed flexibly for different configurations and input datasets, and it may rely on pooling or other aggregation across the data.
It encapsulates the logic needed to derive a pooled graph representation, which keeps that logic modular and reusable. It is intended for cases where graph structure must be learned from datasets under varying parameter configurations.
The resulting graph may reflect clustering effects, covariance adjustments, or statistical aggregation across dataset features, depending on the configuration provided.
perClusterGraphFn
A function that generates a cluster-specific graph creation method. This higher-order function takes a Config object as input and returns a function that, in turn, takes a DataSet as input and returns a Graph. Its purpose is to let graphs be tailored to the characteristics of individual clusters, enabling more customized processing or representation of the data.
The exact behavior is determined by the Config object provided, so the way cluster-specific graphs are generated can vary with the use case or requirements.
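pooledGraphFn and perClusterGraphFn share a curried shape: given a Config, each returns a function from a DataSet to a Graph. This minimal sketch shows that shape. It assumes java.util.function.Function and uses hypothetical stand-in types (StubConfig, StubDataSet, StubGraph) in place of the real Config, DataSet, and Graph classes. The "search" merely records its inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-ins; the real Config, DataSet and Graph types are not reproduced here. */
final class StubConfig {
    final double pcAlpha;
    StubConfig(double pcAlpha) { this.pcAlpha = pcAlpha; }
}

final class StubDataSet {
    final String name;
    final int numRows;
    StubDataSet(String name, int numRows) { this.name = name; this.numRows = numRows; }
}

final class StubGraph {
    final String description;
    StubGraph(String description) { this.description = description; }
}

final class GraphFnSketch {
    private GraphFnSketch() {}

    /** Curried factory: configuration first, then data. The "search" here only records its inputs. */
    static final Function<StubConfig, Function<StubDataSet, StubGraph>> PER_CLUSTER_GRAPH_FN =
            cfg -> data -> new StubGraph("graph(" + data.name + ", n=" + data.numRows + ", alpha=" + cfg.pcAlpha + ")");

    /** Binds the configuration once, then reuses the resulting function for every cluster. */
    static List<StubGraph> graphsForClusters(StubConfig cfg, List<StubDataSet> clusters) {
        Function<StubDataSet, StubGraph> search = PER_CLUSTER_GRAPH_FN.apply(cfg);
        List<StubGraph> out = new ArrayList<>();
        for (StubDataSet d : clusters) {
            out.add(search.apply(d));
        }
        return out;
    }
}
```

The configuration is bound once, and the resulting function is applied to each cluster's data. This is what allows one set of settings to drive a pooled fit and every per-cluster fit consistently.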
pcAlpha
public double pcAlpha
The alpha parameter that sets the significance level for hypothesis tests or other statistical calculations. A null hypothesis is rejected when the p-value falls below this threshold, so a smaller value imposes a more stringent significance level.
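In PC-style searches, alpha is compared with the p-value of a conditional independence test. This page does not state which test CausalUnmixer uses. The following minimal sketch assumes the Fisher z test for a (partial) correlation r, estimated from n samples with a conditioning set of size s. It approximates the normal CDF with Abramowitz and Stegun formula 7.1.26, whose absolute error is below about 1.5e-7. FisherZSketch is hypothetical.

```java
/** Hypothetical sketch: Fisher z conditional-independence decision at significance level alpha. */
final class FisherZSketch {
    private FisherZSketch() {}

    /** Error function approximation (Abramowitz and Stegun 7.1.26), absolute error below about 1.5e-7. */
    static double erf(double x) {
        double sign = Math.signum(x);
        double a = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * a);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-a * a));
    }

    static double normalCdf(double z) {
        return 0.5 * (1.0 + erf(z / Math.sqrt(2.0)));
    }

    /** Two-sided p-value for H0: (partial) correlation = 0, given n samples and s conditioning variables. */
    static double pValue(double r, int n, int s) {
        double z = 0.5 * Math.log((1.0 + r) / (1.0 - r)) * Math.sqrt(n - s - 3.0);
        return 2.0 * (1.0 - normalCdf(Math.abs(z)));
    }

    /** Independence is accepted when the p-value exceeds alpha, the role pcAlpha plays here. */
    static boolean judgedIndependent(double r, int n, int s, double alpha) {
        return pValue(r, n, s) > alpha;
    }
}
```

With n = 100 and alpha = 0.05, a sample correlation of 0.5 is judged dependent (z is about 5.4). A correlation of 0.05 is judged independent (p is about 0.62).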
pcColliderStyle
Defines how colliders are oriented, that is, how the collider orientation is calculated or interpreted. The default value is Pc.ColliderOrientationStyle.MAX_P, which selects the MAX_P style.
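MAX_P presumably refers to the max-P collider heuristic. This minimal sketch shows its core decision for an unshielded triple X, Z, Y, where Z is adjacent to both X and Y but X and Y are not adjacent. It assumes the p-values of "X independent of Y given S" have already been computed for each candidate conditioning set S. The set with the largest p-value is taken as the separating set. The triple is oriented as the collider X -> Z <- Y only if Z is not in that set. MaxPSketch is hypothetical, and the precomputed p-value map replaces real tests.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the max-P collider decision for an unshielded triple X, Z, Y. */
final class MaxPSketch {
    private MaxPSketch() {}

    /**
     * @param pValues p-value of the test "X independent of Y given S" for each candidate set S
     * @param z       the middle node of the triple
     * @return true if the triple should be oriented as the collider X -> Z <- Y
     */
    static boolean isCollider(Map<Set<String>, Double> pValues, String z) {
        Set<String> bestSet = null;
        double bestP = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Set<String>, Double> e : pValues.entrySet()) {
            if (e.getValue() > bestP) {
                bestP = e.getValue();
                bestSet = e.getKey();
            }
        }
        if (bestSet == null) {
            throw new IllegalArgumentException("no conditioning sets were tested");
        }
        // Collider if and only if the middle node is absent from the max-p separating set.
        return !bestSet.contains(z);
    }
}
```

Basing the decision on the single most convincing separating set is intended to make collider orientation less sensitive to which separating set an earlier search step happened to find first.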
Constructor Details
Config
public Config()
Default constructor for the Config class. It takes no parameters and creates a Config instance with every field set to its default value.