Class GaussianMixtureEM.Model

java.lang.Object
edu.cmu.tetrad.search.unmix.GaussianMixtureEM.Model
Enclosing class:
GaussianMixtureEM

public static final class GaussianMixtureEM.Model extends Object
Represents the Gaussian Mixture Model (GMM) computed and used in the GaussianMixtureEM class. This class encapsulates the parameters of the GMM, as well as related information such as soft cluster assignments (responsibilities) and the model's overall log-likelihood.

The parameters of the GMM include the number of components (K), the dimensionality of the data, the component weights, the mean vectors, and the covariance matrices. The type of covariance matrices is also specified (e.g., full or diagonal).

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    final double[][][]
    covs
    The covariance matrices of the mixture components in the Gaussian Mixture Model (GMM).
    final GaussianMixtureEM.CovarianceType
    covType
    Specifies the type of covariance matrix used in the Gaussian Mixture Model (GMM).
    final int
    d
    Represents the dimensionality of the data in the Gaussian Mixture Model (GMM).
    final int
    K
    Represents the number of mixture components in the Gaussian Mixture Model (GMM).
    final double
    logLikelihood
    Represents the overall log-likelihood of the Gaussian Mixture Model (GMM).
    final double[][]
    means
    The mean vectors of the mixture components in the Gaussian Mixture Model (GMM).
    final double[][]
    responsibilities
    A two-dimensional array representing the responsibilities or soft cluster assignments for each data point in the Gaussian Mixture Model (GMM).
    final double[]
    weights
    The weights for each mixture component in the Gaussian Mixture Model (GMM).
  • Constructor Summary

    Constructors
    Constructor
    Description
    Model(int K, int d, GaussianMixtureEM.CovarianceType covType, double[] w, double[][] mu, double[][][] covs, double ll, double[][] resp)
    Constructs an instance of the Gaussian Mixture Model (GMM) with the specified parameters.
  • Method Summary

    Modifier and Type
    Method
    Description
    double
    bic(int n)
    Computes the Bayesian Information Criterion (BIC) for the Gaussian Mixture Model (GMM) given the number of rows used to fit the model.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • K

      public final int K
      Represents the number of mixture components in the Gaussian Mixture Model (GMM).

      K is the number of distinct clusters (Gaussian distributions) that make up the mixture. Each component has its own mean, covariance, and weight, and contributes to the overall probability density represented by the model.

      Larger values of K increase the model's complexity and its capacity to fit the training data.

    • d

      public final int d
      Represents the dimensionality of the data in the Gaussian Mixture Model (GMM).

      This value defines the number of features or dimensions in the data that the model is designed to handle. It is specified during the initialization of the model and remains constant throughout its lifecycle.

    • weights

      public final double[] weights
      The weights for each mixture component in the Gaussian Mixture Model (GMM).

      This array holds the mixing weights of the components and has length K, the number of mixture components. Each weight is the proportion of the overall density contributed by its component, and the weights must sum to 1.
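
The sum-to-1 condition can be checked with a small helper. This is a minimal sketch, not part of the documented API: the class name MixtureWeightCheck, the method isValid, and the tolerance parameter are hypothetical, and the non-negativity test is an added assumption that holds for any valid set of mixing weights.

```java
final class MixtureWeightCheck {

    /**
     * Returns true if every weight is a non-negative number and the
     * weights sum to 1 within the given absolute tolerance.
     * (Hypothetical helper; not part of GaussianMixtureEM.Model.)
     */
    static boolean isValid(double[] weights, double tol) {
        double sum = 0.0;
        for (double w : weights) {
            if (Double.isNaN(w) || w < 0.0) {
                return false;
            }
            sum += w;
        }
        return Math.abs(sum - 1.0) <= tol;
    }

    public static void main(String[] args) {
        System.out.println(isValid(new double[]{0.2, 0.3, 0.5}, 1e-9));
        System.out.println(isValid(new double[]{0.6, 0.6}, 1e-9));
    }
}
```

A tolerance is used because weights produced by iterative fitting rarely sum to exactly 1 in floating point.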

    • means

      public final double[][] means
      The mean vectors of the mixture components in the Gaussian Mixture Model (GMM).

      Each row of the array is the mean vector of a single mixture component, so the number of rows equals the number of mixture components (K) and each row's length equals the dimensionality of the data (d). Each mean vector is the center of its component's cluster in the feature space.

      Dimensions: K x d.

    • covs

      public final double[][][] covs
      The covariance matrices of the mixture components in the Gaussian Mixture Model (GMM).

      This is a three-dimensional array with dimensions K × d × d, where:

      • K represents the number of mixture components.
      • d represents the dimensionality of the data.

      The structure of the covariance values depends on the covariance type (covType):

      • If covType is FULL, each covariance matrix is of size d × d and stored in full in this array.
      • If covType is DIAGONAL, only the diagonal elements of each covariance matrix are stored: the variance of dimension j in component k is at covs[k][j][0].
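
The two storage layouts described above can be read through one accessor. The following is a sketch under stated assumptions: CovarianceLayout and its nested Type enum are hypothetical stand-ins (Type mirrors the FULL and DIAGONAL values of GaussianMixtureEM.CovarianceType), and the indexing follows only what the field description states.

```java
final class CovarianceLayout {

    /** Hypothetical stand-in for GaussianMixtureEM.CovarianceType. */
    enum Type { FULL, DIAGONAL }

    /** Variance of dimension j in component k under either layout. */
    static double variance(double[][][] covs, Type type, int k, int j) {
        // FULL keeps the whole d x d matrix; DIAGONAL keeps the variance at [k][j][0].
        return type == Type.FULL ? covs[k][j][j] : covs[k][j][0];
    }

    /** Covariance between dimensions i and j of component k. */
    static double covariance(double[][][] covs, Type type, int k, int i, int j) {
        if (type == Type.FULL) {
            return covs[k][i][j];
        }
        // A diagonal model treats distinct features as uncorrelated.
        return i == j ? covs[k][i][0] : 0.0;
    }
}
```

Routing every read through an accessor like this keeps calling code independent of which covariance type the model was fitted with.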
    • covType

      public final GaussianMixtureEM.CovarianceType covType
      Specifies the type of covariance matrix used in the Gaussian Mixture Model (GMM).

      The covariance type determines the structure of the covariance matrices for the mixture components. It can either be FULL, where each component has its own full covariance matrix, or DIAGONAL, where each component has a diagonal covariance matrix that assumes uncorrelated features.

      The choice of covariance type affects the model's flexibility and computational complexity.

    • logLikelihood

      public final double logLikelihood
      Represents the overall log-likelihood of the Gaussian Mixture Model (GMM).

      The log-likelihood is a measure of how well the model explains the given data, with higher values indicating a better fit. It is computed based on the model parameters (e.g., weights, means, and covariance matrices) and the observed data.

      This value is commonly used for tasks such as model evaluation and comparison, and it may also serve as input to model selection criteria.
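
To illustrate how a mixture log-likelihood follows from the weights, means, and covariances, here is a minimal sketch for the diagonal-covariance case. It is not the library's implementation: the class DiagonalLogLikelihood is hypothetical, the variances are passed as a plain K x d array rather than in the covs layout, and all weights and variances are assumed to be strictly positive.

```java
final class DiagonalLogLikelihood {

    /** Log density of x under a Gaussian with the given mean and per-dimension variances. */
    static double logGaussian(double[] x, double[] mean, double[] var) {
        double s = 0.0;
        for (int j = 0; j < x.length; j++) {
            double diff = x[j] - mean[j];
            s += Math.log(2.0 * Math.PI * var[j]) + diff * diff / var[j];
        }
        return -0.5 * s;
    }

    /** Sum over data points of log( sum_k weights[k] * N(x | means[k], diag(vars[k])) ). */
    static double logLikelihood(double[][] data, double[] weights,
                                double[][] means, double[][] vars) {
        double total = 0.0;
        for (double[] x : data) {
            double[] terms = new double[weights.length];
            double max = Double.NEGATIVE_INFINITY;
            for (int k = 0; k < weights.length; k++) {
                terms[k] = Math.log(weights[k]) + logGaussian(x, means[k], vars[k]);
                max = Math.max(max, terms[k]);
            }
            // Log-sum-exp: subtract the largest term before exponentiating to avoid underflow.
            double sum = 0.0;
            for (double t : terms) {
                sum += Math.exp(t - max);
            }
            total += max + Math.log(sum);
        }
        return total;
    }
}
```

Working in log space with the log-sum-exp step matters in practice, since the product of many small densities underflows quickly in double precision.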

    • responsibilities

      public final double[][] responsibilities
      A two-dimensional array representing the responsibilities or soft cluster assignments for each data point in the Gaussian Mixture Model (GMM).

      Each row corresponds to a data point, and each column corresponds to a mixture component. The value at position [i][j] is the responsibility of the j-th mixture component for the i-th data point, i.e., the posterior probability that the data point was generated by that component.

      Dimensions: n × K, where:

      • n – Number of data points.
      • K – Number of mixture components.
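
Soft assignments are often reduced to hard cluster labels by taking, for each data point, the component with the largest responsibility. The sketch below shows that reduction; the class HardAssignments and its method are hypothetical and not part of the documented API, and ties are resolved in favor of the lowest component index.

```java
final class HardAssignments {

    /**
     * Returns, for each row of an n x K responsibility matrix, the index of
     * the component with the largest responsibility.
     */
    static int[] assign(double[][] responsibilities) {
        int[] labels = new int[responsibilities.length];
        for (int i = 0; i < responsibilities.length; i++) {
            int best = 0;
            for (int j = 1; j < responsibilities[i].length; j++) {
                // Strict comparison keeps the earlier index on ties.
                if (responsibilities[i][j] > responsibilities[i][best]) {
                    best = j;
                }
            }
            labels[i] = best;
        }
        return labels;
    }
}
```

With a fitted model, the input would be the responsibilities field.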
  • Constructor Details

    • Model

      public Model(int K, int d, GaussianMixtureEM.CovarianceType covType, double[] w, double[][] mu, double[][][] covs, double ll, double[][] resp)
      Constructs an instance of the Gaussian Mixture Model (GMM) with the specified parameters.
      Parameters:
      K - the number of mixture components
      d - the dimensionality of the data
      covType - the type of covariance matrix used (FULL or DIAGONAL)
      w - the weights of the mixture components
      mu - the mean vectors of the mixture components
      covs - the covariance matrices of the mixture components
      ll - the overall log-likelihood of the model
      resp - the responsibilities (soft cluster assignments) for each data point
  • Method Details

    • bic

      public double bic(int n)
      Computes the Bayesian Information Criterion (BIC) for the Gaussian Mixture Model (GMM) given the number of rows used to fit the model.

      The BIC is a criterion used for model selection based on the trade-off between model fit (log-likelihood) and model complexity (number of parameters). A lower BIC indicates a better balance between fit and complexity.

      Parameters:
      n - the number of rows used to fit the model (i.e., the sample size)
      Returns:
      the Bayesian Information Criterion (BIC) value
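
As an illustration of the fit versus complexity trade-off, the sketch below uses the textbook form BIC = -2 · logLikelihood + p · ln(n), which matches the "lower is better" convention stated above. This is an assumption rather than the library's confirmed formula: the class BicSketch and the parameter count p (K - 1 free weights, K · d means, and either d(d + 1)/2 or d covariance entries per component) are illustrative only.

```java
final class BicSketch {

    /** Number of free parameters in a K-component, d-dimensional GMM. */
    static int freeParameters(int K, int d, boolean fullCovariance) {
        // A symmetric d x d matrix has d(d+1)/2 free entries; a diagonal one has d.
        int covParams = fullCovariance ? d * (d + 1) / 2 : d;
        // The weights sum to 1, so only K - 1 of them are free.
        return (K - 1) + K * d + K * covParams;
    }

    /** Textbook BIC; lower values indicate a better fit/complexity balance. */
    static double bic(double logLikelihood, int K, int d, boolean fullCovariance, int n) {
        return -2.0 * logLikelihood + freeParameters(K, d, fullCovariance) * Math.log(n);
    }
}
```

Under this form, choosing K amounts to fitting a model for each candidate value and keeping the one with the smallest BIC at the same sample size n.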