Class GaussianMixtureEM.Model
- Enclosing class:
GaussianMixtureEM
The parameters of the GMM include the number of components (K), the dimensionality of the data, the component weights, the mean vectors, and the covariance matrices. The type of covariance matrices is also specified (e.g., full or diagonal).
-
Field Summary
Fields
Modifier and Type | Field | Description
final double[][][] | covs | The covariance matrices of the mixture components in the Gaussian Mixture Model (GMM).
GaussianMixtureEM.CovarianceType | covType | Specifies the type of covariance matrix used in the Gaussian Mixture Model (GMM).
final int | d | Represents the dimensionality of the data in the Gaussian Mixture Model (GMM).
final int | K | Represents the number of mixture components in the Gaussian Mixture Model (GMM).
final double | logLikelihood | Represents the overall log-likelihood of the Gaussian Mixture Model (GMM).
final double[][] | means | The mean vectors of the mixture components in the Gaussian Mixture Model (GMM).
final double[][] | responsibilities | A two-dimensional array representing the responsibilities or soft cluster assignments for each data point in the Gaussian Mixture Model (GMM).
final double[] | weights | The weights for each mixture component in the Gaussian Mixture Model (GMM).
-
Constructor Summary
Constructors
Constructor | Description
Model(int K, int d, GaussianMixtureEM.CovarianceType covType, double[] w, double[][] mu, double[][][] covs, double ll, double[][] resp) | Constructs an instance of the Gaussian Mixture Model (GMM) with the specified parameters.
-
Method Summary
Modifier and Type | Method | Description
double | bic(int n) | Computes the Bayesian Information Criterion (BIC) for the Gaussian Mixture Model (GMM) given the number of rows used to fit the model.
-
Field Details
-
K
public final int K
Represents the number of mixture components in the Gaussian Mixture Model (GMM).
This variable defines the number of distinct clusters or distributions that make up the mixture model. Each component has its own parameters (mean, covariance, and weight) and contributes to the overall probability density represented by the model.
K is a critical parameter in the construction and evaluation of the GMM and directly impacts the model's complexity and its ability to fit the training data.
-
d
public final int d
Represents the dimensionality of the data in the Gaussian Mixture Model (GMM).
This value defines the number of features or dimensions in the data that the model is designed to handle. It is specified during the initialization of the model and remains constant throughout its lifecycle.
-
weights
public final double[] weights
The weights for each mixture component in the Gaussian Mixture Model (GMM).
This array specifies the mixing weights associated with the components of the model. The size of the array corresponds to the number of mixture components (K) in the model. Each weight represents the proportional contribution of a particular component to the overall model, and the weights must sum to 1.
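The sum-to-1 condition can be checked numerically before a weights array is passed around. The following is a minimal sketch: `WeightCheck` and `isValidMixture` are hypothetical names, not part of GaussianMixtureEM, and the non-negativity check is an additional assumption (standard for mixing weights, but not stated on this page).

```java
// Hypothetical helper for illustration; not part of GaussianMixtureEM.
final class WeightCheck {
    /**
     * Returns true if every weight is a non-negative number and the
     * weights sum to 1 within the given absolute tolerance.
     */
    static boolean isValidMixture(double[] weights, double tol) {
        double sum = 0.0;
        for (double w : weights) {
            // Assumption: mixing weights are never negative or NaN.
            if (Double.isNaN(w) || w < 0.0) {
                return false;
            }
            sum += w;
        }
        // Compare with a tolerance because floating-point sums are inexact.
        return Math.abs(sum - 1.0) <= tol;
    }
}
```

A tolerance such as 1e-9 is usually enough to absorb rounding error from the EM updates.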
-
means
public final double[][] means
The mean vectors of the mixture components in the Gaussian Mixture Model (GMM).
Each row of the array represents the mean vector of a single mixture component, and the number of rows corresponds to the number of mixture components (K). Each row's length corresponds to the dimensionality of the data (d). This parameter defines the center of the clusters formed by each component in the feature space.
Dimensions: K x d.
-
covs
public final double[][][] covs
The covariance matrices of the mixture components in the Gaussian Mixture Model (GMM).
This is a three-dimensional array with dimensions K × d × d, where:
- K represents the number of mixture components.
- d represents the dimensionality of the data.
The structure of the covariance values depends on the covariance type (covType):
- If covType is FULL, each covariance matrix is of size d × d and stored in full in this array.
- If covType is DIAGONAL, only the diagonal elements of each covariance matrix are stored, and they are represented as [k][j][0], where k is the component index and j is the dimension.
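The two storage layouts described above can be read through a single accessor. This is a sketch under stated assumptions: `CovLayout` and `entry` are hypothetical names, and the nested enum is only a stand-in for GaussianMixtureEM.CovarianceType so that the example compiles on its own.

```java
// Hypothetical helper for illustration; not part of GaussianMixtureEM.
final class CovLayout {
    // Stand-in for GaussianMixtureEM.CovarianceType.
    enum CovarianceType { FULL, DIAGONAL }

    /**
     * Returns entry (i, j) of the covariance matrix of component k,
     * regardless of which storage layout covs uses.
     */
    static double entry(double[][][] covs, CovarianceType type, int k, int i, int j) {
        if (type == CovarianceType.FULL) {
            // FULL: the d x d matrix is stored as-is.
            return covs[k][i][j];
        }
        // DIAGONAL: only variances are stored, at [k][j][0];
        // off-diagonal entries are implicitly zero.
        return (i == j) ? covs[k][i][0] : 0.0;
    }
}
```

Code that reads `covs` directly should branch on `covType` in the same way, since indexing `[k][i][j]` with `j > 0` is only meaningful for the FULL layout.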
-
covType
Specifies the type of covariance matrix used in the Gaussian Mixture Model (GMM).
The covariance type determines the structure of the covariance matrices for the mixture components. It can either be FULL, where each component has its own full covariance matrix, or DIAGONAL, where each component has a diagonal covariance matrix that assumes uncorrelated features.
The choice of covariance type affects the model's flexibility and computational complexity.
-
logLikelihood
public final double logLikelihood
Represents the overall log-likelihood of the Gaussian Mixture Model (GMM).
The log-likelihood is a measure of how well the model explains the given data, with higher values indicating a better fit. It is computed based on the model parameters (e.g., weights, means, and covariance matrices) and the observed data.
This value is commonly used for tasks such as model evaluation and comparison, and it may also serve as input to model selection criteria.
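For reference, the textbook log-likelihood of a GMM over n data points x_1, ..., x_n can be written in terms of this class's fields (weights w_k, means \mu_k, covariance matrices \Sigma_k). This is the standard definition only; whether this implementation reports exactly this quantity (for example, summed rather than averaged over rows) is not stated on this page.

```latex
\ell = \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \, \mathcal{N}\!\left(x_i \mid \mu_k, \Sigma_k\right)
```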
-
responsibilities
public final double[][] responsibilities
A two-dimensional array representing the responsibilities or soft cluster assignments for each data point in the Gaussian Mixture Model (GMM).
Each row corresponds to a data point, and each column corresponds to a mixture component. The value at position [i][j] represents the responsibility of the j-th mixture component for the i-th data point (i.e., the posterior probability that the i-th data point was generated by the j-th component).
Dimensions:
- n – Number of data points.
- K – Number of mixture components.
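A common use of the n × K responsibilities array is to turn the soft assignments into hard cluster labels by picking the component with the largest responsibility in each row. The sketch below assumes that convention; `HardAssign` and `hardLabels` are hypothetical names, not part of GaussianMixtureEM.

```java
// Hypothetical helper for illustration; not part of GaussianMixtureEM.
final class HardAssign {
    /**
     * For each data point (row), returns the index of the component
     * (column) with the largest responsibility. Ties go to the
     * lowest component index.
     */
    static int[] hardLabels(double[][] responsibilities) {
        int[] labels = new int[responsibilities.length];
        for (int i = 0; i < responsibilities.length; i++) {
            int best = 0;
            for (int j = 1; j < responsibilities[i].length; j++) {
                if (responsibilities[i][j] > responsibilities[i][best]) {
                    best = j;
                }
            }
            labels[i] = best;
        }
        return labels;
    }
}
```

With a fitted model this would be called as `HardAssign.hardLabels(model.responsibilities)`.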
-
-
Constructor Details
-
Model
public Model(int K, int d, GaussianMixtureEM.CovarianceType covType, double[] w, double[][] mu, double[][][] covs, double ll, double[][] resp)
Constructs an instance of the Gaussian Mixture Model (GMM) with the specified parameters.
- Parameters:
K - the number of mixture components
d - the dimensionality of the data
covType - the type of covariance matrix used (FULL or DIAGONAL)
w - the weights of the mixture components
mu - the mean vectors of the mixture components
covs - the covariance matrices of the mixture components
ll - the overall log-likelihood of the model
resp - the responsibilities (soft cluster assignments) for each data point
-
-
Method Details
-
bic
public double bic(int n)
Computes the Bayesian Information Criterion (BIC) for the Gaussian Mixture Model (GMM) given the number of rows used to fit the model.
The BIC is a criterion used for model selection based on the trade-off between model fit (log-likelihood) and model complexity (number of parameters). A lower BIC indicates a better balance between fit and complexity.
- Parameters:
n - the number of rows used to fit the model (i.e., the sample size)
- Returns:
the Bayesian Information Criterion (BIC) value
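The trade-off described above can be illustrated with the textbook definition BIC = -2 · logLikelihood + p · ln(n), where p is the number of free parameters. The sketch below is not the library's implementation: `BicSketch` and its methods are hypothetical, the parameter count is the usual one for a GMM, and it assumes logLikelihood is the total natural-log likelihood over all n rows.

```java
// Hypothetical sketch of the textbook BIC; not the code behind Model.bic(int).
final class BicSketch {
    /** Counts free parameters of a GMM with K components in d dimensions. */
    static int freeParameters(int K, int d, boolean fullCovariance) {
        int weightParams = K - 1;   // weights sum to 1, so one is determined
        int meanParams = K * d;     // one d-vector per component
        // A symmetric d x d matrix has d(d+1)/2 free entries; a diagonal one has d.
        int covParams = fullCovariance ? K * d * (d + 1) / 2 : K * d;
        return weightParams + meanParams + covParams;
    }

    /** Textbook BIC; assumes logLikelihood is summed over all n rows. */
    static double bic(double logLikelihood, int numParams, int n) {
        return -2.0 * logLikelihood + numParams * Math.log(n);
    }
}
```

Under this definition, candidate models fitted with different K or covariance types on the same data are compared by their BIC values, and the one with the lowest value is preferred.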
-