edu.cmu.tetrad.search.unmix.KMeans

public final class KMeans extends Object

Implements the K-Means clustering algorithm using the k-means++ initialization method and iterative refinement. The algorithm partitions a dataset into a specified number of clusters by minimizing the sum of squared distances between points and their respective cluster centroids.
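
The description above names two stages, k-means++ initialization and iterative refinement. The following is a minimal, self-contained sketch of those two stages, not the source of this class: the class name `KMeansSketch` and its helpers `dist2`, `seed`, `assign`, and `run` are hypothetical, and the handling of empty clusters and the convergence test are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only; names and details are assumptions, not the Tetrad implementation. */
public final class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    /** k-means++ seeding: each new centroid is a data point drawn with probability
     *  proportional to its squared distance from the nearest centroid chosen so far. */
    static double[][] seed(double[][] x, int k, Random rnd) {
        int n = x.length;
        double[][] c = new double[k][];
        c[0] = x[rnd.nextInt(n)].clone();
        double[] d2 = new double[n];
        Arrays.fill(d2, Double.POSITIVE_INFINITY);
        for (int m = 1; m < k; m++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                d2[i] = Math.min(d2[i], dist2(x[i], c[m - 1]));
                total += d2[i];
            }
            double r = rnd.nextDouble() * total;
            int pick = n - 1; // fallback if rounding leaves r non-negative
            for (int i = 0; i < n; i++) {
                r -= d2[i];
                if (r < 0) { pick = i; break; }
            }
            c[m] = x[pick].clone();
        }
        return c;
    }

    /** Assignment step: label each point with the index of its nearest centroid. */
    static int[] assign(double[][] x, double[][] c) {
        int[] labels = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            int best = 0;
            for (int m = 1; m < c.length; m++) {
                if (dist2(x[i], c[m]) < dist2(x[i], c[best])) best = m;
            }
            labels[i] = best;
        }
        return labels;
    }

    /** Runs seeding plus at most maxIter refinement passes; assumes x is non-empty. */
    public static int[] run(double[][] x, int k, int maxIter, long seed) {
        k = Math.min(k, x.length); // clamp K to the number of points, as documented
        int d = x[0].length;
        double[][] c = seed(x, k, new Random(seed));
        int[] labels = assign(x, c);
        for (int it = 0; it < maxIter; it++) {
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sum = new double[k][d];
            int[] count = new int[k];
            for (int i = 0; i < x.length; i++) {
                count[labels[i]]++;
                for (int j = 0; j < d; j++) sum[labels[i]][j] += x[i][j];
            }
            for (int m = 0; m < k; m++) {
                if (count[m] == 0) continue; // assumption: an empty cluster keeps its centroid
                for (int j = 0; j < d; j++) c[m][j] = sum[m][j] / count[m];
            }
            int[] next = assign(x, c);
            if (Arrays.equals(next, labels)) break; // assumption: stop when labels are stable
            labels = next;
        }
        return labels;
    }
}
```

Because the random generator is built from `seed`, two calls with the same inputs and seed return the same labels in this sketch.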

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | KMeans.Result | Represents the result of a clustering operation using the KMeans algorithm. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static KMeans.Result | cluster(double[][] X, int K, int maxIter, long seed) | Performs k-means clustering on a given dataset. |
| static KMeans.Result | clusterWithRestarts(double[][] X, int K, int maxIter, long seed, int restarts) | Performs k-means clustering with multiple restarts and selects the best result based on the lowest within-cluster sum of squared errors (SSE). |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Method Details
- cluster
  
  public static KMeans.Result cluster(double[][] X, int K, int maxIter, long seed)
  
  Performs k-means clustering on a given dataset.
  
  Parameters:
  
  X - The data points to be clustered, represented as a 2D array where each row corresponds to a data point and each column corresponds to a feature. The dataset must be non-null.
  
  K - The number of clusters to create. If K is greater than the number of data points, it will be adjusted to the number of data points. K must be a positive integer.
  
  maxIter - The maximum number of iterations the algorithm will run. Must be a positive integer.
  
  seed - The seed for the random number generator used to initialize the cluster centroids.
  
  Returns:
  
  An instance of the Result class containing the cluster assignments (labels) for each data point and the coordinates of the centroids of the clusters.
- clusterWithRestarts
  
  public static KMeans.Result clusterWithRestarts(double[][] X, int K, int maxIter, long seed, int restarts)
  
  Performs k-means clustering with multiple restarts and selects the best result based on the lowest within-cluster sum of squared errors (SSE).
  
  Parameters:
  
  X - The data points to be clustered, represented as a 2D array where each row corresponds to a data point and each column corresponds to a feature. The dataset must be non-null.
  
  K - The number of clusters to create. If K is greater than the number of data points, it will be adjusted to the number of data points. K must be a positive integer.
  
  maxIter - The maximum number of iterations the algorithm will run. Must be a positive integer.
  
  seed - The seed for the random number generator used to initialize the cluster centroids for each restart.
  
  restarts - The number of times to restart the clustering process with different initial centroids. If set to a value less than or equal to 0, the clustering algorithm will run only once.
  
  Returns:
  
  An instance of the Result class containing the cluster assignments (labels) for each data point and the coordinates of the centroids of the clusters from the best run (based on the lowest within-cluster SSE).
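
The selection rule described for clusterWithRestarts can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `RestartSketch`, its `Run` holder, and the `sse` and `best` helpers are hypothetical names, `restarts` is assumed to count total runs (with at least one), and deriving each run's seed as `seed + r` is an assumption, since the documentation does not say how the seed varies between restarts.

```java
import java.util.function.LongFunction;

/** Illustrative sketch only; names and the per-run seed scheme are assumptions. */
public final class RestartSketch {

    /** Hypothetical holder for one run's output; not the library's KMeans.Result. */
    public static final class Run {
        public final int[] labels;
        public final double[][] centroids;

        public Run(int[] labels, double[][] centroids) {
            this.labels = labels;
            this.centroids = centroids;
        }
    }

    /** Within-cluster SSE: sum of squared distances from each point to its assigned centroid. */
    public static double sse(double[][] x, Run run) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double[] c = run.centroids[run.labels[i]];
            for (int j = 0; j < c.length; j++) {
                double d = x[i][j] - c[j];
                total += d * d;
            }
        }
        return total;
    }

    /** Calls singleRun once per restart and keeps the run with the lowest SSE. */
    public static Run best(double[][] x, long seed, int restarts, LongFunction<Run> singleRun) {
        int runs = Math.max(1, restarts); // a value <= 0 still yields a single run
        Run best = null;
        double bestSse = Double.POSITIVE_INFINITY;
        for (int r = 0; r < runs; r++) {
            Run candidate = singleRun.apply(seed + r); // assumption: distinct seed per run
            double s = sse(x, candidate);
            if (best == null || s < bestSse) {
                best = candidate;
                bestSse = s;
            }
        }
        return best;
    }
}
```

Passing the single-run clusterer as a `LongFunction` keeps the sketch independent of any particular k-means implementation; only the SSE comparison is shown.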
