Class KMeans

java.lang.Object
edu.cmu.tetrad.search.unmix.KMeans

public final class KMeans extends Object
Implements the K-Means clustering algorithm using the k-means++ initialization method and iterative refinement. The algorithm partitions a dataset into a specified number of clusters by minimizing the sum of squared distances between points and their respective cluster centroids.
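The two phases named above, k-means++ seeding followed by iterative refinement, can be sketched roughly as follows. This is an independent illustration, not the Tetrad source: the class name KMeansSketch, its Result holder, the field names, and the choice to keep the old centroid when a cluster becomes empty are all assumptions made for the example.

```java
/** Illustrative k-means sketch (hypothetical class, not the Tetrad implementation). */
final class KMeansSketch {

    /** Hypothetical result holder: one label per row, one centroid per cluster. */
    static final class Result {
        final int[] labels;
        final double[][] centroids;

        Result(int[] labels, double[][] centroids) {
            this.labels = labels;
            this.centroids = centroids;
        }
    }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            s += diff * diff;
        }
        return s;
    }

    static Result cluster(double[][] X, int K, int maxIter, long seed) {
        int n = X.length, d = X[0].length;
        K = Math.min(K, n); // cap K at the number of points, as the parameter docs describe
        java.util.Random rnd = new java.util.Random(seed);

        // k-means++ seeding: the first centroid is a uniformly random point; each later
        // one is drawn with probability proportional to its squared distance to the
        // nearest centroid chosen so far.
        double[][] C = new double[K][];
        C[0] = X[rnd.nextInt(n)].clone();
        double[] minD = new double[n];
        java.util.Arrays.fill(minD, Double.POSITIVE_INFINITY);
        for (int k = 1; k < K; k++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                minD[i] = Math.min(minD[i], sqDist(X[i], C[k - 1]));
                total += minD[i];
            }
            double r = rnd.nextDouble() * total;
            double cum = 0.0;
            int pick = n - 1; // fallback when all remaining distances are zero
            for (int i = 0; i < n; i++) {
                cum += minD[i];
                if (cum > r) { pick = i; break; }
            }
            C[k] = X[pick].clone();
        }

        // Iterative refinement: alternate assignment and centroid update until the
        // labels stop changing or maxIter is reached.
        int[] labels = new int[n];
        java.util.Arrays.fill(labels, -1);
        for (int it = 0; it < maxIter; it++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = sqDist(X[i], C[0]);
                for (int k = 1; k < K; k++) {
                    double dist = sqDist(X[i], C[k]);
                    if (dist < bestDist) { bestDist = dist; best = k; }
                }
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            if (!changed) break;

            double[][] sum = new double[K][d];
            int[] count = new int[K];
            for (int i = 0; i < n; i++) {
                count[labels[i]]++;
                for (int j = 0; j < d; j++) sum[labels[i]][j] += X[i][j];
            }
            for (int k = 0; k < K; k++) {
                if (count[k] == 0) continue; // assumption: an empty cluster keeps its old centroid
                for (int j = 0; j < d; j++) C[k][j] = sum[k][j] / count[k];
            }
        }
        return new Result(labels, C);
    }
}
```

In this sketch, calling KMeansSketch.cluster(X, 2, 50, 42L) on two well-separated groups of points returns one label per row and two centroids, and repeating the call with the same seed reproduces the same labels.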
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class
    KMeans.Result
    Represents the result of a clustering operation using the KMeans algorithm.
  • Method Summary

    Modifier and Type
    Method
    Description
    static KMeans.Result
    cluster(double[][] X, int K, int maxIter, long seed)
    Performs k-means clustering on a given dataset.
    static KMeans.Result
    clusterWithRestarts(double[][] X, int K, int maxIter, long seed, int restarts)
    Performs k-means clustering with multiple restarts and selects the best result based on the lowest within-cluster sum of squared errors (SSE).

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • cluster

      public static KMeans.Result cluster(double[][] X, int K, int maxIter, long seed)
      Performs k-means clustering on a given dataset.
      Parameters:
      X - The data points to be clustered, represented as a 2D array where each row corresponds to a data point and each column corresponds to a feature. The dataset must be non-null.
      K - The number of clusters to create. Must be a positive integer. If K exceeds the number of data points, it is reduced to the number of data points.
      maxIter - The maximum number of iterations the algorithm will run. Must be a positive integer.
      seed - The seed for the random number generator used to initialize the cluster centroids.
      Returns:
      An instance of the Result class containing the cluster assignments (labels) for each data point and the coordinates of the centroids of the clusters.
    • clusterWithRestarts

      public static KMeans.Result clusterWithRestarts(double[][] X, int K, int maxIter, long seed, int restarts)
      Performs k-means clustering with multiple restarts and selects the best result based on the lowest within-cluster sum of squared errors (SSE).
      Parameters:
      X - The data points to be clustered, represented as a 2D array where each row corresponds to a data point and each column corresponds to a feature. The dataset must be non-null.
      K - The number of clusters to create. Must be a positive integer. If K exceeds the number of data points, it is reduced to the number of data points.
      maxIter - The maximum number of iterations the algorithm will run. Must be a positive integer.
      seed - The seed for the random number generator used to initialize the cluster centroids for each restart.
      restarts - The number of times to restart the clustering process with different initial centroids. If set to a value less than or equal to 0, the clustering algorithm will run only once.
      Returns:
      An instance of the Result class containing the cluster assignments (labels) for each data point and the coordinates of the centroids of the clusters from the best run (based on the lowest within-cluster SSE).
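The selection criterion described above, the lowest within-cluster SSE, can be illustrated with a small scoring helper. This is only a sketch under assumptions: the class RestartSketch and its methods sse and best are hypothetical names, the candidate runs are passed in as plain arrays, and how the library derives a seed for each restart is not stated here, so the restart loop itself is left out.

```java
/** Illustrative scoring of several clustering runs (hypothetical helper, not Tetrad code). */
final class RestartSketch {

    /** Within-cluster SSE: sum over points of the squared distance to the assigned centroid. */
    static double sse(double[][] X, int[] labels, double[][] centroids) {
        double total = 0.0;
        for (int i = 0; i < X.length; i++) {
            double[] c = centroids[labels[i]];
            for (int j = 0; j < X[i].length; j++) {
                double diff = X[i][j] - c[j];
                total += diff * diff;
            }
        }
        return total;
    }

    /** Returns the index of the run with the lowest SSE; the earliest run wins ties. */
    static int best(double[][] X, int[][] labelSets, double[][][] centroidSets) {
        int bestIdx = 0;
        double bestSse = sse(X, labelSets[0], centroidSets[0]);
        for (int r = 1; r < labelSets.length; r++) {
            double s = sse(X, labelSets[r], centroidSets[r]);
            if (s < bestSse) { bestSse = s; bestIdx = r; }
        }
        return bestIdx;
    }
}
```

Each restart would contribute one labels array and one centroids array; the run whose index best returns is the one whose result would be kept.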