Class KMeans

java.lang.Object
edu.cmu.tetrad.cluster.KMeans
All Implemented Interfaces:
ClusteringAlgorithm

public class KMeans extends Object implements ClusteringAlgorithm
Implements the "batch" version of the K-means clustering algorithm: in one sweep, each point is assigned to its nearest center; in a second sweep, each center is reset to the mean of its cluster. The two sweeps repeat until convergence.

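Note that this algorithm is guaranteed to converge, since the total squared error never increases from one step to the next and there are only finitely many ways to assign the points to clusters. The result is a local optimum that depends on the initialization.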
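The two sweeps described above can be sketched on a plain double[][] as follows. This is a minimal illustration, not the Tetrad implementation: the class name BatchKMeansSketch, its method names, the use of squared Euclidean distance, and the handling of empty clusters are all assumptions made for the example.

```java
import java.util.Arrays;

/** Illustrative batch K-means on plain arrays; not the Tetrad implementation. */
final class BatchKMeansSketch {

    /** Squared Euclidean distance between two points of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Repeats the assignment and update sweeps until no point changes cluster.
     * The centers array is updated in place; the returned array maps each row
     * of data to the index of its center.
     */
    static int[] cluster(double[][] data, double[][] centers) {
        int[] assignment = new int[data.length];
        Arrays.fill(assignment, -1);
        boolean changed = true;

        while (changed) {
            changed = false;

            // Sweep 1: assign each point to its nearest center.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int k = 1; k < centers.length; k++) {
                    if (squaredDistance(data[i], centers[k])
                            < squaredDistance(data[i], centers[best])) {
                        best = k;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }

            // Sweep 2: reset each center to the mean of its cluster.
            for (int k = 0; k < centers.length; k++) {
                double[] sum = new double[data[0].length];
                int count = 0;
                for (int i = 0; i < data.length; i++) {
                    if (assignment[i] == k) {
                        for (int j = 0; j < sum.length; j++) {
                            sum[j] += data[i][j];
                        }
                        count++;
                    }
                }
                if (count > 0) { // assumption: an empty cluster keeps its old center
                    for (int j = 0; j < sum.length; j++) {
                        centers[k][j] = sum[j] / count;
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0, 0.0}, {0.0, 1.0}, {10.0, 10.0}, {10.0, 11.0}};
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(Arrays.toString(cluster(data, centers))); // prints [0, 0, 1, 1]
    }
}
```

The loop stops when an assignment sweep changes nothing, which is one common convergence test; the class documented here may use a different one.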

Author:
josephramsey
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    cluster(Matrix data)
    Clusters the given data set.
    List<Integer>
    getCluster(int k)
    Returns the members of cluster k.
    List<List<Integer>>
    getClusters()
    Returns the clusters, one list per cluster.
    int
    getMaxIterations()
    Returns the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
    int
    getNumClusters()
    Returns the number of clusters.
    boolean
    isVerbose()
    Returns true iff verbose output should be printed.
    int
    iterations()
    Returns the number of iterations.
    static KMeans
    randomClusters(int numCenters)
    Constructs a new KMeans instance, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster.
    static KMeans
    randomPoints(int numCenters)
    Constructs a new KMeans instance, initializing the algorithm by picking numCenters centers randomly from the data itself.
    void
    setMaxIterations(int maxIterations)
    Sets the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
    void
    setVerbose(boolean verbose)
    Sets whether verbose output should be printed.
    String
    toString()
    Returns a string representation of the cluster result.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Method Details

    • randomPoints

      public static KMeans randomPoints(int numCenters)
      Constructs a new KMeans instance, initializing the algorithm by picking numCenters centers randomly from the data itself.
      Parameters:
      numCenters - The number of centers (clusters).
      Returns:
      The parametrized algorithm.
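One way to pick initial centers from the data itself is to shuffle the row indices and copy the first numCenters rows. The sketch below assumes the centers are drawn without replacement; whether this class does the same is not stated here. RandomPointsInitSketch and pickCenters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative "random points" initialization; not the Tetrad implementation. */
final class RandomPointsInitSketch {

    /** Copies numCenters distinct, randomly chosen rows of data as the initial centers. */
    static double[][] pickCenters(double[][] data, int numCenters, Random random) {
        if (numCenters < 1 || numCenters > data.length) {
            throw new IllegalArgumentException("numCenters must be between 1 and the number of rows");
        }

        // Shuffle the row indices and keep the first numCenters of them.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, random);

        double[][] centers = new double[numCenters][];
        for (int k = 0; k < numCenters; k++) {
            // Clone so that later center updates do not overwrite the data.
            centers[k] = data[rows.get(k)].clone();
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0}, {2.0}, {3.0}, {4.0}};
        double[][] centers = pickCenters(data, 2, new Random());
        System.out.println(centers[0][0] + " " + centers[1][0]);
    }
}
```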
    • randomClusters

      public static KMeans randomClusters(int numCenters)
      Constructs a new KMeans instance, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster.
      Parameters:
      numCenters - The number of centers (clusters).
      Returns:
      The constructed algorithm.
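The random-assignment initialization can be sketched as a single pass that accumulates per-cluster sums and counts, followed by a division. This is an illustration under assumptions, not the Tetrad code: RandomClustersInitSketch and initialCenters are hypothetical names, and a cluster that receives no points is simply left at the origin here.

```java
import java.util.Random;

/** Illustrative "random clusters" initialization; not the Tetrad implementation. */
final class RandomClustersInitSketch {

    /** Assigns each row to a random cluster, then returns the centroid of each cluster. */
    static double[][] initialCenters(double[][] data, int numCenters, Random random) {
        int numVars = data[0].length;
        double[][] centers = new double[numCenters][numVars];
        int[] counts = new int[numCenters];

        // Randomly assign each point to a cluster and accumulate its coordinates.
        for (double[] row : data) {
            int k = random.nextInt(numCenters);
            counts[k]++;
            for (int j = 0; j < numVars; j++) {
                centers[k][j] += row[j];
            }
        }

        // Divide each sum by its count to obtain the centroid.
        for (int k = 0; k < numCenters; k++) {
            if (counts[k] > 0) { // assumption: an empty cluster stays at the origin
                for (int j = 0; j < numVars; j++) {
                    centers[k][j] /= counts[k];
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        double[][] centers = initialCenters(data, 2, new Random());
        System.out.println(centers.length + " centers computed");
    }
}
```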
    • cluster

      public void cluster(Matrix data)
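      Clusters the given data set.

      Runs the batch K-means clustering algorithm on the data. The method itself returns nothing; the resulting clusters are available afterward from getClusters() and getCluster(int).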

      Specified by:
      cluster in interface ClusteringAlgorithm
      Parameters:
      data - An n x m double matrix with n cases (rows) and m variables (columns). Clustering produces an int array c such that c[i] is the cluster that case i is placed into, or -1 if case i is not placed into a cluster (for instance, because it was eliminated from consideration).
    • getClusters

      public List<List<Integer>> getClusters()

      Returns the clusters, one list per cluster.

      Specified by:
      getClusters in interface ClusteringAlgorithm
      Returns:
      a list of clusters, each given as a list of case indices
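The relationship between a per-case assignment array (entry i holds the cluster of case i, or -1 if the case is unclustered) and a list-of-lists result can be sketched as below. ClusterListsSketch and toLists are hypothetical names, and the assumption that each inner list holds case indices in ascending order is made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative conversion from an assignment array to lists of case indices. */
final class ClusterListsSketch {

    /** Groups case indices by cluster, skipping cases marked -1 (not clustered). */
    static List<List<Integer>> toLists(int[] assignment, int numClusters) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int k = 0; k < numClusters; k++) {
            clusters.add(new ArrayList<>());
        }
        for (int i = 0; i < assignment.length; i++) {
            if (assignment[i] >= 0) {
                clusters.get(assignment[i]).add(i);
            }
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] assignment = {0, 1, 0, -1, 1};
        System.out.println(toLists(assignment, 2)); // prints [[0, 2], [1, 4]]
    }
}
```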
    • getMaxIterations

      public int getMaxIterations()
      Returns the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
      Returns:
      The maximum number of iterations, or -1 if unconstrained.
    • setMaxIterations

      public void setMaxIterations(int maxIterations)
      Sets the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
      Parameters:
      maxIterations - The maximum number of iterations, or -1 if unconstrained.
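The -1 convention amounts to a loop guard that ignores the cap when maxIterations is -1. The sketch below shows one plausible form of that guard; MaxIterationsSketch, mayContinue, and iterationsRun are hypothetical names, and the actual loop in this class may be structured differently.

```java
/** Illustrative handling of a maximum-iterations setting where -1 means "no limit". */
final class MaxIterationsSketch {

    /** Returns true if another iteration is allowed under the -1 convention. */
    static boolean mayContinue(int iterationsSoFar, int maxIterations) {
        return maxIterations == -1 || iterationsSoFar < maxIterations;
    }

    /**
     * Counts the iterations that run when convergence would take
     * iterationsToConverge iterations and the cap is maxIterations.
     */
    static int iterationsRun(int iterationsToConverge, int maxIterations) {
        int iterations = 0;
        while (iterations < iterationsToConverge && mayContinue(iterations, maxIterations)) {
            iterations++; // stands in for one assign-and-update pass
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(iterationsRun(7, -1)); // prints 7: no cap, runs to convergence
        System.out.println(iterationsRun(7, 3));  // prints 3: stopped by the cap
    }
}
```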
    • getNumClusters

      public int getNumClusters()

      Returns the number of clusters.

      Returns:
      the number of clusters
    • getCluster

      public List<Integer> getCluster(int k)

      Returns the members of cluster k.

      Parameters:
      k - the index of the cluster
      Returns:
      the list of case indices placed into cluster k
    • iterations

      public int iterations()

      Returns the number of iterations.

      Returns:
      the number of iterations.
    • toString

      public String toString()

      Returns a string representation of the cluster result.

      Overrides:
      toString in class Object
      Returns:
      a string representation of the cluster result.
    • isVerbose

      public boolean isVerbose()

      Returns true iff verbose output should be printed.

      Returns:
      true iff verbose output should be printed
    • setVerbose

      public void setVerbose(boolean verbose)
      Sets whether verbose output should be printed.
      Specified by:
      setVerbose in interface ClusteringAlgorithm
      Parameters:
      verbose - True iff verbose output should be printed.