edu.cmu.tetrad.cluster.KMeans

All Implemented Interfaces: ClusteringAlgorithm

public class KMeans extends Object implements ClusteringAlgorithm

Implements the "batch" version of the K-means clustering algorithm. One sweep assigns each point to its nearest center; a second sweep resets each center to the mean of the cluster for that center. The two sweeps are repeated until convergence.

Note that this algorithm is guaranteed to converge: neither sweep can increase the total squared error, and there are only finitely many ways to assign the points to clusters.
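
The two sweeps described above can be illustrated with a minimal, self-contained sketch. It is not the Tetrad implementation: the class name BatchKMeansSketch, the use of plain double[][] arrays instead of Matrix, the tie-breaking rule, and the handling of empty clusters are all assumptions made for illustration. A negative maxIterations is taken to mean "no limit", mirroring the -1 convention documented for setMaxIterations.

```java
import java.util.Arrays;

/** Illustrative sketch of batch K-means; hypothetical, not Tetrad's code. */
final class BatchKMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Runs batch K-means starting from the given centers, which are updated
     * in place. Returns c, where c[i] is the cluster index of row i.
     * A negative maxIterations means the loop runs until nothing changes.
     */
    static int[] cluster(double[][] data, double[][] centers, int maxIterations) {
        int n = data.length;
        int m = data[0].length;
        int[] c = new int[n];
        Arrays.fill(c, -1); // no row is assigned yet

        int iteration = 0;
        boolean changed = true;
        while (changed && (maxIterations < 0 || iteration < maxIterations)) {
            changed = false;

            // Sweep 1: assign each point to its nearest center
            // (ties go to the lowest cluster index in this sketch).
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int k = 1; k < centers.length; k++) {
                    if (squaredDistance(data[i], centers[k])
                            < squaredDistance(data[i], centers[best])) {
                        best = k;
                    }
                }
                if (best != c[i]) {
                    c[i] = best;
                    changed = true;
                }
            }

            // Sweep 2: reset each center to the mean of its cluster.
            double[][] sums = new double[centers.length][m];
            int[] counts = new int[centers.length];
            for (int i = 0; i < n; i++) {
                counts[c[i]]++;
                for (int j = 0; j < m; j++) {
                    sums[c[i]][j] += data[i][j];
                }
            }
            for (int k = 0; k < centers.length; k++) {
                if (counts[k] == 0) {
                    continue; // assumption: an empty cluster keeps its old center
                }
                for (int j = 0; j < m; j++) {
                    centers[k][j] = sums[k][j] / counts[k];
                }
            }
            iteration++;
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0, 0.0}, {0.0, 1.0}, {10.0, 10.0}, {10.0, 11.0}};
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        System.out.println(Arrays.toString(cluster(data, centers, -1))); // prints [0, 0, 1, 1]
    }
}
```

The loop stops as soon as a full assignment sweep changes nothing, which is the convergence condition the note above refers to.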

Author: josephramsey

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | cluster(Matrix data) | Clusters the given data set. |
| List<Integer> | getCluster(int k) | Returns the cases assigned to cluster k. |
| List<List<Integer>> | getClusters() | Returns the clusters. |
| int | getMaxIterations() | Returns the maximum number of iterations, or -1 if the algorithm is allowed to run without an iteration limit. |
| int | getNumClusters() | Returns the number of clusters. |
| boolean | isVerbose() | Returns true iff verbose output should be printed. |
| int | iterations() | Returns the number of iterations. |
| static KMeans | randomClusters(int numCenters) | Constructs a new KMeans, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster. |
| static KMeans | randomPoints(int numCenters) | Constructs a new KMeans, initializing the algorithm by picking numCenters centers randomly from the data itself. |
| void | setMaxIterations(int maxIterations) | Sets the maximum number of iterations, or -1 if the algorithm is allowed to run without an iteration limit. |
| void | setVerbose(boolean verbose) | Sets whether verbose output should be printed. |
| String | toString() | Returns a string representation of the cluster result. |

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
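
The two initialization strategies named by randomPoints and randomClusters can be sketched as follows. This is an illustration only, not the library's code: the class KMeansInitSketch, the explicit data and Random parameters, the choice of distinct rows in randomPoints, and the treatment of clusters that receive no points are all assumptions. The sketch also assumes numCenters is no larger than the number of rows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch of two K-means initializations; hypothetical names. */
final class KMeansInitSketch {

    /** Picks numCenters distinct rows of the data as the initial centers. */
    static double[][] randomPoints(double[][] data, int numCenters, Random random) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, random); // random order of row indices

        double[][] centers = new double[numCenters][];
        for (int k = 0; k < numCenters; k++) {
            centers[k] = data[rows.get(k)].clone(); // copy so the data stays untouched
        }
        return centers;
    }

    /**
     * Assigns every row to a uniformly random cluster, then returns the
     * centroid (mean) of each cluster as its initial center.
     */
    static double[][] randomClusters(double[][] data, int numCenters, Random random) {
        int m = data[0].length;
        double[][] centers = new double[numCenters][m];
        int[] counts = new int[numCenters];

        for (double[] row : data) {
            int k = random.nextInt(numCenters);
            counts[k]++;
            for (int j = 0; j < m; j++) {
                centers[k][j] += row[j]; // accumulate sums first
            }
        }
        for (int k = 0; k < numCenters; k++) {
            if (counts[k] == 0) {
                continue; // assumption: an empty cluster's center stays at the origin
            }
            for (int j = 0; j < m; j++) {
                centers[k][j] /= counts[k]; // turn sums into means
            }
        }
        return centers;
    }
}
```

Starting from randomly chosen data points tends to spread the initial centers across the data, whereas centroids of random clusters tend to start near the overall mean; which works better depends on the data set.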

Method Details
- randomPoints
  
  public static KMeans randomPoints(int numCenters)
  
  Constructs a new KMeans, initializing the algorithm by picking numCenters centers randomly from the data itself.
  
  Parameters:
  
  numCenters - The number of centers (clusters).
  
  Returns:
  
  The parametrized algorithm.
- randomClusters
  
  public static KMeans randomClusters(int numCenters)
  
  Constructs a new KMeans, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster.
  
  Parameters:
  
  numCenters - The number of centers (clusters).
  
  Returns:
  
  The constructed algorithm.
- cluster
  
  public void cluster(Matrix data)
  
  Clusters the given data set by running the batch K-means clustering algorithm on it. The method itself returns nothing; the resulting clusters are available afterwards from getClusters().
  
  Specified by:
  
  cluster in interface ClusteringAlgorithm
  
  Parameters:
  
  data - An n x m double matrix with n cases (rows) and m variables (columns). Clustering produces an int array c such that c[i] is the cluster that case i is placed into, or -1 if case i is not placed into a cluster (for instance, because it was eliminated from consideration).
- getClusters
  
  public List<List<Integer>> getClusters()
  
  Returns the clusters.
  
  Specified by:
  
  getClusters in interface ClusteringAlgorithm
  
  Returns:
  
  the clusters, each given as a list of the indices of the cases placed into it
- getMaxIterations
  
  public int getMaxIterations()
  
  Returns the maximum number of iterations, or -1 if the algorithm is allowed to run without an iteration limit.
  
  Returns:
  
  the maximum number of iterations, or -1 if there is no limit
- setMaxIterations
  
  public void setMaxIterations(int maxIterations)
  
  Sets the maximum number of iterations, or -1 if the algorithm is allowed to run without an iteration limit.
  
  Parameters:
  
  maxIterations - the maximum number of iterations, or -1 for no limit
- getNumClusters
  
  public int getNumClusters()
  
  Returns the number of clusters.
  
  Returns:
  
  the number of clusters
- getCluster
  
  public List<Integer> getCluster(int k)
  
  Returns the cases assigned to cluster k.
  
  Parameters:
  
  k - the index of the cluster
  
  Returns:
  
  a list of the indices of the cases placed into cluster k
- iterations
  
  public int iterations()
  
  Returns the number of iterations.
  
  Returns:
  
  the number of iterations.
- toString
  
  public String toString()
  
  Returns a string representation of the cluster result.
  
  Overrides:
  
  toString in class Object
  
  Returns:
  
  a string representation of the cluster result.
- isVerbose
  
  public boolean isVerbose()
  
  Returns whether verbose output should be printed.
  
  Returns:
  
  true iff verbose output should be printed
- setVerbose
  
  public void setVerbose(boolean verbose)
  
  Sets whether verbose output should be printed.
  
  Specified by:
  
  setVerbose in interface ClusteringAlgorithm
  
  Parameters:
  
  verbose - True iff verbose output should be printed.
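
The relationship between the per-case assignment array c described under cluster and the List<List<Integer>> shape returned by getClusters can be sketched as follows. This is a hedged illustration, not Tetrad's code: the class ClusterListsSketch and the method toClusterLists are hypothetical names, and the sketch assumes that a case marked -1 is simply left out of every cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative conversion from an assignment array to cluster lists. */
final class ClusterListsSketch {

    /**
     * Groups case indices by cluster. c[i] is the cluster index of case i,
     * or -1 if case i belongs to no cluster.
     */
    static List<List<Integer>> toClusterLists(int[] c, int numClusters) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int k = 0; k < numClusters; k++) {
            clusters.add(new ArrayList<>());
        }
        for (int i = 0; i < c.length; i++) {
            if (c[i] == -1) {
                continue; // unassigned case: skipped in this sketch
            }
            clusters.get(c[i]).add(i);
        }
        return clusters;
    }
}
```

For example, under these assumptions the assignment array {0, 1, -1, 0} with two clusters yields the lists [0, 3] and [1].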
