Package edu.cmu.tetrad.cluster
Class KMeans
java.lang.Object
edu.cmu.tetrad.cluster.KMeans
- All Implemented Interfaces:
ClusteringAlgorithm
Implements the "batch" version of the K-means clustering algorithm: in one sweep, each point is assigned to its
nearest center; in a second sweep, each center is reset to the mean of its cluster. The two sweeps repeat
until convergence.
Note that this algorithm is guaranteed to converge, since neither sweep can increase the total squared error and there are only finitely many ways to assign points to clusters.
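The two sweeps described above can be sketched as follows. This is a minimal, self-contained illustration on plain double arrays, not the Tetrad implementation; the class name BatchKMeansSketch, the method signatures, and the handling of empty clusters are assumptions made for the example.

```java
import java.util.Arrays;

/** Illustrative batch K-means on plain arrays; not the Tetrad implementation. */
final class BatchKMeansSketch {

    /**
     * Runs batch K-means on data (n rows, m columns), updating centers in place.
     * A maxIterations of -1 means "run until the assignment stops changing".
     * Returns c, where c[i] is the index of the center assigned to row i.
     */
    static int[] cluster(double[][] data, double[][] centers, int maxIterations) {
        int[] c = new int[data.length];
        Arrays.fill(c, -1);
        int iteration = 0;
        boolean changed = true;

        while (changed && (maxIterations == -1 || iteration < maxIterations)) {
            changed = false;

            // Sweep 1: assign each point to its nearest center.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int k = 1; k < centers.length; k++) {
                    if (squaredDistance(data[i], centers[k]) < squaredDistance(data[i], centers[best])) {
                        best = k;
                    }
                }
                if (c[i] != best) {
                    c[i] = best;
                    changed = true;
                }
            }

            // Sweep 2: reset each center to the mean of its cluster.
            for (int k = 0; k < centers.length; k++) {
                double[] sum = new double[centers[k].length];
                int count = 0;
                for (int i = 0; i < data.length; i++) {
                    if (c[i] != k) continue;
                    for (int j = 0; j < sum.length; j++) sum[j] += data[i][j];
                    count++;
                }
                if (count == 0) continue; // assumption: an empty cluster keeps its old center
                for (int j = 0; j < sum.length; j++) centers[k][j] = sum[j] / count;
            }
            iteration++;
        }
        return c;
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double d = 0.0;
        for (int j = 0; j < a.length; j++) d += (a[j] - b[j]) * (a[j] - b[j]);
        return d;
    }
}
```

The loop stops as soon as a full assignment sweep changes nothing, which is one common way to detect convergence; other stopping rules are possible.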
- Author:
- josephramsey
-
Method Summary
Modifier and Type | Method | Description
void | cluster(data) | Clusters the given data set.
 | getCluster(int k) | Returns the cluster with index k.
List | getClusters() | Getter for the field clusters.
int | getMaxIterations() | Returns the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
int | getNumClusters() | Returns the number of clusters.
boolean | isVerbose() | Returns true iff verbose output is printed.
int | iterations() | Returns the number of iterations.
static KMeans | randomClusters(int numCenters) | Constructs a new KMeans instance, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster.
static KMeans | randomPoints(int numCenters) | Constructs a new KMeans instance, initializing the algorithm by picking numCenters centers randomly from the data itself.
void | setMaxIterations(int maxIterations) | Sets the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
void | setVerbose(boolean verbose) | Sets whether verbose output should be printed.
String | toString() | Returns a string representation of this object.
-
Method Details
-
randomPoints
Constructs a new KMeans instance, initializing the algorithm by picking numCenters centers randomly from the data itself.
- Parameters:
numCenters
- The number of centers (clusters).
- Returns:
- The parametrized algorithm.
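As an illustration of this initialization strategy, the sketch below picks initial centers from the rows of a plain double array. It is not the Tetrad code: the class name RandomPointsInitSketch, the explicit Random argument, and the choice to pick distinct rows are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative "random points" initialization; not the Tetrad implementation. */
final class RandomPointsInitSketch {

    /** Picks numCenters distinct rows of data at random and copies them as the initial centers. */
    static double[][] randomPoints(double[][] data, int numCenters, Random random) {
        if (numCenters < 1 || numCenters > data.length) {
            throw new IllegalArgumentException("numCenters must be between 1 and the number of rows");
        }

        // Shuffle the row indices and take the first numCenters of them.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < data.length; i++) rows.add(i);
        Collections.shuffle(rows, random);

        double[][] centers = new double[numCenters][];
        for (int k = 0; k < numCenters; k++) {
            centers[k] = data[rows.get(k)].clone(); // copy so later updates do not alter the data
        }
        return centers;
    }
}
```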
-
randomClusters
Constructs a new KMeans instance, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster.
- Parameters:
numCenters
- The number of centers (clusters).
- Returns:
- The constructed algorithm.
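The random-assignment initialization can be sketched in the same way. Again this is only an illustration on plain arrays under stated assumptions: the class name RandomClustersInitSketch, the explicit Random argument, and the treatment of clusters that happen to receive no points are hypothetical and not taken from Tetrad.

```java
import java.util.Random;

/** Illustrative "random clusters" initialization; not the Tetrad implementation. */
final class RandomClustersInitSketch {

    /**
     * Assigns each row of data (assumed non-empty) to a random cluster,
     * then returns the centroid of each cluster.
     */
    static double[][] randomClusters(double[][] data, int numCenters, Random random) {
        int m = data[0].length;
        double[][] centers = new double[numCenters][m];
        int[] counts = new int[numCenters];

        // Accumulate per-cluster sums while assigning points at random.
        for (double[] row : data) {
            int k = random.nextInt(numCenters);
            counts[k]++;
            for (int j = 0; j < m; j++) centers[k][j] += row[j];
        }

        // Divide each sum by its count to obtain the centroid.
        for (int k = 0; k < numCenters; k++) {
            if (counts[k] == 0) continue; // assumption: an empty cluster's center stays at the origin
            for (int j = 0; j < m; j++) centers[k][j] /= counts[k];
        }
        return centers;
    }
}
```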
-
cluster
Clusters the given data set by running the batch K-means clustering algorithm on it. The method itself returns void; the resulting clusters are available through getClusters() and getCluster(int k).
- Specified by:
cluster in interface ClusteringAlgorithm
- Parameters:
data
- An n x m double matrix with n cases (rows) and m variables (columns). The algorithm builds an int array c such that c[i] is the cluster that case i is placed into, or -1 if case i is not placed into a cluster (for instance, because it was eliminated from consideration).
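To make the assignment-array convention concrete, the sketch below turns an array c of that form into a list of clusters, skipping cases marked -1. The class and method names are hypothetical, and whether Tetrad builds its cluster lists this way is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative conversion from an assignment array to cluster lists; not Tetrad code. */
final class AssignmentToClustersSketch {

    /** Groups case indices by cluster; cases with c[i] == -1 belong to no cluster. */
    static List<List<Integer>> toClusters(int[] c, int numClusters) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int k = 0; k < numClusters; k++) clusters.add(new ArrayList<>());

        for (int i = 0; i < c.length; i++) {
            if (c[i] == -1) continue; // case i was not placed into any cluster
            clusters.get(c[i]).add(i);
        }
        return clusters;
    }
}
```

For example, c = {0, 1, -1, 0, 1} with two clusters yields the clusters [0, 3] and [1, 4]; case 2 appears in neither.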
-
getClusters
Getter for the field clusters.
- Specified by:
getClusters in interface ClusteringAlgorithm
- Returns:
- a List object
-
getMaxIterations
public int getMaxIterations()
Returns the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
- Returns:
- The maximum number of iterations, or -1 if unconstrained.
-
setMaxIterations
public void setMaxIterations(int maxIterations)
Sets the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
- Parameters:
maxIterations
- The maximum number of iterations, or -1 if unconstrained.
-
getNumClusters
public int getNumClusters()
Returns the number of clusters.
- Returns:
- the number of clusters
-
getCluster
-
iterations
public int iterations()
Returns the number of iterations.
- Returns:
- the number of iterations.
-
toString
-
isVerbose
public boolean isVerbose()
Returns whether verbose output is printed.
- Returns:
- true iff verbose output is printed
-
setVerbose
public void setVerbose(boolean verbose)
Sets whether verbose output should be printed.
- Specified by:
setVerbose in interface ClusteringAlgorithm
- Parameters:
verbose
- True iff verbose output should be printed.
-