Package edu.cmu.tetrad.cluster
Class KMeans
java.lang.Object
edu.cmu.tetrad.cluster.KMeans
- All Implemented Interfaces:
- ClusteringAlgorithm
Implements the "batch" version of the K-means clustering algorithm: in one
 sweep, assign each point to its nearest center; in a second sweep, reset
 each center to the mean of the cluster for that center; repeat until
 convergence.
 
Note that this algorithm is guaranteed to converge: the total squared error never increases at any step, and there are only finitely many ways to assign the points to clusters.
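The two sweeps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tetrad source: the class BatchKMeansSketch, its method names, the plain double[][] data layout, and the choice to keep the old center when a cluster becomes empty are all hypothetical. A negative maxIterations is treated as "no limit", mirroring the -1 convention documented for setMaxIterations.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and data layout are assumptions. */
final class BatchKMeansSketch {

    /**
     * Runs batch K-means on an n x m array, starting from the given centers
     * (at least one center is assumed). The centers array is updated in place.
     * Returns c, where c[i] is the index of the center assigned to row i.
     */
    static int[] run(double[][] data, double[][] centers, int maxIterations) {
        int n = data.length;
        int k = centers.length;
        int m = centers[0].length;
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1); // -1 means "not yet placed in a cluster"

        for (int iter = 0; maxIterations < 0 || iter < maxIterations; iter++) {
            // Sweep 1: assign each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (squaredDistance(data[i], centers[j])
                            < squaredDistance(data[i], centers[best])) {
                        best = j;
                    }
                }
                if (best != assignment[i]) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point changed cluster in this sweep
            }

            // Sweep 2: reset each center to the mean of its cluster.
            double[][] sums = new double[k][m];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < m; d++) {
                    sums[assignment[i]][d] += data[i][d];
                }
            }
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) {
                    continue; // assumption: an empty cluster keeps its old center
                }
                for (int d = 0; d < m; d++) {
                    centers[j][d] = sums[j][d] / counts[j];
                }
            }
        }
        return assignment;
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Stopping when no assignment changes is one common convergence test; a threshold on the decrease in total squared error would be an alternative.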
- Author:
- josephramsey
- 
Method Summary
- void cluster(data)- Runs the batch K-means clustering algorithm on the data; the result is then available from getClusters().
- getCluster(int k)
- getClusters()
- int getMaxIterations()- Returns the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
- int getNumClusters()
- boolean isVerbose()
- int iterations()
- static KMeans randomClusters(int numCenters)- Constructs a new KMeans instance, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster.
- static KMeans randomPoints(int numCenters)- Constructs a new KMeans instance, initializing the algorithm by picking numCenters centers randomly from the data itself.
- void setMaxIterations(int maxIterations)- Sets the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
- void setVerbose(boolean verbose)- True iff verbose output should be printed.
- toString()
- 
Method Details- 
randomPoints
Constructs a new KMeans instance, initializing the algorithm by picking numCenters centers randomly from the data itself.
- Parameters:
- numCenters- The number of centers (clusters).
- Returns:
- The parametrized algorithm.
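One way to realize this kind of initialization is sketched below. It is an illustration only: the class RandomPointsInitSketch, the method pickCenters, and the double[][] data layout are hypothetical, and sampling distinct rows (without replacement) is an assumption that the description above does not state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only; names and sampling scheme are assumptions. */
final class RandomPointsInitSketch {

    /** Picks numCenters distinct rows of the data, at random, as initial centers. */
    static double[][] pickCenters(double[][] data, int numCenters, Random random) {
        if (numCenters < 1 || numCenters > data.length) {
            throw new IllegalArgumentException("numCenters must be in [1, n]");
        }

        // Shuffle the row indices and take the first numCenters of them.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            rows.add(i);
        }
        Collections.shuffle(rows, random);

        double[][] centers = new double[numCenters][];
        for (int j = 0; j < numCenters; j++) {
            // Copy the row so that later center updates do not alter the data.
            centers[j] = data[rows.get(j)].clone();
        }
        return centers;
    }
}
```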
 
- 
randomClusters
Constructs a new KMeans instance, initializing the algorithm by randomly assigning each point in the data to one of the numCenters clusters, then calculating the centroid of each cluster.
- Parameters:
- numCenters- The number of centers (clusters).
- Returns:
- The constructed algorithm.
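The random-assignment initialization described above might look roughly like the following. The class RandomClustersInitSketch, the method initialCenters, and the double[][] layout are hypothetical, and leaving an empty cluster's centroid at the origin is an assumption made only to keep the sketch short.

```java
import java.util.Random;

/** Illustrative sketch only; names and empty-cluster handling are assumptions. */
final class RandomClustersInitSketch {

    /**
     * Assigns each row to a uniformly random cluster, then returns the
     * centroid (coordinate-wise mean) of each cluster.
     */
    static double[][] initialCenters(double[][] data, int numCenters, Random random) {
        int m = data[0].length;
        double[][] sums = new double[numCenters][m];
        int[] counts = new int[numCenters];

        // Random assignment, accumulating per-cluster coordinate sums.
        for (double[] row : data) {
            int cluster = random.nextInt(numCenters);
            counts[cluster]++;
            for (int d = 0; d < m; d++) {
                sums[cluster][d] += row[d];
            }
        }

        // Turn the sums into means.
        for (int j = 0; j < numCenters; j++) {
            if (counts[j] == 0) {
                continue; // assumption: an empty cluster stays at the origin
            }
            for (int d = 0; d < m; d++) {
                sums[j][d] /= counts[j];
            }
        }
        return sums;
    }
}
```

With few points and many clusters, some clusters can start out empty, so a real implementation would need a policy for that case.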
 
- 
cluster
Runs the batch K-means clustering algorithm on the data; the result is then available from getClusters.
- Specified by:
- cluster in interface ClusteringAlgorithm
- Parameters:
- data- An n x m double matrix with n cases (rows) and m variables (columns). Makes an int array c such that c[i] is the cluster that case i is placed into, or -1 if case i is not placed into a cluster (as a result of its being eliminated from consideration, for instance).
 
- 
getClusters
- Specified by:
- getClusters in interface ClusteringAlgorithm
- Returns:
- a list of clusters, each consisting of a list of indices in the
 dataset provided as an argument to cluster, or null if the data has not yet been clustered.
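The relation between the per-case array c described for cluster (c[i] is the cluster of case i, or -1) and a list of clusters holding case indices can be sketched as below. The class ClusterListsSketch and the method toClusters are hypothetical names used only for illustration; this is not claimed to be how the class builds its result.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; names are assumptions. */
final class ClusterListsSketch {

    /** Groups case indices by cluster; cases marked -1 are left out. */
    static List<List<Integer>> toClusters(int[] c, int numClusters) {
        List<List<Integer>> clusters = new ArrayList<>();
        for (int j = 0; j < numClusters; j++) {
            clusters.add(new ArrayList<>());
        }
        for (int i = 0; i < c.length; i++) {
            if (c[i] >= 0) { // -1 marks a case not placed in any cluster
                clusters.get(c[i]).add(i);
            }
        }
        return clusters;
    }
}
```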
 
- 
getMaxIterations
public int getMaxIterations()
Returns the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
- Returns:
- The maximum number of iterations, or -1 if unconstrained.
 
- 
setMaxIterations
public void setMaxIterations(int maxIterations)
Sets the maximum number of iterations, or -1 if the algorithm is allowed to run unconstrained.
- Parameters:
- maxIterations- The maximum number of iterations, or -1 to run unconstrained.
 
- 
getNumClusters
public int getNumClusters()
- 
getCluster
- 
iterations
public int iterations()
- Returns:
- the number of iterations.
 
- 
toString
- 
isVerbose
public boolean isVerbose()
- 
setVerbose
public void setVerbose(boolean verbose)
Description copied from interface: ClusteringAlgorithm
True iff verbose output should be printed.
- Specified by:
- setVerbose in interface ClusteringAlgorithm
 
 