Package edu.cmu.tetrad.search.unmix
Class KMeans
java.lang.Object
edu.cmu.tetrad.search.unmix.KMeans
Implements the K-Means clustering algorithm using the k-means++ initialization method and iterative refinement. The
 algorithm partitions a dataset into a specified number of clusters by minimizing the sum of squared distances between
 points and their respective cluster centroids.
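The k-means++ initialization mentioned above can be sketched as follows. This is a minimal, self-contained illustration and not the Tetrad source: the class name `KMeansPlusPlusSketch` and the helpers `sqDist` and `seed` are hypothetical. It assumes the standard k-means++ rule, in which the first centroid is drawn uniformly and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far.

```java
import java.util.Random;

// Hypothetical illustration class; not part of edu.cmu.tetrad.search.unmix.
class KMeansPlusPlusSketch {

    /** Squared Euclidean distance between two points of equal length. */
    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    /** Picks k initial centroids from the rows of x using D^2 weighting. */
    static double[][] seed(double[][] x, int k, Random rng) {
        int n = x.length;
        double[][] centroids = new double[k][];
        // First centroid: uniform draw over the data points.
        centroids[0] = x[rng.nextInt(n)].clone();
        // d2[i] holds the squared distance from point i to its nearest chosen centroid.
        double[] d2 = new double[n];
        for (int i = 0; i < n; i++) {
            d2[i] = sqDist(x[i], centroids[0]);
        }
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (double v : d2) {
                total += v;
            }
            int pick;
            if (total > 0) {
                pick = n - 1; // fallback in case rounding leaves the loop without a hit
                double r = rng.nextDouble() * total;
                double acc = 0.0;
                for (int i = 0; i < n; i++) {
                    acc += d2[i];
                    if (acc > r) {
                        pick = i;
                        break;
                    }
                }
            } else {
                // All points coincide with a chosen centroid; fall back to a uniform draw.
                pick = rng.nextInt(n);
            }
            centroids[c] = x[pick].clone();
            for (int i = 0; i < n; i++) {
                d2[i] = Math.min(d2[i], sqDist(x[i], centroids[c]));
            }
        }
        return centroids;
    }
}
```

Because a point that is already a centroid has weight zero, it is not drawn again while any other point still has positive weight, which spreads the initial centroids across the data.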
Nested Class Summary
- static class KMeans.Result: Represents the result of a clustering operation using the KMeans algorithm.
Method Summary
- static KMeans.Result cluster(double[][] X, int K, int maxIter, long seed): Performs k-means clustering on a given dataset.
- static KMeans.Result clusterWithRestarts(double[][] X, int K, int maxIter, long seed, int restarts): Performs k-means clustering with multiple restarts and selects the best result based on the lowest within-cluster sum of squared errors (SSE).
Method Details
cluster
public static KMeans.Result cluster(double[][] X, int K, int maxIter, long seed)
Performs k-means clustering on a given dataset.
- Parameters:
- X - The data points to be clustered, represented as a 2D array where each row corresponds to a data point and each column corresponds to a feature. The dataset must be non-null.
- K - The number of clusters to create. Must be a positive integer. If K is greater than the number of data points, it is reduced to the number of data points.
- maxIter - The maximum number of iterations the algorithm will run. Must be a positive integer.
- seed - The seed for the random number generator used to initialize the cluster centroids.
- Returns:
- An instance of the Result class containing the cluster assignments (labels) for each data point and the coordinates of the cluster centroids.
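The iterative refinement that follows initialization can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the class `LloydSketch`, the result holder `Fit`, and the method `refine` are hypothetical names, the loop stops when no label changes or `maxIter` is reached, and an empty cluster keeps its previous centroid. None of those three behaviors is documented on this page. The initial centroids are passed in explicitly so the example stays deterministic.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of edu.cmu.tetrad.search.unmix.
class LloydSketch {

    /** Hypothetical stand-in for a clustering result: labels plus centroid coordinates. */
    static final class Fit {
        final int[] labels;
        final double[][] centroids;

        Fit(int[] labels, double[][] centroids) {
            this.labels = labels;
            this.centroids = centroids;
        }
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    /** Alternates assignment and centroid update, starting from the given centroids. */
    static Fit refine(double[][] x, double[][] init, int maxIter) {
        int n = x.length, k = init.length, dim = x[0].length;
        double[][] c = new double[k][];
        for (int j = 0; j < k; j++) {
            c[j] = init[j].clone();
        }
        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int it = 0; it < maxIter; it++) {
            // Assignment step: each point goes to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestD = sqDist(x[i], c[0]);
                for (int j = 1; j < k; j++) {
                    double dj = sqDist(x[i], c[j]);
                    if (dj < bestD) {
                        bestD = dj;
                        best = j;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assumed convergence test: no label changed
            }
            // Update step: each centroid becomes the mean of its assigned points.
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (int i = 0; i < n; i++) {
                count[labels[i]]++;
                for (int t = 0; t < dim; t++) {
                    sum[labels[i]][t] += x[i][t];
                }
            }
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) {
                    continue; // assumption: an empty cluster keeps its old centroid
                }
                for (int t = 0; t < dim; t++) {
                    c[j][t] = sum[j][t] / count[j];
                }
            }
        }
        return new Fit(labels, c);
    }
}
```

For example, refining the four points (0,0), (0,2), (10,0), (10,2) from initial centroids (0,0) and (10,0) yields labels 0, 0, 1, 1 and centroids (0,1) and (10,1).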
 
clusterWithRestarts
public static KMeans.Result clusterWithRestarts(double[][] X, int K, int maxIter, long seed, int restarts)
Performs k-means clustering with multiple restarts and selects the best result based on the lowest within-cluster sum of squared errors (SSE).
- Parameters:
- X - The data points to be clustered, represented as a 2D array where each row corresponds to a data point and each column corresponds to a feature. The dataset must be non-null.
- K - The number of clusters to create. Must be a positive integer. If K is greater than the number of data points, it is reduced to the number of data points.
- maxIter - The maximum number of iterations the algorithm will run. Must be a positive integer.
- seed - The seed for the random number generator used to initialize the cluster centroids for each restart.
- restarts - The number of times to restart the clustering process with different initial centroids. If set to a value less than or equal to 0, the clustering algorithm runs only once.
- Returns:
- An instance of the Result class containing the cluster assignments (labels) for each data point and the coordinates of the cluster centroids from the best run (the one with the lowest within-cluster SSE).
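The selection rule described above, keeping the run with the lowest within-cluster SSE, can be sketched as follows. The names `RestartSketch`, `sse`, and `bestOf` are hypothetical, and two details are assumptions made only for illustration: each run uses `seed + r` as its seed, and `restarts` is treated as the total number of runs. The page does not say how the library derives per-restart seeds or whether the first run is counted separately. The clustering routine itself is passed in as a function from seed to centroids so the sketch stays independent of any particular implementation.

```java
import java.util.function.LongFunction;

// Hypothetical illustration class; not part of edu.cmu.tetrad.search.unmix.
class RestartSketch {

    /** Within-cluster SSE: each point contributes its squared distance to the nearest centroid. */
    static double sse(double[][] x, double[][] centroids) {
        double total = 0.0;
        for (double[] p : x) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centroids) {
                double s = 0.0;
                for (int t = 0; t < p.length; t++) {
                    double d = p[t] - c[t];
                    s += d * d;
                }
                best = Math.min(best, s);
            }
            total += best;
        }
        return total;
    }

    /** Runs fit at least once and returns the centroids with the lowest SSE. */
    static double[][] bestOf(double[][] x, long seed, int restarts,
                             LongFunction<double[][]> fit) {
        int runs = Math.max(1, restarts); // restarts <= 0 means a single run
        double[][] best = null;
        double bestSse = Double.POSITIVE_INFINITY;
        for (int r = 0; r < runs; r++) {
            double[][] c = fit.apply(seed + r); // assumed per-run seed derivation
            double s = sse(x, c);
            if (best == null || s < bestSse) {
                bestSse = s;
                best = c;
            }
        }
        return best;
    }
}
```

Restarts matter because k-means only finds a local minimum of the SSE, so different initial centroids can converge to different partitions.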
 
 