Class Tsc
- All Implemented Interfaces:
EffectiveSampleSizeSettable
Theory (NOLAC): soundness sketch. We assume a linear-Gaussian SEM with a latent DAG and pure measurement (each observed variable loads on exactly one latent), independent unique errors across distinct clusters, and generic parameters (no exact cancellations). Under the NOLAC (no overlapping clusters) assumption, the indicator sets for distinct latents are disjoint. With a consistent rank test (e.g., Wilks LRT with a diminishing α), the following properties hold generically:
- Seed soundness. If G is a true cluster with latent-boundary dimension r (typically r=1), then every (r+1)-subset S⊆G satisfies rank(S, V\S)=r. If S contains any nonmember, generically rank(S, V\S)>r.
- Union/extension correctness. Growing a seed by unions that preserve rank r expands exactly to the maximal G; adding a nonmember raises the rank and is rejected.
- Non-overlap. Because each observed variable belongs to at most one true G, any attempt to reuse a committed variable either raises the rank earlier or is blocked by bookkeeping; accepted clusters are therefore pairwise disjoint.
- Conditional-rank refinement (Rule 3). For any Z⊆C with |Z|≥r, if rank(C\Z, V\C | Z)=0 then Z acts as an observed bottleneck in a pure DAG-without-latents scenario; removing Z collapses spurious clusters. In a true latent cluster with noisy indicators, conditioning on any small Z cannot annihilate the latent contribution, so the refinement leaves true clusters intact generically.
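The seed-soundness property can be checked numerically on a toy population covariance. The following is a minimal sketch, not the Tsc implementation: it assumes a hypothetical two-latent model (L1 -> L2, three pure indicators each, with made-up loadings and unit error variances) and uses plain Gaussian elimination in place of a statistical rank test. All class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class SeedRankSketch {

    /** Numerical rank of a small matrix via Gaussian elimination with partial pivoting. */
    static int rank(double[][] m, double tol) {
        int rows = m.length, cols = m[0].length, rank = 0;
        double[][] a = new double[rows][];
        for (int i = 0; i < rows; i++) a[i] = m[i].clone();
        for (int c = 0; c < cols && rank < rows; c++) {
            int pivot = rank;
            for (int i = rank + 1; i < rows; i++) {
                if (Math.abs(a[i][c]) > Math.abs(a[pivot][c])) pivot = i;
            }
            if (Math.abs(a[pivot][c]) <= tol) continue; // no usable pivot in this column
            double[] tmp = a[pivot]; a[pivot] = a[rank]; a[rank] = tmp;
            for (int i = rank + 1; i < rows; i++) {
                double f = a[i][c] / a[rank][c];
                for (int j = c; j < cols; j++) a[i][j] -= f * a[rank][j];
            }
            rank++;
        }
        return rank;
    }

    /** Population covariance: X0..X2 load on L1, X3..X5 load on L2, and L2 = b * L1 + noise. */
    static double[][] populationCovariance() {
        double b = 0.8;                                          // assumed structural coefficient
        double[][] latentCov = {{1.0, b}, {b, b * b + 1.0}};
        double[] loading = {0.9, 0.7, 1.1, 0.8, 1.2, 0.6};       // assumed loadings
        int[] latentOf = {0, 0, 0, 1, 1, 1};
        int p = loading.length;
        double[][] cov = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                cov[i][j] = loading[i] * loading[j] * latentCov[latentOf[i]][latentOf[j]];
                if (i == j) cov[i][j] += 1.0;                    // unique error variance
            }
        }
        return cov;
    }

    /** rank(S, V\S): rank of the cross-covariance block between S and its complement. */
    static int crossRank(double[][] cov, int[] s) {
        List<Integer> rest = new ArrayList<>();
        for (int v = 0; v < cov.length; v++) {
            boolean inS = false;
            for (int x : s) if (x == v) inS = true;
            if (!inS) rest.add(v);
        }
        double[][] block = new double[s.length][rest.size()];
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < rest.size(); j++) block[i][j] = cov[s[i]][rest.get(j)];
        }
        return rank(block, 1e-9);
    }

    public static void main(String[] args) {
        double[][] cov = populationCovariance();
        System.out.println(crossRank(cov, new int[]{0, 1})); // seed inside one cluster
        System.out.println(crossRank(cov, new int[]{0, 3})); // seed spanning two clusters
    }
}
```

With these assumed parameters, the within-cluster seed {0, 1} has cross-rank 1 and the mixed seed {0, 3} has cross-rank 2, which matches the r=1 case of the seed-soundness statement.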
Practical guidance. Use an α that decreases slowly with n (e.g., α=1/log n) or an information-criterion cutoff, so that Type-I rank errors diminish as the sample size grows. Ensure the effective sample size (see setEffectiveSampleSize) reflects the covariance sample size.
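A slowly decreasing α schedule of the kind suggested above might look like the following sketch. The helper name alphaFor and the cap parameter are assumptions introduced for illustration (the cap only keeps the value below 1 for very small n); the result would then be passed to setAlpha.

```java
final class AlphaScheduleSketch {

    /** Returns min(cap, 1 / ln(n)); the cap is an illustrative safeguard for small n. */
    static double alphaFor(int n, double cap) {
        if (n < 2) throw new IllegalArgumentException("n must be at least 2");
        return Math.min(cap, 1.0 / Math.log(n));
    }

    public static void main(String[] args) {
        for (int n : new int[]{100, 1_000, 10_000, 100_000}) {
            System.out.printf("n=%d alpha=%.4f%n", n, alphaFor(n, 0.5));
        }
    }
}
```

Note that 1/log n shrinks very slowly, so whether this schedule or an information-criterion cutoff is preferable depends on the application.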
- Author:
- josephramsey
-
Constructor Summary
Constructors
- Tsc(List<Node> variables, CovarianceMatrix cov): Constructs an instance of the Tsc class using the provided variables and covariance matrix.
Method Summary
Modifier and Type, Method: Description
- findClusters: Identifies clusters of variables and associates each cluster with a rank.
- findClustersAtRank(List<Integer> vars, int size, int rank): Identifies clusters of variables at a specified rank.
- int getEffectiveSampleSize(): Returns the effective sample size.
- void setAlpha(double alpha): Sets the significance level alpha used in statistical computations.
- void setEffectiveSampleSize(int nEff): Sets the effective sample size used in calculations.
- void setMinRedundancy(int minRedundancy): Sets the minimum redundancy value.
- void setRmax(int rMax): The algorithm will consider ranks from 0 up to this value, rMax.
- void setVerbose(boolean verbose): Sets the verbose mode.
- static @NotNull StringBuilder toNamesCluster(Collection<Integer> cluster, List<Node> nodes): Constructs a StringBuilder containing a formatted string representation of the names of nodes corresponding to the provided cluster indices.
- static @NotNull String toNamesClusters(Set<Set<Integer>> clusters, List<Node> nodes): Converts a set of clusters represented as sets of integers into a string representation that associates cluster IDs with node names.
-
Constructor Details
-
Tsc
Constructs an instance of the Tsc class using the provided variables and covariance matrix.
- Parameters:
variables - a list of Node elements representing the variables to be clustered
cov - a CovarianceMatrix object representing the covariance matrix associated with the variables
-
-
Method Details
-
toNamesCluster
@NotNull public static @NotNull StringBuilder toNamesCluster(Collection<Integer> cluster, List<Node> nodes)
Constructs a StringBuilder containing a formatted string representation of the names of nodes corresponding to the provided cluster indices.
- Parameters:
cluster - a collection of integers representing indices of nodes to include in the cluster
nodes - a list of Node objects where each integer index in the cluster corresponds to a node
- Returns:
- a StringBuilder containing the formatted names of the nodes in the specified cluster
-
toNamesClusters
@NotNull public static @NotNull String toNamesClusters(Set<Set<Integer>> clusters, List<Node> nodes)
Converts a set of clusters represented as sets of integers into a string representation that associates cluster IDs with node names.
- Parameters:
clusters - a set of clusters, where each cluster is a set of integers representing node IDs
nodes - a list of Node objects representing the nodes, where the index corresponds to the node ID
- Returns:
- a string containing the names of the nodes in each cluster, separated by "; " for different clusters
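The index-to-name mapping described for toNamesCluster and toNamesClusters can be sketched as follows. This is an illustration only: it uses plain String names instead of Node objects, and the exact output format (the "id: names" prefix and the space between names) is an assumption; only the "; " separator between clusters comes from the description above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class ClusterNamesSketch {

    /** Maps each index in the cluster to its name, in ascending index order. */
    static String clusterToString(Collection<Integer> cluster, List<String> names) {
        return new TreeSet<>(cluster).stream()
                .map(names::get)
                .collect(Collectors.joining(" "));
    }

    /** Numbers the clusters from 1 in iteration order and joins them with "; ". */
    static String clustersToString(Collection<? extends Collection<Integer>> clusters,
                                   List<String> names) {
        List<String> parts = new ArrayList<>();
        int id = 1;
        for (Collection<Integer> cluster : clusters) {
            parts.add(id + ": " + clusterToString(cluster, names));
            id++;
        }
        return String.join("; ", parts);
    }

    public static void main(String[] args) {
        List<String> names = List.of("X1", "X2", "X3", "X4");
        System.out.println(clustersToString(List.of(List.of(1, 0), List.of(2, 3)), names));
    }
}
```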
-
findClustersAtRank
Identifies clusters of variables at a specified rank. This method generates all possible clusters based on the given variable list and size, computes their ranks, and filters those that match the specified target rank.
- Parameters:
vars - a list of integers representing the variables to consider
size - the size of the clusters to generate
rank - the target rank to filter clusters
- Returns:
- a set of clusters that match the specified rank, where each cluster is represented as a set of integers
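The generate-and-filter procedure described above can be sketched as a recursive enumeration of all subsets of the requested size. This is not the library's code: the rank computation is replaced by a caller-supplied function, and toyRank is a hypothetical stand-in that treats variables 0..2 and 3..5 as two clusters.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

final class ClustersAtRankSketch {

    /** Enumerates every size-element subset of vars and keeps those whose rank equals the target. */
    static Set<Set<Integer>> findAtRank(List<Integer> vars, int size, int rank,
                                        ToIntFunction<Set<Integer>> rankOf) {
        Set<Set<Integer>> result = new LinkedHashSet<>();
        collect(vars, size, 0, new ArrayList<>(), rank, rankOf, result);
        return result;
    }

    private static void collect(List<Integer> vars, int size, int start, List<Integer> current,
                                int rank, ToIntFunction<Set<Integer>> rankOf,
                                Set<Set<Integer>> out) {
        if (current.size() == size) {
            Set<Integer> candidate = new LinkedHashSet<>(current);
            if (rankOf.applyAsInt(candidate) == rank) out.add(candidate);
            return;
        }
        for (int i = start; i < vars.size(); i++) {
            current.add(vars.get(i));
            collect(vars, size, i + 1, current, rank, rankOf, out);
            current.remove(current.size() - 1); // backtrack: drop the last element by index
        }
    }

    /** Hypothetical rank oracle: rank 1 if all members share a cluster, otherwise rank 2. */
    static int toyRank(Set<Integer> s) {
        boolean allLow = s.stream().allMatch(v -> v < 3);
        boolean allHigh = s.stream().allMatch(v -> v >= 3);
        return (allLow || allHigh) ? 1 : 2;
    }

    public static void main(String[] args) {
        List<Integer> vars = List.of(0, 1, 2, 3, 4, 5);
        System.out.println(findAtRank(vars, 2, 1, ClustersAtRankSketch::toyRank));
    }
}
```

The number of candidates grows combinatorially with the number of variables and the subset size, which is one reason to keep the size small (for example rank + 1) when seeding.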
-
findClusters
Identifies clusters of variables and associates each cluster with a rank. This method computes clusters by calling an internal implementation and returns the results in the form of a map. Each entry in the map represents a cluster (denoted as a set of integers, where each integer is an identifier for a variable) associated with its respective rank.
- Returns:
- a map where the keys are sets of integers representing clusters of variables, and the values are integers representing the rank associated with each cluster
-
setAlpha
public void setAlpha(double alpha)
Sets the significance level alpha used in statistical computations. The significance level determines the threshold for hypothesis testing and affects the resulting ranks or scores. Updating this parameter clears the cached ranks, as they depend on the current alpha value.
- Parameters:
alpha - the significance level to be set, typically a value between 0 and 1, where lower values indicate stricter thresholds.
-
setVerbose
public void setVerbose(boolean verbose)
Sets the verbose mode.
- Parameters:
verbose - a boolean value where true enables verbose mode and false disables it.
-
getEffectiveSampleSize
public int getEffectiveSampleSize()
Returns the effective sample size.
- Specified by:
getEffectiveSampleSize in interface EffectiveSampleSizeSettable
- Returns:
- the effective sample size
-
setEffectiveSampleSize
public void setEffectiveSampleSize(int nEff)
Sets the effective sample size used in calculations. The value must be either -1, indicating that it should default to the current sample size, or a positive integer.
- Specified by:
setEffectiveSampleSize in interface EffectiveSampleSizeSettable
- Parameters:
nEff - the effective sample size to be set. Must be -1 or a positive integer.
- Throws:
IllegalArgumentException - if the provided value is neither -1 nor a positive integer.
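The contract described above (-1 falls back to the sample size, any other non-positive value is rejected) can be sketched with a small stand-in class. The class name and the sampleSize field are assumptions for illustration; this is not the Tsc source.

```java
final class EffectiveSampleSizeSketch {

    private final int sampleSize;      // assumed: the sample size of the covariance matrix
    private int effectiveSampleSize;

    EffectiveSampleSizeSketch(int sampleSize) {
        this.sampleSize = sampleSize;
        this.effectiveSampleSize = sampleSize;
    }

    /** Accepts -1 (use the sample size) or a positive integer; rejects everything else. */
    void setEffectiveSampleSize(int nEff) {
        if (nEff != -1 && nEff <= 0) {
            throw new IllegalArgumentException("Effective sample size must be -1 or positive: " + nEff);
        }
        this.effectiveSampleSize = (nEff == -1) ? sampleSize : nEff;
    }

    int getEffectiveSampleSize() {
        return effectiveSampleSize;
    }

    public static void main(String[] args) {
        EffectiveSampleSizeSketch s = new EffectiveSampleSizeSketch(500);
        s.setEffectiveSampleSize(200);
        System.out.println(s.getEffectiveSampleSize());
        s.setEffectiveSampleSize(-1);
        System.out.println(s.getEffectiveSampleSize());
    }
}
```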
-
setRmax
public void setRmax(int rMax)
Sets the maximum rank, rMax. The algorithm will consider ranks from 0 up to this value.
- Parameters:
rMax - the maximum rank to consider.
-
setMinRedundancy
public void setMinRedundancy(int minRedundancy)
Sets the minimum redundancy value. Clusters of size rank + 1 can be unstable, as cross-checking is not possible for them. Setting minRedundancy to a value greater than or equal to 0 tells the algorithm not to include clusters of size less than rank + 1 + minRedundancy.
- Parameters:
minRedundancy - the minimum redundancy value; if less than 0, it is automatically set to 0.
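The size rule above can be illustrated on a cluster-to-rank map of the shape findClusters is described as returning. This is a sketch of the stated rule only (keep a cluster when its size is at least rank + 1 + minRedundancy, with negative values treated as 0); the method name filter and the sample clusters are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class MinRedundancySketch {

    /** Keeps only clusters whose size is at least rank + 1 + max(0, minRedundancy). */
    static Map<Set<Integer>, Integer> filter(Map<Set<Integer>, Integer> clusterToRank,
                                             int minRedundancy) {
        int redundancy = Math.max(0, minRedundancy); // negative values are clamped to 0
        Map<Set<Integer>, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<Set<Integer>, Integer> e : clusterToRank.entrySet()) {
            int minSize = e.getValue() + 1 + redundancy;
            if (e.getKey().size() >= minSize) kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }

    /** Made-up example: two rank-1 clusters and one rank-2 cluster. */
    static Map<Set<Integer>, Integer> example() {
        Map<Set<Integer>, Integer> found = new LinkedHashMap<>();
        found.put(Set.of(0, 1), 1);
        found.put(Set.of(2, 3, 4), 1);
        found.put(Set.of(5, 6, 7, 8), 2);
        return found;
    }

    public static void main(String[] args) {
        System.out.println(filter(example(), 0).size()); // all three clusters pass
        System.out.println(filter(example(), 1).size()); // the two-element rank-1 cluster is dropped
    }
}
```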
-