edu.cmu.tetrad.search.Tsc

All Implemented Interfaces: EffectiveSampleSizeSettable

public class Tsc extends Object implements EffectiveSampleSizeSettable

The Tsc class provides methods and utilities for statistical computations, clustering, and rank-based analysis of variables. This class manages significance levels, caching mechanisms, and structures to efficiently handle clusters and their associated ranks.

Theory (NOLAC): soundness sketch. We assume a linear-Gaussian SEM with a latent DAG and pure measurement (each observed variable loads on exactly one latent), independent unique errors across distinct clusters, and generic parameters (no exact cancellations). Under the NOLAC (no overlapping clusters) assumption, the indicator sets for distinct latents are disjoint. With a consistent rank test (e.g., Wilks LRT with a diminishing α), the following properties hold generically:

Seed soundness. If G is a true cluster with latent-boundary dimension r (typically r=1), then every (r+1)-subset S⊆G satisfies rank(S, V\S)=r. If S contains any nonmember, generically rank(S, V\S)>r.
Union/extension correctness. Growing a seed by unions that preserve rank r expands exactly to the maximal G; adding a nonmember raises the rank and is rejected.
Non-overlap. Because each observed belongs to at most one true G, any attempt to reuse a committed variable either raises the rank earlier or is blocked by bookkeeping; accepted clusters are pairwise disjoint.
Conditional-rank refinement (Rule 3). For any Z⊂C with |Z|≥r, if rank(C\Z, V\C | Z)=0 then Z acts as an observed bottleneck in a pure DAG-without-latents scenario; removing Z collapses spurious clusters. In a true latent cluster with noisy indicators, conditioning on any small Z cannot annihilate the latent contribution, so the refinement leaves true clusters intact generically.
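
The seed-soundness condition can be checked on a toy population covariance matrix. The sketch below is an illustration only and is not part of Tetrad: the class `CrossRankSketch`, its helpers, and the loadings are hypothetical, and it uses an exact numerical rank on a population matrix, whereas the class description refers to a statistical rank test on sample data. It builds a two-latent pure measurement model with r=1 and compares rank(S, V\S) for a pure pair and a mixed pair.

```java
import java.util.Arrays;

/** Toy check of the seed-soundness rank condition; hypothetical, not part of Tetrad. */
class CrossRankSketch {

    /** Numerical rank via Gaussian elimination with partial pivoting. */
    static int rank(double[][] m, double tol) {
        int rows = m.length;
        if (rows == 0) return 0;
        int cols = m[0].length;
        double[][] a = new double[rows][];
        for (int i = 0; i < rows; i++) a[i] = Arrays.copyOf(m[i], cols);
        int rank = 0;
        for (int c = 0; c < cols && rank < rows; c++) {
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[pivot][c])) pivot = r;
            }
            if (Math.abs(a[pivot][c]) <= tol) continue; // no usable pivot in this column
            double[] tmp = a[pivot];
            a[pivot] = a[rank];
            a[rank] = tmp;
            for (int r = rank + 1; r < rows; r++) {
                double f = a[r][c] / a[rank][c];
                for (int k = c; k < cols; k++) a[r][k] -= f * a[rank][k];
            }
            rank++;
        }
        return rank;
    }

    /** Cross-covariance block between the variables in s and all remaining variables. */
    static double[][] crossBlock(double[][] cov, int[] s) {
        int p = cov.length;
        boolean[] inS = new boolean[p];
        for (int i : s) inS[i] = true;
        int[] rest = new int[p - s.length];
        int n = 0;
        for (int j = 0; j < p; j++) if (!inS[j]) rest[n++] = j;
        double[][] block = new double[s.length][rest.length];
        for (int i = 0; i < s.length; i++)
            for (int j = 0; j < rest.length; j++)
                block[i][j] = cov[s[i]][rest[j]];
        return block;
    }

    /** Population covariance: indicators 0..3 load on latent 0, indicators 4..7 on latent 1. */
    static double[][] twoClusterCov() {
        double[] load = {0.9, 0.8, 0.7, 0.6, 0.9, 0.8, 0.7, 0.6}; // assumed loadings
        int[] latent = {0, 0, 0, 0, 1, 1, 1, 1};
        double[][] phi = {{1.0, 0.5}, {0.5, 1.0}};                // assumed latent covariance
        int p = load.length;
        double[][] cov = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                cov[i][j] = load[i] * load[j] * phi[latent[i]][latent[j]];
            }
            cov[i][i] += 0.5; // independent unique error variance
        }
        return cov;
    }

    public static void main(String[] args) {
        double[][] cov = twoClusterCov();
        System.out.println(rank(crossBlock(cov, new int[]{0, 1}), 1e-9)); // pure pair: 1
        System.out.println(rank(crossBlock(cov, new int[]{0, 4}), 1e-9)); // mixed pair: 2
    }
}
```

The pure pair {0, 1} has rank r=1 against the remaining variables, while the mixed pair {0, 4} has rank 2, which is the signal used to reject seeds that contain a nonmember.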

Practical guidance. Use an α that decreases slowly with n (e.g., α=1/log n) or an information-criterion cutoff, so that Type-I rank errors shrink as the sample size grows. Ensure the effective sample size (see setEffectiveSampleSize) reflects the sample size of the covariance matrix.
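
As a small illustration of the α=1/log n suggestion, the following sketch computes the schedule for a given sample size. The class `AlphaScheduleSketch` and the method `alphaFor` are hypothetical helpers, not part of Tetrad, and the natural logarithm is assumed; the lower bound on n is also an assumption added so that the result stays below 1.

```java
/** Hypothetical helper for a slowly decreasing significance level; not part of Tetrad. */
class AlphaScheduleSketch {

    /** Returns 1 / ln(n); requires n >= 3 so that the result is below 1. */
    static double alphaFor(int n) {
        if (n < 3) {
            throw new IllegalArgumentException("n must be at least 3 so that ln(n) > 1");
        }
        return 1.0 / Math.log(n);
    }

    public static void main(String[] args) {
        for (int n : new int[]{100, 1_000, 10_000}) {
            System.out.printf("n=%d alpha=%.4f%n", n, alphaFor(n));
        }
    }
}
```

The resulting value would then be passed to setAlpha before calling findClusters.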

Author: josephramsey

Constructor Summary

Constructors

Constructor

Description

Tsc(List<Node> variables, CovarianceMatrix cov)

Constructs an instance of the Tsc class using the provided variables and covariance matrix.

Method Summary

Modifier and Type

Method

Description

Map<Set<Integer>,Integer>

findClusters()

Identifies clusters of variables and associates each cluster with a rank.

Set<Set<Integer>>

findClustersAtRank(List<Integer> vars, int size, int rank)

Identifies clusters of variables at a specified rank.

int

getEffectiveSampleSize()

Returns the effective sample size.

void

setAlpha(double alpha)

Sets the significance level alpha used in statistical computations.

void

setEffectiveSampleSize(int nEff)

Sets the effective sample size used in calculations.

void

setMinRedundancy(int minRedundancy)

Sets the minimum redundancy value.

void

setRmax(int rMax)

The algorithm will consider ranks from 0 up to this value, rMax.

void

setVerbose(boolean verbose)

Enables or disables verbose output.

static @NotNull StringBuilder

toNamesCluster(Collection<Integer> cluster, List<Node> nodes)

Constructs a StringBuilder containing a formatted string representation of the names of nodes corresponding to the provided cluster indices.

static @NotNull String

toNamesClusters(Set<Set<Integer>> clusters, List<Node> nodes)

Converts a set of clusters represented as sets of integers into a string representation that associates cluster IDs with node names.

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- Tsc
  
  public Tsc(List<Node> variables, CovarianceMatrix cov)
  
  Constructs an instance of the Tsc class using the provided variables and covariance matrix.
  
  Parameters:
  
  variables - a list of Node elements representing the variables to be clustered
  
  cov - a CovarianceMatrix object representing the covariance matrix associated with the variables
Method Details
- toNamesCluster
  
  @NotNull public static @NotNull StringBuilder toNamesCluster(Collection<Integer> cluster, List<Node> nodes)
  
  Constructs a StringBuilder containing a formatted string representation of the names of nodes corresponding to the provided cluster indices.
  
  Parameters:
  
  cluster - a collection of integers representing indices of nodes to include in the cluster
  
  nodes - a list of Node objects where each integer index in the cluster corresponds to a node
  
  Returns:
  
  a StringBuilder containing the formatted names of the nodes in the specified cluster
- toNamesClusters
  
  @NotNull public static @NotNull String toNamesClusters(Set<Set<Integer>> clusters, List<Node> nodes)
  
  Converts a set of clusters represented as sets of integers into a string representation that associates cluster IDs with node names.
  
  Parameters:
  
  clusters - a set of clusters, where each cluster is a set of integers representing node IDs
  
  nodes - a list of Node objects representing the nodes, where the index corresponds to the node ID
  
  Returns:
  
  a string containing the names of the nodes in each cluster, separated by "; " for different clusters
- findClustersAtRank
  
  public Set<Set<Integer>> findClustersAtRank(List<Integer> vars, int size, int rank)
  
  Identifies clusters of variables at a specified rank. This method generates all possible clusters based on the given variable list and size, computes their ranks, and filters those that match the specified target rank.
  
  Parameters:
  
  vars - a list of integers representing the variables to consider
  
  size - the size of the clusters to generate
  
  rank - the target rank to filter clusters
  
  Returns:
  
  a set of clusters that match the specified rank, where each cluster is represented as a set of integers
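
The generate-and-filter step described for findClustersAtRank can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tetrad implementation: the class `SubsetFilterSketch` is hypothetical, and the rank of a candidate is supplied by a caller-provided function `rankOf` that stands in for the statistical rank estimate.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: enumerate all size-k subsets and keep those with a target rank. */
class SubsetFilterSketch {

    static Set<Set<Integer>> subsetsAtRank(List<Integer> vars, int size, int rank,
                                           ToIntFunction<Set<Integer>> rankOf) {
        Set<Set<Integer>> out = new LinkedHashSet<>();
        collect(vars, size, 0, new ArrayDeque<>(), rank, rankOf, out);
        return out;
    }

    // Depth-first enumeration of combinations in index order.
    private static void collect(List<Integer> vars, int size, int start, Deque<Integer> current,
                                int rank, ToIntFunction<Set<Integer>> rankOf,
                                Set<Set<Integer>> out) {
        if (current.size() == size) {
            Set<Integer> candidate = new TreeSet<>(current);
            if (rankOf.applyAsInt(candidate) == rank) out.add(candidate); // keep matches only
            return;
        }
        for (int i = start; i < vars.size(); i++) {
            current.addLast(vars.get(i));
            collect(vars, size, i + 1, current, rank, rankOf, out);
            current.removeLast();
        }
    }

    public static void main(String[] args) {
        // Stand-in rank: 1 if all members fall in the same block of two variables, else 2.
        ToIntFunction<Set<Integer>> toyRank =
                s -> s.stream().map(v -> v / 2).distinct().count() == 1 ? 1 : 2;
        System.out.println(subsetsAtRank(List.of(0, 1, 2, 3), 2, 1, toyRank));
    }
}
```

The number of candidates grows combinatorially with the list length and the subset size, which is one reason rank results are worth caching between calls.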
- findClusters
  
  public Map<Set<Integer>,Integer> findClusters()
  
  Identifies clusters of variables and associates each cluster with a rank.
  This method computes clusters by calling an internal implementation and returns the results in the form of a map. Each entry in the map represents a cluster (denoted as a set of integers, where each integer is an identifier for a variable) associated with its respective rank.
  
  Returns:
  
  a map where the keys are sets of integers representing clusters of variables, and the values are integers representing the rank associated with each cluster
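
To show how a result of this shape might be consumed, the sketch below formats a map from index sets to ranks as readable lines. It is a hypothetical helper, not part of Tetrad: the class `ClusterReportSketch`, the sample map, the variable names, and the output format are all assumptions for illustration, and plain strings are used in place of Node objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical formatter for a cluster-to-rank map; not part of Tetrad. */
class ClusterReportSketch {

    /** Produces one line per cluster, listing member names in index order. */
    static List<String> describe(Map<Set<Integer>, Integer> clusters, List<String> names) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Set<Integer>, Integer> e : clusters.entrySet()) {
            String members = new TreeSet<>(e.getKey()).stream()
                    .map(names::get)                  // index -> variable name
                    .collect(Collectors.joining(", "));
            lines.add("rank " + e.getValue() + ": " + members);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up result in the same shape as the documented return type.
        Map<Set<Integer>, Integer> clusters = new LinkedHashMap<>();
        clusters.put(Set.of(0, 1, 2), 1);
        clusters.put(Set.of(3, 4, 5), 1);
        List<String> names = List.of("X1", "X2", "X3", "X4", "X5", "X6");
        describe(clusters, names).forEach(System.out::println);
    }
}
```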
- setAlpha
  
  public void setAlpha(double alpha)
  
  Sets the significance level alpha used in statistical computations. The significance level determines the threshold for hypothesis testing and affects the resulting ranks or scores. Updating this parameter clears the cached ranks as they depend on the current alpha value.
  
  Parameters:
  
  alpha - the significance level to be set, typically a value between 0 and 1, where lower values indicate stricter thresholds.
- setVerbose
  
  public void setVerbose(boolean verbose)
  
  Enables or disables verbose output.
  
  Parameters:
  
  verbose - a boolean value where true enables verbose mode and false disables it.
- getEffectiveSampleSize
  
  public int getEffectiveSampleSize()
  
  Returns the effective sample size.
  
  Specified by:
  
  getEffectiveSampleSize in interface EffectiveSampleSizeSettable
  
  Returns:
  
  the effective sample size
- setEffectiveSampleSize
  
  public void setEffectiveSampleSize(int nEff)
  
  Sets the effective sample size used in calculations. The value must be either -1, indicating that it should default to the current sample size, or a positive integer.
  
  Specified by:
  
  setEffectiveSampleSize in interface EffectiveSampleSizeSettable
  
  Parameters:
  
  nEff - the effective sample size to be set. Must be -1 or a positive integer.
  
  Throws:
  
  IllegalArgumentException - if the provided value is not -1 and is less than or equal to 0.
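
The documented contract (accept -1 or a positive integer, reject everything else) can be sketched as below. The class `EffectiveSampleSizeSketch` is hypothetical and is not the Tetrad source; in particular, resolving -1 to the sample size inside the getter is an assumption about how the default is applied.

```java
/** Hypothetical holder illustrating the documented validation rule; not part of Tetrad. */
class EffectiveSampleSizeSketch {
    private final int sampleSize; // sample size of the covariance matrix
    private int nEff = -1;        // -1 means "use sampleSize"

    EffectiveSampleSizeSketch(int sampleSize) {
        this.sampleSize = sampleSize;
    }

    void setEffectiveSampleSize(int nEff) {
        if (nEff != -1 && nEff <= 0) {
            throw new IllegalArgumentException(
                    "Effective sample size must be -1 or a positive integer: " + nEff);
        }
        this.nEff = nEff;
    }

    /** Assumption: -1 is resolved to the covariance sample size on read. */
    int getEffectiveSampleSize() {
        return nEff == -1 ? sampleSize : nEff;
    }

    public static void main(String[] args) {
        EffectiveSampleSizeSketch s = new EffectiveSampleSizeSketch(500);
        System.out.println(s.getEffectiveSampleSize()); // falls back to 500
        s.setEffectiveSampleSize(200);
        System.out.println(s.getEffectiveSampleSize()); // 200
    }
}
```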
- setRmax
  
  public void setRmax(int rMax)
  
  The algorithm will consider ranks from 0 up to this value, rMax.
  
  Parameters:
  
  rMax - The maximum rank to consider.
- setMinRedundancy
  
  public void setMinRedundancy(int minRedundancy)
  
  Sets the minimum redundancy value. Clusters of size rank + 1 can be unstable, as cross-checking is not possible for them. Setting minRedundancy to a value greater than or equal to 0 tells the algorithm to exclude clusters of size less than rank + 1 + minRedundancy.
  
  Parameters:
  
  minRedundancy - the minimum redundancy value; if less than 0, it is automatically set to 0.
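
The size threshold implied by this setting is simple arithmetic, sketched below. The class `MinRedundancySketch` and its method are hypothetical helpers written for illustration, not part of Tetrad; they only restate the rule above, including the clamping of negative values to 0.

```java
/** Hypothetical helper restating the documented size threshold; not part of Tetrad. */
class MinRedundancySketch {

    /** Smallest cluster size that is kept: rank + 1 + max(0, minRedundancy). */
    static int minClusterSize(int rank, int minRedundancy) {
        int clamped = Math.max(0, minRedundancy); // negative values are treated as 0
        return rank + 1 + clamped;
    }

    public static void main(String[] args) {
        System.out.println(minClusterSize(1, 0)); // 2
        System.out.println(minClusterSize(1, 1)); // 3
    }
}
```

For rank 1 with minRedundancy = 1, a cluster therefore needs at least three indicators to be reported.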
