Class Tsc

java.lang.Object
edu.cmu.tetrad.search.Tsc
All Implemented Interfaces:
EffectiveSampleSizeSettable

public class Tsc extends Object implements EffectiveSampleSizeSettable
The Tsc class finds clusters of observed variables by running rank tests on their covariance matrix. It manages the significance level for those tests, a cache of computed ranks, and the structures that associate each cluster with its rank.

Theory (NOLAC): soundness sketch. We assume a linear-Gaussian SEM with a latent DAG and pure measurement (each observed variable loads on exactly one latent), independent unique errors across distinct clusters, and generic parameters (no exact cancellations). Under the NOLAC (no overlapping clusters) assumption, the indicator sets for distinct latents are disjoint. With a consistent rank test (e.g., Wilks LRT with a diminishing α), the following properties hold generically:

  • Seed soundness. If G is a true cluster with latent-boundary dimension r (typically r=1), then every (r+1)-subset S⊂G satisfies rank(S, V\S)=r. If S contains any nonmember, generically rank(S, V\S)>r.
  • Union/extension correctness. Growing a seed by unions that preserve rank r expands exactly to the maximal G; adding a nonmember raises the rank and is rejected.
  • Non-overlap. Because each observed variable belongs to at most one true G, any attempt to reuse a committed variable either raises the rank earlier or is blocked by bookkeeping; accepted clusters are pairwise disjoint.
  • Conditional-rank refinement (Rule 3). For any Z⊂C with |Z|≥r, if rank(C\Z, V\C | Z)=0 then Z acts as an observed bottleneck in a pure DAG-without-latents scenario; removing Z collapses spurious clusters. In a true latent cluster with noisy indicators, conditioning on any small Z cannot annihilate the latent contribution, so the refinement leaves true clusters intact generically.
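
The seed-soundness property can be checked at the population level with a small worked example. The sketch below is not the Tsc implementation: it builds the implied covariance of a hypothetical two-latent model (three pure indicators per latent, with loadings, latent correlation, and error variances chosen only for illustration) and computes the numerical rank of the cross-covariance block between S and V\S by Gaussian elimination. The class and method names are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Population-level illustration of the seed rank property; not the Tsc implementation. */
final class SeedRankSketch {

    /** Covariance of x = Lambda * eta + eps, with Cov(eta) = phi and Var(eps_i) = psi[i]. */
    static double[][] impliedCovariance(double[][] lambda, double[][] phi, double[] psi) {
        int p = lambda.length, m = phi.length;
        double[][] sigma = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                double s = 0.0;
                for (int a = 0; a < m; a++)
                    for (int b = 0; b < m; b++)
                        s += lambda[i][a] * phi[a][b] * lambda[j][b];
                sigma[i][j] = s + (i == j ? psi[i] : 0.0);
            }
        }
        return sigma;
    }

    /** Numerical rank of the block sigma[S, V\S], via Gaussian elimination with row pivoting. */
    static int crossRank(double[][] sigma, int[] s) {
        int p = sigma.length;
        boolean[] inS = new boolean[p];
        for (int i : s) inS[i] = true;
        List<Integer> rest = new ArrayList<>();
        for (int j = 0; j < p; j++) if (!inS[j]) rest.add(j);

        double[][] m = new double[s.length][rest.size()];
        for (int i = 0; i < s.length; i++)
            for (int j = 0; j < rest.size(); j++)
                m[i][j] = sigma[s[i]][rest.get(j)];

        int rank = 0;
        double tol = 1e-9; // entries below this are treated as zero
        for (int col = 0; col < rest.size() && rank < s.length; col++) {
            int pivot = rank;
            for (int r = rank + 1; r < s.length; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            if (Math.abs(m[pivot][col]) < tol) continue;
            double[] tmp = m[pivot]; m[pivot] = m[rank]; m[rank] = tmp;
            for (int r = rank + 1; r < s.length; r++) {
                double f = m[r][col] / m[rank][col];
                for (int c = col; c < rest.size(); c++) m[r][c] -= f * m[rank][c];
            }
            rank++;
        }
        return rank;
    }

    /** Hypothetical model: two correlated latents, indicators 0-2 and 3-5. */
    static double[][] example() {
        double[][] lambda = {
            {1.0, 0.0}, {0.8, 0.0}, {0.6, 0.0},
            {0.0, 1.0}, {0.0, 0.7}, {0.0, 0.9}
        };
        double[][] phi = {{1.0, 0.5}, {0.5, 1.0}};
        double[] psi = {0.3, 0.4, 0.5, 0.3, 0.4, 0.5};
        return impliedCovariance(lambda, phi, psi);
    }

    public static void main(String[] args) {
        double[][] sigma = example();
        // Both variables come from the first cluster: rank equals r = 1.
        System.out.println(crossRank(sigma, new int[]{0, 1}));
        // One variable from each cluster: the rank rises above r.
        System.out.println(crossRank(sigma, new int[]{0, 3}));
    }
}
```

With sample data the rank would be estimated by a statistical test rather than an exact tolerance, which is where α enters.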

Practical guidance. Use an α that decreases slowly with n (e.g., α=1/log n) or an information-criterion cutoff so that Type-I rank errors shrink as the sample size grows. Ensure the effective sample size (see setEffectiveSampleSize) reflects the sample size of the covariance matrix.
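
As a minimal sketch of the α=1/log n schedule mentioned above, the helper below assumes the natural logarithm (the text does not say which base is meant) and clamps the result to at most 1. The class and method names are hypothetical; the returned value would be passed to setAlpha.

```java
/** Illustrative alpha schedule; the choice of natural log is an assumption. */
final class AlphaScheduleSketch {

    /** Returns 1 / ln(n), clamped to at most 1. Requires n >= 2. */
    static double alphaFor(int n) {
        if (n < 2) throw new IllegalArgumentException("n must be at least 2");
        return Math.min(1.0, 1.0 / Math.log(n));
    }

    public static void main(String[] args) {
        for (int n : new int[]{100, 1000, 10000}) {
            System.out.printf("n=%d alpha=%.4f%n", n, alphaFor(n));
        }
    }
}
```

Note how slowly this schedule decays: at n=1000 it is still roughly 0.14, so a smaller constant factor may be preferable in practice.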

Author:
josephramsey
  • Constructor Details

    • Tsc

      public Tsc(List<Node> variables, CovarianceMatrix cov)
      Constructs an instance of the Tsc class using the provided variables and covariance matrix.
      Parameters:
      variables - a list of Node elements representing the variables to be included in the search
      cov - a CovarianceMatrix object representing the covariance matrix associated with the variables
  • Method Details

    • toNamesCluster

      @NotNull public static @NotNull StringBuilder toNamesCluster(Collection<Integer> cluster, List<Node> nodes)
      Constructs a StringBuilder containing a formatted string representation of the names of nodes corresponding to the provided cluster indices.
      Parameters:
      cluster - a collection of integers representing indices of nodes to include in the cluster
      nodes - a list of Node objects where each integer index in the cluster corresponds to a node
      Returns:
      a StringBuilder containing the formatted names of the nodes in the specified cluster
    • toNamesClusters

      @NotNull public static @NotNull String toNamesClusters(Set<Set<Integer>> clusters, List<Node> nodes)
      Converts a set of clusters represented as sets of integers into a string representation that associates cluster IDs with node names.
      Parameters:
      clusters - a set of clusters, where each cluster is a set of integers representing node IDs
      nodes - a list of Node objects representing the nodes, where the index corresponds to the node ID
      Returns:
      a string containing the names of the nodes in each cluster, separated by "; " for different clusters
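
The index-to-name mapping used by toNamesCluster and toNamesClusters can be illustrated with a standalone sketch. It uses plain strings in place of Node objects and assumes that names within a cluster are separated by a space; only the "; " separator between clusters comes from the description above. The class and method names are hypothetical and the real output format may differ.

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

/** Stand-in for the name-formatting helpers; the within-cluster format is an assumption. */
final class ClusterNamesSketch {

    /** Maps each index in the cluster to the name at that position. */
    static String namesOf(Collection<Integer> cluster, List<String> names) {
        return cluster.stream().map(names::get).collect(Collectors.joining(" "));
    }

    /** Formats every cluster and joins the results with "; ". */
    static String namesOfAll(Collection<? extends Collection<Integer>> clusters, List<String> names) {
        return clusters.stream()
                .map(c -> namesOf(c, names))
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        List<String> names = List.of("X1", "X2", "X3", "X4");
        List<List<Integer>> clusters = List.of(List.of(0, 1), List.of(2, 3));
        System.out.println(namesOfAll(clusters, names));
    }
}
```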
    • findClustersAtRank

      public Set<Set<Integer>> findClustersAtRank(List<Integer> vars, int size, int rank)
      Identifies clusters of variables at a specified rank. This method generates all possible clusters based on the given variable list and size, computes their ranks, and filters those that match the specified target rank.
      Parameters:
      vars - a list of integers representing the variables to consider
      size - the size of the clusters to generate
      rank - the target rank to filter clusters
      Returns:
      a set of clusters that match the specified rank, where each cluster is represented as a set of integers
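
The enumerate-then-filter procedure described for findClustersAtRank can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the rank function is passed in as a parameter, and toyRank is a hypothetical stand-in that returns 1 when all members fall in the same block of three indices and 2 otherwise.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToIntFunction;

/** Enumerates all subsets of a given size and keeps those whose rank matches. */
final class ClustersAtRankSketch {

    static Set<Set<Integer>> subsetsAtRank(List<Integer> vars, int size, int rank,
                                           ToIntFunction<Set<Integer>> rankOf) {
        Set<Set<Integer>> out = new LinkedHashSet<>();
        collect(vars, size, 0, new ArrayDeque<>(), rank, rankOf, out);
        return out;
    }

    /** Depth-first enumeration of combinations, starting at position 'start'. */
    private static void collect(List<Integer> vars, int size, int start, Deque<Integer> chosen,
                                int rank, ToIntFunction<Set<Integer>> rankOf,
                                Set<Set<Integer>> out) {
        if (chosen.size() == size) {
            Set<Integer> candidate = new TreeSet<>(chosen);
            if (rankOf.applyAsInt(candidate) == rank) out.add(candidate);
            return;
        }
        for (int i = start; i < vars.size(); i++) {
            chosen.addLast(vars.get(i));
            collect(vars, size, i + 1, chosen, rank, rankOf, out);
            chosen.removeLast();
        }
    }

    /** Hypothetical rank oracle: indices 0-2 form one block, 3-5 another. */
    static int toyRank(Set<Integer> s) {
        long blocks = s.stream().map(v -> v / 3).distinct().count();
        return blocks == 1 ? 1 : 2;
    }

    public static void main(String[] args) {
        List<Integer> vars = List.of(0, 1, 2, 3, 4, 5);
        System.out.println(subsetsAtRank(vars, 2, 1, ClustersAtRankSketch::toyRank));
    }
}
```

The number of candidates grows combinatorially with the number of variables and the subset size, which is one reason caching computed ranks matters.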
    • findClusters

      public Map<Set<Integer>,Integer> findClusters()
      Identifies clusters of variables and associates each cluster with a rank.

      This method computes clusters by calling an internal implementation and returns the results in the form of a map. Each entry in the map represents a cluster (denoted as a set of integers, where each integer is an identifier for a variable) associated with its respective rank.

      Returns:
      a map where the keys are sets of integers representing clusters of variables, and the values are integers representing the rank associated with each cluster
    • setAlpha

      public void setAlpha(double alpha)
      Sets the significance level alpha used in statistical computations. The significance level determines the threshold for hypothesis testing and affects the resulting ranks or scores. Updating this parameter clears the cached ranks as they depend on the current alpha value.
      Parameters:
      alpha - the significance level to be set, typically a value between 0 and 1, where lower values indicate stricter thresholds.
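
Because a cached rank is only valid for the α under which it was computed, the setter has to invalidate the cache. A minimal sketch of that pattern follows; the class, the field names, and the estimateRank placeholder are all hypothetical and do not reflect the internals of Tsc.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Illustrates clearing a rank cache when alpha changes; not the Tsc internals. */
final class RankCacheSketch {
    private final Map<Set<Integer>, Integer> rankCache = new HashMap<>();
    private double alpha = 0.01;
    int computations = 0; // counts uncached evaluations, for demonstration only

    void setAlpha(double alpha) {
        this.alpha = alpha;
        rankCache.clear(); // cached ranks were computed under the old alpha
    }

    int rank(Set<Integer> cluster) {
        // Copy the key so later mutation of the caller's set cannot corrupt the map.
        return rankCache.computeIfAbsent(Set.copyOf(cluster), c -> {
            computations++;
            return estimateRank(c, alpha);
        });
    }

    /** Placeholder for a real rank test; the dependence on alpha here is a toy rule. */
    private static int estimateRank(Set<Integer> cluster, double alpha) {
        return alpha < 0.05 ? 1 : 2;
    }

    public static void main(String[] args) {
        RankCacheSketch sketch = new RankCacheSketch();
        sketch.rank(Set.of(0, 1));
        sketch.rank(Set.of(0, 1)); // served from the cache
        sketch.setAlpha(0.1);      // invalidates the cache
        sketch.rank(Set.of(0, 1)); // recomputed under the new alpha
        System.out.println(sketch.computations);
    }
}
```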
    • setVerbose

      public void setVerbose(boolean verbose)
      Enables or disables verbose output for this search.
      Parameters:
      verbose - a boolean value where true enables verbose mode and false disables it.
    • getEffectiveSampleSize

      public int getEffectiveSampleSize()
      Returns the effective sample size.
      Specified by:
      getEffectiveSampleSize in interface EffectiveSampleSizeSettable
      Returns:
      the effective sample size
    • setEffectiveSampleSize

      public void setEffectiveSampleSize(int nEff)
      Sets the effective sample size used in calculations. The value must be either -1, indicating that it should default to the current sample size, or a positive integer.
      Specified by:
      setEffectiveSampleSize in interface EffectiveSampleSizeSettable
      Parameters:
      nEff - the effective sample size to set; must be -1 or a positive integer.
      Throws:
      IllegalArgumentException - if nEff is neither -1 nor a positive integer.
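
The validation contract described above can be sketched as a standalone class. The class name and the sampleSize field are assumptions made for illustration; resolving -1 eagerly in the setter is one possible design and may differ from what the library does.

```java
/** Illustrates the -1-or-positive contract; not the Tsc implementation. */
final class EffectiveSampleSizeSketch {
    private final int sampleSize; // sample size of the data, assumed known
    private int nEff;

    EffectiveSampleSizeSketch(int sampleSize) {
        this.sampleSize = sampleSize;
        this.nEff = sampleSize;
    }

    void setEffectiveSampleSize(int nEff) {
        if (nEff != -1 && nEff <= 0) {
            throw new IllegalArgumentException(
                    "Effective sample size must be -1 or positive: " + nEff);
        }
        // -1 means "fall back to the current sample size".
        this.nEff = (nEff == -1) ? sampleSize : nEff;
    }

    int getEffectiveSampleSize() {
        return nEff;
    }

    public static void main(String[] args) {
        EffectiveSampleSizeSketch s = new EffectiveSampleSizeSketch(500);
        s.setEffectiveSampleSize(250);
        System.out.println(s.getEffectiveSampleSize());
        s.setEffectiveSampleSize(-1);
        System.out.println(s.getEffectiveSampleSize());
    }
}
```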
    • setRmax

      public void setRmax(int rMax)
      Sets the maximum rank, rMax. The algorithm considers ranks from 0 up to rMax.
      Parameters:
      rMax - The maximum rank to consider.
    • setMinRedundancy

      public void setMinRedundancy(int minRedundancy)
      Sets the minimum redundancy value. Clusters of size rank + 1 can be unstable, because they cannot be cross-checked. Setting minRedundancy to a value greater than or equal to 0 tells the algorithm to exclude clusters of size less than rank + 1 + minRedundancy.
      Parameters:
      minRedundancy - the minimum redundancy value; if less than 0, it is automatically set to 0.
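
The size threshold implied by minRedundancy is simple arithmetic, sketched below with hypothetical helper names. Negative values are clamped to 0, as stated in the parameter description.

```java
/** Illustrates the rank + 1 + minRedundancy size threshold; names are hypothetical. */
final class MinRedundancySketch {

    /** Smallest cluster size that is kept for the given rank. */
    static int minClusterSize(int rank, int minRedundancy) {
        return rank + 1 + Math.max(0, minRedundancy);
    }

    /** True if a cluster of this size passes the threshold. */
    static boolean keep(int clusterSize, int rank, int minRedundancy) {
        return clusterSize >= minClusterSize(rank, minRedundancy);
    }

    public static void main(String[] args) {
        // With rank 1 and minRedundancy 1, clusters need at least 3 members.
        System.out.println(minClusterSize(1, 1));
        System.out.println(keep(2, 1, 1));
        System.out.println(keep(3, 1, 1));
    }
}
```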