Class RoadmapTest

java.lang.Object
edu.cmu.tetrad.search.unmix.RoadmapTest

public class RoadmapTest extends Object
RoadmapTest: EM baseline vs Residual-clustering across scenarios.

Phase 1 Controlled synthetic sweep: (A) params-only mixtures; (B) small topology flips; (C) larger structural shifts. Noise: Gaussian vs non-Gaussian (Laplace). K: known, and unknown via EM-BIC. Metrics: ARI; adjacency/arrowhead F1; SHD (simple implementation); runtime.

Phase 2 Stress & robustness: Class imbalance, smaller n, larger p, weak separation, mis-specified K; seed repeats.

Phase 3 Semi-synthetic realism: Real covariance backbone + injected regime shifts; optional interventions knob.

Notes:
- This is a practical harness: prints compact tables you can paste in email/docs.
- Uses Boss+PermutationSearch for graphs; swap to your preferred search if needed.
- Keep DIAGONAL covariance for EM unless d is small.

  • Constructor Summary

    Constructors
    Constructor
    Description
    RoadmapTest()
    Constructor for the RoadmapTest class.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    phase1_controlledSweep()
    This method performs a controlled evaluation process for the "Phase 1" experiments involving statistical clustering and graph learning methods.
    void
    phase2_robustness()
    Evaluates the robustness of clustering algorithms under different simulated conditions by sweeping over multiple parameters and configurations.
    void
    phase3_semisynthetic()
    Phase 3 of a study using semi-synthetic data to evaluate causal structure learning methods.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • RoadmapTest

      public RoadmapTest()
      Constructor for the RoadmapTest class. This is the default no-argument constructor.
  • Method Details

    • phase1_controlledSweep

      public void phase1_controlledSweep()
      This method performs a controlled evaluation process for the "Phase 1" experiments involving statistical clustering and graph learning methods. It sweeps a set of synthetic scenarios across different configurations and noise profiles (Gaussian and Laplace).

      The method executes the following steps:

      • Constructs different scenarios including parameter-based configurations, small topological flips, and large topological flips.
      • Configures and runs the EmUnmix algorithm with fixed and adaptive cluster settings.
      • Computes external per-cluster graphs using various algorithms such as BOSS and PC-Max.
      • Calculates metrics like Adjusted Rand Index (ARI) for clustering and graph-level metrics including adjacency F1, arrow accuracy, and Structural Hamming Distance (SHD).
      • Performs a raw-data baseline analysis using the Gaussian Mixture EM algorithm for comparison.
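The graph-level metrics named in the list above can be illustrated with a minimal sketch. This is not the harness's own implementation: the class name `GraphMetricsSketch`, the boolean adjacency-matrix representation (`g[i][j]` meaning an edge i -> j), and the particular "simple" SHD convention (one unit per node pair whose edge state differs) are all assumptions made for illustration.

```java
// Hypothetical sketch; not part of Tetrad or RoadmapTest.
final class GraphMetricsSketch {

    /** True when i and j are connected in either direction. */
    static boolean adjacent(boolean[][] g, int i, int j) {
        return g[i][j] || g[j][i];
    }

    /** F1 of the estimated skeleton (undirected adjacencies) against the true one. */
    public static double adjacencyF1(boolean[][] truth, boolean[][] est) {
        int tp = 0, fp = 0, fn = 0;
        int p = truth.length;
        for (int i = 0; i < p; i++) {
            for (int j = i + 1; j < p; j++) {
                boolean t = adjacent(truth, i, j);
                boolean e = adjacent(est, i, j);
                if (t && e) tp++;
                else if (e) fp++;
                else if (t) fn++;
            }
        }
        int denom = 2 * tp + fp + fn;
        // Two empty skeletons agree perfectly, so report 1.0 in that case.
        return denom == 0 ? 1.0 : 2.0 * tp / denom;
    }

    /** Simple SHD (assumed convention): count node pairs whose edge state differs. */
    public static int shd(boolean[][] truth, boolean[][] est) {
        int d = 0;
        int p = truth.length;
        for (int i = 0; i < p; i++) {
            for (int j = i + 1; j < p; j++) {
                if (truth[i][j] != est[i][j] || truth[j][i] != est[j][i]) d++;
            }
        }
        return d;
    }

    public static void main(String[] args) {
        // Truth: 0 -> 1 -> 2. Estimate: 0 -> 1 and 2 -> 1 (one edge reversed).
        boolean[][] truth = new boolean[3][3];
        truth[0][1] = true;
        truth[1][2] = true;
        boolean[][] est = new boolean[3][3];
        est[0][1] = true;
        est[2][1] = true;
        System.out.println("adjacency F1 = " + adjacencyF1(truth, est));
        System.out.println("SHD = " + shd(truth, est));
    }
}
```

Under this convention a reversed edge leaves the adjacency F1 untouched and adds one unit to the SHD; other SHD variants charge two units for a reversal.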

      Results from the experiments, including metrics and timings, are printed to the standard output.

      This method is used as part of the RoadmapTest class for assessing the performance of clustering and graph-learning algorithms under different simulated conditions.
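For readers unfamiliar with the clustering metric, the following is a minimal sketch of the Adjusted Rand Index computed from a contingency table. The class name `AdjustedRandIndexSketch` is hypothetical, and the code assumes labels are integers in 0..K-1; it is not taken from the harness.

```java
// Hypothetical sketch; not part of Tetrad or RoadmapTest.
final class AdjustedRandIndexSketch {

    static long choose2(long n) {
        return n * (n - 1) / 2;
    }

    /** ARI between two labelings of the same items; labels must be in 0..K-1. */
    public static double ari(int[] truth, int[] pred) {
        if (truth.length != pred.length) {
            throw new IllegalArgumentException("label arrays differ in length");
        }
        int n = truth.length;
        if (n < 2) return 1.0; // no pairs to disagree on

        int kt = 0, kp = 0;
        for (int i = 0; i < n; i++) {
            kt = Math.max(kt, truth[i] + 1);
            kp = Math.max(kp, pred[i] + 1);
        }

        // Contingency table with row and column totals.
        long[][] table = new long[kt][kp];
        long[] rowSum = new long[kt];
        long[] colSum = new long[kp];
        for (int i = 0; i < n; i++) {
            table[truth[i]][pred[i]]++;
            rowSum[truth[i]]++;
            colSum[pred[i]]++;
        }

        long sumCells = 0, sumRows = 0, sumCols = 0;
        for (long[] row : table) for (long c : row) sumCells += choose2(c);
        for (long r : rowSum) sumRows += choose2(r);
        for (long c : colSum) sumCols += choose2(c);

        double expected = (double) sumRows * sumCols / choose2(n);
        double max = 0.5 * (sumRows + sumCols);
        if (max == expected) return 1.0; // degenerate: both partitions are trivial
        return (sumCells - expected) / (max - expected);
    }

    public static void main(String[] args) {
        // Identical partitions up to relabeling give ARI = 1.0.
        System.out.println(ari(new int[]{0, 0, 1, 1}, new int[]{1, 1, 0, 0}));
    }
}
```

Because the index is invariant to label permutation, it can compare EM cluster assignments with ground-truth regime labels without first matching cluster IDs.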

    • phase2_robustness

      public void phase2_robustness()
      Evaluates the robustness of clustering algorithms under different simulated conditions by sweeping over multiple parameters and configurations. The method simulates scenarios with varying dataset sizes, imbalances, and signal scales, then analyzes clustering performance.

      The robustness evaluation is carried out as follows:

      • Defines a set of total data sizes (nTotals) and calculates sample sizes for two groups based on specified imbalances (imbalances).
      • Sweeps through different signal scales (signalScales) representing the level of separation between clusters.
      • Repeats the experiment multiple times (repeats) with different random seeds.
      • For each configuration:
        1. Constructs a synthetic scenario using the Scenario.smallTopoFlipParamScaled() method with specified parameters, such as flips, signal scale, and noise type.
        2. Configures the EmUnmix clustering algorithm with specific settings such as cluster count (K), parent superset usage, scoring type, and covariance type.
        3. Runs the clustering algorithm and evaluates its performance using Adjusted Rand Index (ARI).
        4. Collects the median ARI and interquartile range (IQR) for each configuration.
      • Outputs the median ARI and IQR for each combination of data size, imbalance, and signal scale.
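The group-size calculation in the first bullet might look like the sketch below. The source does not say how an imbalance value is encoded; this example assumes it is the fraction of rows assigned to the first group. The class and method names are hypothetical.

```java
// Hypothetical sketch; the meaning of "imbalance" here is an assumption.
final class ImbalanceSplitSketch {

    /**
     * Splits nTotal rows into two groups.
     * fraction is the assumed share of rows given to group 1, strictly between 0 and 1.
     * Returns {n1, n2} with n1 + n2 == nTotal and both groups non-empty.
     */
    public static int[] split(int nTotal, double fraction) {
        if (nTotal < 2) {
            throw new IllegalArgumentException("need at least 2 rows");
        }
        if (!(fraction > 0.0 && fraction < 1.0)) {
            throw new IllegalArgumentException("fraction must be in (0, 1)");
        }
        int n1 = (int) Math.round(nTotal * fraction);
        n1 = Math.max(1, Math.min(nTotal - 1, n1)); // keep both groups non-empty
        return new int[]{n1, nTotal - n1};
    }

    public static void main(String[] args) {
        int[] sizes = split(1000, 0.8);
        System.out.println(sizes[0] + " / " + sizes[1]);
    }
}
```

Clamping keeps the minority group from vanishing at small n, which would otherwise make the ARI for that configuration meaningless.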

      This method is useful for understanding the stability and robustness of clustering algorithms across varying conditions and provides insights into their sensitivity to dataset properties, random seed initialization, and signal strength.
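The per-configuration summary (median ARI and IQR over seed repeats) can be sketched as follows. The quantile convention used here, linear interpolation between order statistics, is an assumption; the harness may use a different one. The class name `MedianIqrSketch` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; the quantile convention is an assumption.
final class MedianIqrSketch {

    /** Quantile by linear interpolation on an already sorted array; q in [0, 1]. */
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Returns {median, IQR} of the given values without modifying the input. */
    public static double[] medianAndIqr(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values to summarize");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double median = quantile(sorted, 0.5);
        double iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
        return new double[]{median, iqr};
    }

    public static void main(String[] args) {
        // ARI values from five hypothetical seed repeats of one configuration.
        double[] aris = {0.9, 0.5, 0.7, 0.6, 0.8};
        double[] summary = medianAndIqr(aris);
        System.out.printf("median=%.3f IQR=%.3f%n", summary[0], summary[1]);
    }
}
```

Median and IQR are less sensitive than mean and standard deviation to the occasional seed on which EM collapses to a poor local optimum.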

    • phase3_semisynthetic

      public void phase3_semisynthetic()
      Phase 3 of a study using semi-synthetic data to evaluate causal structure learning methods. This test simulates data under two distinct regimes (A and B) defined by directed acyclic graph (DAG) structures, then evaluates how accurately mixture-model clustering recovers the regime labels and how accurately causal graph discovery recovers each cluster's graph.

      The process is as follows:

      1. Generate a backbone DAG (gBackbone) with specified nodes and edges.
      2. Simulate data (Dreal) from the backbone DAG using Laplace-distributed errors.
      3. Create two modified regimes:
        • Regime A: retains the original backbone structure.
        • Regime B: modifies the DAG by flipping directions of a subset of edges and scaling coefficients as well as error variances.
      4. Simulate datasets (dA and dB) for each regime and concatenate them into a unified dataset with associated labels.
      5. Shuffle the combined dataset while retaining the labels.
      6. Use an Expectation-Maximization-based Gaussian Mixture Model (EM-GMM) to learn cluster assignments.
      7. Perform causal graph learning separately for each discovered cluster using either a SEM-BIC score with permutation-based search or an alternative PC-Max approach.
      8. Evaluate the results against the ground truth: Adjusted Rand Index (ARI) for the cluster assignments, and adjacency F1, arrow F1, and Structural Hamming Distance (SHD) for each cluster's reconstructed DAG.
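Two of the data-preparation steps above, drawing Laplace-distributed errors and shuffling the concatenated rows while keeping labels aligned, can be sketched in plain Java. The class and method names are hypothetical, rows are modeled as a `double[][]` and not a Tetrad data set, and the Laplace draw uses the difference of two exponential variates, which is one standard construction and not necessarily the one the test uses.

```java
import java.util.Random;

// Hypothetical sketch; not part of Tetrad or RoadmapTest.
final class SemiSyntheticPrepSketch {

    /** Draws from Laplace(0, scale) as the scaled difference of two Exp(1) variates. */
    public static double laplace(Random rng, double scale) {
        // 1 - nextDouble() lies in (0, 1], so the logarithm is always finite.
        double e1 = -Math.log(1.0 - rng.nextDouble());
        double e2 = -Math.log(1.0 - rng.nextDouble());
        return scale * (e1 - e2);
    }

    /** Applies one Fisher-Yates permutation to rows and labels together, in place. */
    public static void shuffleTogether(double[][] rows, int[] labels, Random rng) {
        if (rows.length != labels.length) {
            throw new IllegalArgumentException("rows and labels differ in length");
        }
        for (int i = rows.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            double[] tmpRow = rows[i];
            rows[i] = rows[j];
            rows[j] = tmpRow;
            int tmpLabel = labels[i];
            labels[i] = labels[j];
            labels[j] = tmpLabel;
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        int nA = 3, nB = 2;
        double[][] rows = new double[nA + nB][1];
        int[] labels = new int[nA + nB];
        for (int i = 0; i < rows.length; i++) {
            labels[i] = i < nA ? 0 : 1;        // regime A rows first, then regime B
            rows[i][0] = laplace(rng, 1.0);    // one noisy column per row
        }
        shuffleTogether(rows, labels, rng);
        System.out.println(java.util.Arrays.toString(labels));
    }
}
```

Swapping rows and labels inside the same loop means the ground-truth regime of every row survives the shuffle, which the later ARI comparison depends on.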

      This test is designed to assess the ability of a mixture model approach combined with graph discovery methods to recover cluster-specific causal structures under the semi-synthetic data setup.