Class Boss

java.lang.Object
edu.cmu.tetrad.search.Boss
All Implemented Interfaces:
SuborderSearch

public class Boss extends Object implements SuborderSearch
Implements Best Order Score Search (BOSS). The reference is this:

Andrews, B., Ramsey, J., Sanchez Romero, R., Camchong, J., & Kummerfeld, E. (2024). Fast Scalable and Accurate Discovery of DAGs Using the Best Order Score Search and Grow Shrink Trees. Advances in Neural Information Processing Systems, 36.

The BOSS algorithm is based on the idea that implied DAGs for permutations are most optimal in their BIC scores when the variables in the permutations are ordered so that that causes in the models come before effects for some DAG in the true Markov equivalence class.

This algorithm is implemented as a "plugin-in" algorithm to a PermutationSearch object (see), which deals with certain details of knowledge handling that are common to different permutation searches.

BOSS, like GRaSP (see), is characterized by high adjacency and orientation precision (especially) and recall for moderate sample sizes.

The algorithm works as follows:

  1. Start with an arbitrary ordering.
  2. Run the permutation search to find a better ordering.
  3. Project this ordering to a CPDAG.
  4. Optionally, Run BES this CPDAG.
  5. Return this CPDAG.

The optional BES step is needed for correctness, though with large models this has very little effect on the output, since nearly all edges are already oriented, so a parameter is included to turn that step off.

Knowledge can be used with this search. If tiered knowledge is used, then the procedure is carried out for each tier separately, given the variables preceding that tier, which allows the Boss algorithm to address tiered (e.g., time series) problems with larger numbers of variables. However, knowledge of required and forbidden edges is correctly implemented for arbitrary such knowledge.

A parameter is included to restart the search a certain number of times. The idea is that the goal is to optimize a BIC score, so if several runs are done of the algorithm for the same data, the model with the highest BIC score should be returned and the others ignored.

Version:
$Id: $Id
Author:
bryanandrews, josephramsey
See Also: