Class SemBicScore

java.lang.Object
edu.cmu.tetrad.search.score.SemBicScore
All Implemented Interfaces:
Score

public class SemBicScore extends Object implements Score
Implements the linear, Gaussian BIC score, with a 'penalty discount' multiplier on the BIC penalty. The formula used for the score is BIC = 2L - ck ln n, where c is the penalty discount, k is the number of parameters, n is the sample size, and L is the linear, Gaussian log likelihood, that is, the sum of the log likelihoods of the individual records, which are assumed to be i.i.d.
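
As a rough illustration of the formula above, the following sketch evaluates BIC = 2L - ck ln n in plain Java. The class and method names (BicFormulaSketch, bic) are hypothetical and not part of Tetrad; k is taken to be the number of free parameters and n the sample size, which is an assumption about how the penalty is counted.

```java
// Hypothetical illustration only; this class is not part of Tetrad.
final class BicFormulaSketch {

    /**
     * Evaluates BIC = 2L - c * k * ln(n), where higher values are better.
     *
     * @param logLikelihood   L, the Gaussian log likelihood
     * @param k               assumed number of free parameters
     * @param n               sample size, must be positive
     * @param penaltyDiscount c, the multiplier on the penalty term
     */
    static double bic(double logLikelihood, int k, int n, double penaltyDiscount) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 2.0 * logLikelihood - penaltyDiscount * k * Math.log(n);
    }

    public static void main(String[] args) {
        double logLikelihood = -100.0;
        // Same likelihood and parameter count, two different penalty discounts.
        System.out.println(bic(logLikelihood, 3, 1000, 1.0));
        System.out.println(bic(logLikelihood, 3, 1000, 2.0));
    }
}
```

With the likelihood held fixed, a larger penalty discount lowers the score of models with more parameters, which is why raising it tends to favor sparser graphs.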

For FGES, Chickering uses the standard linear, Gaussian BIC score, so for lack of a better reference we cite his paper:

Chickering (2002) "Optimal structure identification with greedy search" Journal of Machine Learning Research.

The version of the score due to Nandy et al. is given in this reference:

Nandy, P., Hauser, A., & Maathuis, M. H. (2018). High-dimensional consistency in score-based and hybrid structure learning. The Annals of Statistics, 46(6A), 3151-3183.

This score may be used anywhere a linear, Gaussian score is needed. Anecdotally, the score is fairly robust to non-Gaussianity, though with some additional unfaithfulness over and above what the score would give for Gaussian data; this detriment can be overcome to an extent by using a permutation algorithm such as SP, GRaSP, or BOSS.

As for all scores in Tetrad, higher scores mean more dependence, and negative scores indicate independence.

Version:
$Id: $Id
Author:
josephramsey
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record
    SemBicScore.CovAndCoefs
    Represents a covariance matrix and regression coefficients.
    static final record
    SemBicScore.LikelihoodResult
    A record that encapsulates the result of a likelihood computation.
    static enum
    SemBicScore.RuleType
    Gives two options for calculating the BIC score, one described by Chickering and the other due to Nandy et al.
  • Constructor Summary

    Constructors
    Constructor
    Description
    SemBicScore(DataSet dataSet, boolean precomputeCovariances)
    Constructs the score from a dataset.
    SemBicScore(DataSet dataSet, double penaltyDiscount, boolean precomputeCovariances)
    Constructs the score from a dataset.
    SemBicScore(ICovarianceMatrix covariances)
    Constructs the score using a covariance matrix.
    SemBicScore(ICovarianceMatrix covariances, double penaltyDiscount)
    Constructs the score using a covariance matrix.
  • Method Summary

    Modifier and Type
    Method
    Description
    boolean
    determines(List<Node> z, Node y)
    Returns true if the variables in z determine the variable y.
    double
    getAic(int i, int... parents)
    Computes the Akaike Information Criterion (AIC) score for the given variable and its parent variables in a probabilistic graphical model such as a Bayesian network.
    static Matrix
    getCov(List<Integer> rows, int[] cols, int[] all, DataSet dataSet, Matrix cov)
    Computes the covariance matrix for the given subset of rows and columns in the provided data set.
    static @NotNull SemBicScore.CovAndCoefs
    getCovAndCoefs(int i, int[] parents, Matrix data, ICovarianceMatrix covariances, boolean calculateRowSubsets, double lambda)
    Returns the covariance matrix of the regression of the ith variable on its parents and the regression coefficients.
    static @NotNull SemBicScore.CovAndCoefs
    getCovAndCoefs(int i, int[] parents, Matrix data, ICovarianceMatrix covariances, double lambda, List<Integer> rows)
    Returns the covariance matrix of the regression of the ith variable on its parents and the regression coefficients.
    ICovarianceMatrix
    getCovariances()
    Returns the covariance matrix.
    DataModel
    getData()
    Returns the data model.
    DataModel
    getDataModel()
    Returns the data model.
    double
    getLikelihood(int i, int[] parents)
    Calculates the likelihood for the given variable and its parent variables based on the provided data and covariance matrices.
    SemBicScore.LikelihoodResult
    getLikelihoodAndDof(int i, int... parents)
    Computes the likelihood and degrees of freedom (dof) for a given variable and its parent variables.
    int
    getMaxDegree()
    Returns the maximum degree of the score.
    double
    getPenaltyDiscount()
    Returns the multiplier on the penalty term for this score.
    static double
    getResidualVariance(int i, int[] parents, Matrix data, ICovarianceMatrix covariances, boolean calculateRowSubsets, double lambda)
    Returns the variance of the residual of the regression of the ith variable on its parents.
    int
    getSampleSize()
    Returns the sample size.
    double
    getStructurePrior()
    Returns the structure prior for this score.
    List<Node>
    getVariables()
    Returns the variables of the covariance matrix.
    boolean
    isEffectEdge(double bump)
    Returns true iff the given bump is an effect edge.
    boolean
    isVerbose()
    Returns true if verbose output should be sent to out.
    double
    localScore(int i, int... parents)
    Returns the score for the given node and its parents.
    double
    localScoreDiff(int x, int y, int[] z)
    Returns the score difference of the graph.
    double
    nandyBic(int x, int y, int[] z)
    Calculates the BIC score of a partial correlation based on the specified variables.
    void
    setLambda(double lambda)
    Sets the singularity lambda.
    void
    setPenaltyDiscount(double penaltyDiscount)
    Sets the multiplier on the penalty term for this score.
    void
    setRuleType(SemBicScore.RuleType ruleType)
    Sets the rule type to use.
    void
    setStructurePrior(double structurePrior)
    Sets the structure prior for this score.
    void
    setVariables(List<Node> variables)
    Sets the variables of the covariance matrix.
    void
    setVerbose(boolean verbose)
    Sets whether verbose output should be sent to out.
    SemBicScore
    subset(List<Node> subset)
    Returns a SEM BIC score for the given subset of variables.
    String
    toString()
    Returns a string representation of this score.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

    Methods inherited from interface edu.cmu.tetrad.search.score.Score

    append, getVariable, localScore, localScore, localScoreDiff
  • Constructor Details

    • SemBicScore

      public SemBicScore(ICovarianceMatrix covariances)
      Constructs the score using a covariance matrix.
      Parameters:
      covariances - The covariance matrix.
    • SemBicScore

      public SemBicScore(ICovarianceMatrix covariances, double penaltyDiscount)
      Constructs the score using a covariance matrix.
      Parameters:
      covariances - The covariance matrix.
      penaltyDiscount - The penalty discount of the score.
    • SemBicScore

      public SemBicScore(DataSet dataSet, boolean precomputeCovariances)
      Constructs the score from a dataset.
      Parameters:
      dataSet - The dataset.
      precomputeCovariances - Whether the covariances should be precomputed or computed on the fly. True if they should be precomputed.
    • SemBicScore

      public SemBicScore(DataSet dataSet, double penaltyDiscount, boolean precomputeCovariances)
      Constructs the score from a dataset.
      Parameters:
      dataSet - The dataset.
      penaltyDiscount - The penalty discount of the score.
      precomputeCovariances - Whether the covariances should be precomputed or computed on the fly. True if they should be precomputed.
  • Method Details

    • getResidualVariance

      public static double getResidualVariance(int i, int[] parents, Matrix data, ICovarianceMatrix covariances, boolean calculateRowSubsets, double lambda) throws org.apache.commons.math3.linear.SingularMatrixException
      Returns the variance of the residual of the regression of the ith variable on its parents.
      Parameters:
      i - The index of the variable.
      parents - The indices of the parents.
      data - The data matrix.
      covariances - The covariance matrix.
      calculateRowSubsets - True if row subsets should be calculated.
      lambda - Singularity lambda.
      Returns:
      The variance of the residual of the regression of the ith variable on its parents.
      Throws:
      org.apache.commons.math3.linear.SingularMatrixException - if a matrix that must be inverted is singular.
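
To make the quantity concrete, here is a minimal sketch of a residual variance computed from a covariance matrix S as S_ii - S_iP S_PP^{-1} S_Pi, where P is the parent set. It uses plain arrays rather than Tetrad's Matrix and ICovarianceMatrix types, ignores row subsets and lambda, and signals singularity with ArithmeticException; the class and method names are hypothetical, and this is not claimed to be how getResidualVariance is implemented.

```java
// Hypothetical illustration only; this class is not part of Tetrad.
final class ResidualVarianceSketch {

    /** Residual variance of variable i regressed on the given parents, from covariance matrix cov. */
    static double residualVariance(double[][] cov, int i, int[] parents) {
        int p = parents.length;
        // Augmented system [S_PP | S_Pi] whose solution is the coefficient vector.
        double[][] a = new double[p][p + 1];
        for (int r = 0; r < p; r++) {
            for (int c = 0; c < p; c++) {
                a[r][c] = cov[parents[r]][parents[c]];
            }
            a[r][p] = cov[parents[r]][i];
        }
        double[] beta = solve(a);
        double explained = 0.0;
        for (int r = 0; r < p; r++) {
            explained += cov[i][parents[r]] * beta[r];
        }
        return cov[i][i] - explained;
    }

    /** Gaussian elimination with partial pivoting on an augmented p x (p + 1) matrix. */
    static double[] solve(double[][] a) {
        int p = a.length;
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("parent covariance matrix is singular");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution.
        double[] x = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int c = r + 1; c < p; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] cov = {{4.0, 2.0}, {2.0, 3.0}};
        System.out.println(residualVariance(cov, 0, new int[]{1}));
        System.out.println(residualVariance(cov, 0, new int[]{}));
    }
}
```

With no parents the sketch simply returns the variance of variable i, since nothing is explained by the regression.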
    • getCovAndCoefs

      @NotNull public static SemBicScore.CovAndCoefs getCovAndCoefs(int i, int[] parents, Matrix data, ICovarianceMatrix covariances, boolean calculateRowSubsets, double lambda)
      Returns the covariance matrix of the regression of the ith variable on its parents and the regression coefficients.
      Parameters:
      i - The index of the variable.
      parents - The indices of the parents.
      data - The data matrix.
      covariances - The covariance matrix.
      calculateRowSubsets - True if row subsets should be calculated.
      lambda - Singularity lambda.
      Returns:
      The covariance matrix of the regression of the ith variable on its parents and the regression coefficients.
    • getCovAndCoefs

      @NotNull public static @NotNull SemBicScore.CovAndCoefs getCovAndCoefs(int i, int[] parents, Matrix data, ICovarianceMatrix covariances, double lambda, List<Integer> rows)
      Returns the covariance matrix of the regression of the ith variable on its parents and the regression coefficients.
      Parameters:
      i - The index of the variable.
      parents - The indices of the parents.
      data - The data matrix.
      covariances - The covariance matrix.
      lambda - Singularity lambda.
      rows - The rows to use.
      Returns:
      The covariance matrix of the regression of the ith variable on its parents and the regression coefficients.
    • getCov

      public static Matrix getCov(List<Integer> rows, int[] cols, int[] all, DataSet dataSet, Matrix cov)
      Computes the covariance matrix for the given subset of rows and columns in the provided data set.
      Parameters:
      rows - A list of the row indices to consider for computing the covariance.
      cols - An array of the column indices for which to compute the covariance matrix.
      all - An array of all column indices to check for NaN values.
      dataSet - The dataset containing the values to be used in computation. If null, the method returns a selection from the provided covariance matrix.
      cov - If dataSet is null, this covariance matrix is used to return the selected covariances.
      Returns:
      A Matrix representing the covariance computed from the given rows and columns of the dataset or a selection from the provided covariance matrix.
      Throws:
      IllegalArgumentException - If both dataSet and cov are null.
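
The row and column selection described above can be sketched as follows. This is a plain-array illustration, not Tetrad's code: it assumes that a row is skipped when any column listed in all holds NaN, that the sample covariance uses an n - 1 denominator, and it takes rows as an int[] instead of a List<Integer> to stay free of imports. The class and method names are hypothetical.

```java
// Hypothetical illustration only; this class is not part of Tetrad.
final class SubsetCovarianceSketch {

    /**
     * Sample covariance of the columns in cols, using only the rows in rows
     * that have no NaN in any column listed in all (an assumed missing-value rule).
     */
    static double[][] cov(double[][] data, int[] rows, int[] cols, int[] all) {
        // Keep the rows that are complete on every column in 'all'.
        int[] keep = new int[rows.length];
        int kept = 0;
        for (int r : rows) {
            boolean complete = true;
            for (int c : all) {
                if (Double.isNaN(data[r][c])) {
                    complete = false;
                    break;
                }
            }
            if (complete) keep[kept++] = r;
        }
        if (kept < 2) {
            throw new IllegalArgumentException("need at least two complete rows");
        }

        // Column means over the kept rows.
        double[] mean = new double[cols.length];
        for (int j = 0; j < cols.length; j++) {
            for (int k = 0; k < kept; k++) mean[j] += data[keep[k]][cols[j]];
            mean[j] /= kept;
        }

        // Covariances with an n - 1 denominator (assumed).
        double[][] s = new double[cols.length][cols.length];
        for (int a = 0; a < cols.length; a++) {
            for (int b = 0; b < cols.length; b++) {
                double sum = 0.0;
                for (int k = 0; k < kept; k++) {
                    sum += (data[keep[k]][cols[a]] - mean[a]) * (data[keep[k]][cols[b]] - mean[b]);
                }
                s[a][b] = sum / (kept - 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {2, 4}, {3, 6}, {Double.NaN, 100}};
        double[][] s = cov(data, new int[]{0, 1, 2, 3}, new int[]{0, 1}, new int[]{0, 1});
        System.out.println(s[0][0] + " " + s[0][1] + " " + s[1][1]);
    }
}
```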
    • setLambda

      public void setLambda(double lambda)
      Sets the singularity lambda.
      Parameters:
      lambda - Singularity lambda.
    • localScoreDiff

      public double localScoreDiff(int x, int y, int[] z)
      Returns the score difference of the graph, that is, the change in score when x is added to the parent set z of y.
      Specified by:
      localScoreDiff in interface Score
      Parameters:
      x - A node.
      y - The node.
      z - A set of nodes.
      Returns:
      The score difference.
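
One way to read a local score difference is as localScore(y, z with x appended) minus localScore(y, z). The sketch below shows that reading against a small hypothetical LocalScore interface; the interface, the class name, and the helper append defined here are illustrative stand-ins, and whether SemBicScore computes the difference this way for every rule type is an assumption.

```java
// Hypothetical illustration only; these types are not part of Tetrad.
final class ScoreDiffSketch {

    /** Stand-in for a score that can evaluate a node given its parents. */
    interface LocalScore {
        double localScore(int i, int... parents);
    }

    /** Returns a copy of z with x appended at the end. */
    static int[] append(int[] z, int x) {
        int[] out = new int[z.length + 1];
        System.arraycopy(z, 0, out, 0, z.length);
        out[z.length] = x;
        return out;
    }

    /** Change in the local score of y when x is added to its parent set z. */
    static double localScoreDiff(LocalScore score, int x, int y, int[] z) {
        return score.localScore(y, append(z, x)) - score.localScore(y, z);
    }

    public static void main(String[] args) {
        // Toy score: each parent costs 1.5, and having node 0 as a parent is worth 10.
        LocalScore toy = (i, parents) -> {
            double s = -1.5 * parents.length;
            for (int p : parents) {
                if (p == 0) s += 10.0;
            }
            return s;
        };
        System.out.println(localScoreDiff(toy, 0, 2, new int[]{1}));
        System.out.println(localScoreDiff(toy, 3, 2, new int[]{1}));
    }
}
```

Under the sign convention stated in the class description, a positive difference would count as evidence for adding the edge and a negative one against it.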
    • nandyBic

      public double nandyBic(int x, int y, int[] z)
      Calculates the BIC score of a partial correlation based on the specified variables.
      Parameters:
      x - the index of the first variable.
      y - the index of the second variable.
      z - an array of indices representing conditioning variables.
      Returns:
      the BIC score as a double.
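
For intuition, a BIC-style score for a partial correlation r can be written as -n ln(1 - r^2) - c ln n, with n the sample size and c the penalty discount. The sketch below uses that form together with the textbook first-order partial correlation formula. Both the exact formula used by nandyBic and the omission of any structure prior term are assumptions here, and the class and method names are hypothetical.

```java
// Hypothetical illustration only; this class is not part of Tetrad.
final class PartialCorrelationBicSketch {

    /** First-order partial correlation of x and y given a single variable z. */
    static double partialCorrelation(double rxy, double rxz, double ryz) {
        return (rxy - rxz * ryz) / Math.sqrt((1.0 - rxz * rxz) * (1.0 - ryz * ryz));
    }

    /**
     * Assumed BIC form for a partial correlation r: -n ln(1 - r^2) - c ln n.
     * Values above zero suggest dependence, values below zero independence.
     */
    static double bic(double r, int n, double penaltyDiscount) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return -n * Math.log(1.0 - r * r) - penaltyDiscount * Math.log(n);
    }

    public static void main(String[] args) {
        double r = partialCorrelation(0.5, 0.5, 0.5);
        System.out.println(r);
        System.out.println(bic(r, 100, 1.0));
        System.out.println(bic(0.0, 100, 1.0));
    }
}
```

A partial correlation of zero leaves only the penalty, so the sketch returns a negative value in that case, matching the convention that negative scores indicate independence.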
    • localScore

      public double localScore(int i, int... parents)
      Returns the score for the given node and its parents.
      Specified by:
      localScore in interface Score
      Parameters:
      i - The index of the node.
      parents - The indices of the node's parents.
      Returns:
      The score, or NaN if the score cannot be calculated.
    • getLikelihoodAndDof

      public SemBicScore.LikelihoodResult getLikelihoodAndDof(int i, int... parents)
      Computes the likelihood and degrees of freedom (dof) for a given variable and its parent variables. The likelihood is calculated based on the provided variable index and parent indices. In case of a singular matrix during likelihood computation, it returns a result with NaN likelihood and -1 for degrees of freedom.
      Parameters:
      i - The index of the variable for which the likelihood is calculated.
      parents - The indices of the parent variables of the variable `i`.
      Returns:
      A LikelihoodResult object containing the likelihood value, the degrees of freedom (dof), and other related penalty and sample size information.
    • getAic

      public double getAic(int i, int... parents)
      Computes the Akaike Information Criterion (AIC) score for the given variable and its parent variables in a probabilistic graphical model such as a Bayesian network.
      Parameters:
      i - The index of the variable for which the AIC score is being computed.
      parents - The indices of the parent variables of the variable specified by index i.
      Returns:
      The computed AIC score as a double value. Returns Double.NaN if a singular matrix is encountered or the score is undefined. Throws an exception if the rule type is unsupported.
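
The conventional AIC is 2k - 2L, where lower is better. To line up with the higher-is-better convention of the BIC formula in the class description, the sketch below uses 2L - 2k instead; whether getAic uses exactly this form, and how it counts k, are assumptions. The class and method names are hypothetical.

```java
// Hypothetical illustration only; this class is not part of Tetrad.
final class AicSketch {

    /** AIC in a higher-is-better convention: 2L - 2k (assumed form). */
    static double aic(double logLikelihood, int k) {
        return 2.0 * logLikelihood - 2.0 * k;
    }

    /** BIC with penalty discount c, for comparison: 2L - c k ln n. */
    static double bic(double logLikelihood, int k, int n, double penaltyDiscount) {
        return 2.0 * logLikelihood - penaltyDiscount * k * Math.log(n);
    }

    public static void main(String[] args) {
        double logLikelihood = -100.0;
        System.out.println(aic(logLikelihood, 3));
        System.out.println(bic(logLikelihood, 3, 1000, 1.0));
    }
}
```

Because ln n exceeds 2 once n is 8 or more, the BIC penalty with c = 1 is heavier than the AIC penalty for all but very small samples.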
    • getLikelihood

      public double getLikelihood(int i, int[] parents) throws org.apache.commons.math3.linear.SingularMatrixException
      Calculates the likelihood for the given variable and its parent variables based on the provided data and covariance matrices. This method computes the variance for the residuals and uses it to determine the likelihood score.
      Parameters:
      i - The index of the variable for which the likelihood is being calculated.
      parents - An array of indices representing the parent variables of the variable at index i.
      Returns:
      The negative log-likelihood score for the specified variable and its parent variables.
      Throws:
      org.apache.commons.math3.linear.SingularMatrixException - if the covariance matrix is singular and cannot be inverted.
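
The step from residual variance to likelihood can be sketched with the maximized Gaussian log likelihood, -(n/2)(ln(2 pi sigma^2) + 1), where sigma^2 is the residual variance and n the sample size. This is the textbook expression; the sign and the constant terms that getLikelihood actually uses are not confirmed here, and the class and method names are hypothetical.

```java
// Hypothetical illustration only; this class is not part of Tetrad.
final class GaussianLikelihoodSketch {

    /**
     * Maximized log likelihood of n i.i.d. Gaussian residuals whose
     * maximum-likelihood variance estimate is residualVariance.
     */
    static double logLikelihood(double residualVariance, int n) {
        if (!(residualVariance > 0.0)) {
            throw new IllegalArgumentException("residual variance must be positive");
        }
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return -0.5 * n * (Math.log(2.0 * Math.PI * residualVariance) + 1.0);
    }

    public static void main(String[] args) {
        // A smaller residual variance means a better fit and a higher log likelihood.
        System.out.println(logLikelihood(1.0, 100));
        System.out.println(logLikelihood(0.5, 100));
    }
}
```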
    • getPenaltyDiscount

      public double getPenaltyDiscount()
      Returns the multiplier on the penalty term for this score.
      Returns:
      The multiplier on the penalty term for this score.
    • setPenaltyDiscount

      public void setPenaltyDiscount(double penaltyDiscount)
      Sets the multiplier on the penalty term for this score.
      Parameters:
      penaltyDiscount - The multiplier on the penalty term for this score.
    • getStructurePrior

      public double getStructurePrior()
      Returns the structure prior for this score.
      Returns:
      The structure prior for this score.
    • setStructurePrior

      public void setStructurePrior(double structurePrior)
      Sets the structure prior for this score.
      Parameters:
      structurePrior - The structure prior for this score.
    • getCovariances

      public ICovarianceMatrix getCovariances()
      Returns the covariance matrix.
      Returns:
      The covariance matrix.
    • getSampleSize

      public int getSampleSize()
      Returns the sample size.
      Specified by:
      getSampleSize in interface Score
      Returns:
      The sample size.
    • isEffectEdge

      public boolean isEffectEdge(double bump)
      Returns true iff the given bump is an effect edge.
      Specified by:
      isEffectEdge in interface Score
      Parameters:
      bump - The bump.
      Returns:
      True iff the given bump is an effect edge.
    • getDataModel

      public DataModel getDataModel()
      Returns the data model.
      Returns:
      The data model.
    • isVerbose

      public boolean isVerbose()
      Returns true if verbose output should be sent to out.
      Returns:
      True, if verbose output should be sent to out.
    • setVerbose

      public void setVerbose(boolean verbose)
      Sets whether verbose output should be sent to out.
      Parameters:
      verbose - True, if verbose output should be sent to out.
    • getVariables

      public List<Node> getVariables()
      Returns the variables of the covariance matrix.
      Specified by:
      getVariables in interface Score
      Returns:
      The list of variables.
    • setVariables

      public void setVariables(List<Node> variables)
      Sets the variables of the covariance matrix.
      Parameters:
      variables - The variables of the covariance matrix.
    • getMaxDegree

      public int getMaxDegree()
      Returns the maximum degree of the score.
      Specified by:
      getMaxDegree in interface Score
      Returns:
      The max degree.
    • determines

      public boolean determines(List<Node> z, Node y)
      Returns true if the variables in z determine the variable y.
      Specified by:
      determines in interface Score
      Parameters:
      z - The set of nodes.
      y - The node.
      Returns:
      True iff the variables in z determine the variable y.
    • getData

      public DataModel getData()
      Returns the data model.
      Returns:
      The data model.
    • setRuleType

      public void setRuleType(SemBicScore.RuleType ruleType)
      Sets the rule type to use.
      Parameters:
      ruleType - The rule type to use.
    • subset

      public SemBicScore subset(List<Node> subset)
      Returns a SEM BIC score for the given subset of variables.
      Parameters:
      subset - The subset of variables.
      Returns:
      A SEM BIC score for the given subset of variables.
    • toString

      public String toString()
      Returns a string representation of this score.
      Specified by:
      toString in interface Score
      Overrides:
      toString in class Object
      Returns:
      A string representation of this score.