Class ClassifierMbDiscrete

java.lang.Object
edu.cmu.tetrad.classify.ClassifierMbDiscrete
All Implemented Interfaces:
ClassifierDiscrete

public class ClassifierMbDiscrete extends Object implements ClassifierDiscrete
Performs a Bayesian classification of a test set based on a given training set. PC-MB is used to select a Markov blanket DAG of the target; this DAG is used to estimate a Bayes model using the training data. The Bayes model is then updated for each case in the test data to produce classifications.
Author:
Frank Wimberly, josephramsey
  • Constructor Details

    • ClassifierMbDiscrete

      public ClassifierMbDiscrete(String trainPath, String testPath, String targetString, String alphaString, String depthString, String priorString, String maxMissingString)
      Constructs a new ClassifierMbDiscrete object using the given training and test data, target variable, alpha value, search depth, Dirichlet prior, and maximum number of missing values per test case.
      Parameters:
      trainPath - the path to the training data file
      testPath - the path to the test data file
      targetString - the name of the target variable
      alphaString - the alpha value for the Dirichlet estimator
      depthString - the depth for the PC-MB search
      priorString - the prior for the Dirichlet estimator
      maxMissingString - the maximum number of missing values for a test case
  • Method Details

    • main

      public static void main(String[] args)
      Runs MbClassify using command-line arguments. The syntax is:
       java MbClassify train.dat test.dat target alpha depth dirichlet_prior max_missing
       
      Parameters:
      args - train.dat test.dat target alpha depth dirichlet_prior max_missing
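      The following is a minimal sketch of turning the seven positional arguments into typed values, assuming they arrive in the same order as the constructor parameters and that depth and max_missing are integers while alpha and dirichlet_prior are real numbers. MbClassifyArgsSketch and its Args holder are hypothetical names for illustration; they are not part of Tetrad, and the real main method passes the strings to the constructor instead.

```java
// Hypothetical sketch: MbClassifyArgsSketch is NOT part of Tetrad.
// It only illustrates converting the seven positional string arguments
// (in constructor order) into typed values.
final class MbClassifyArgsSketch {

    static final String USAGE =
            "java MbClassify train.dat test.dat target alpha depth dirichlet_prior max_missing";

    /** Typed view of the positional arguments (hypothetical holder). */
    static final class Args {
        final String trainPath;
        final String testPath;
        final String target;
        final double alpha;
        final int depth;
        final double prior;
        final int maxMissing;

        Args(String trainPath, String testPath, String target,
             double alpha, int depth, double prior, int maxMissing) {
            this.trainPath = trainPath;
            this.testPath = testPath;
            this.target = target;
            this.alpha = alpha;
            this.depth = depth;
            this.prior = prior;
            this.maxMissing = maxMissing;
        }
    }

    /** Parses args; throws IllegalArgumentException (or its subclass NumberFormatException) on bad input. */
    static Args parse(String[] args) {
        if (args.length != 7) {
            throw new IllegalArgumentException("Usage: " + USAGE);
        }
        return new Args(args[0], args[1], args[2],
                Double.parseDouble(args[3]),   // alpha
                Integer.parseInt(args[4]),     // depth
                Double.parseDouble(args[5]),   // dirichlet_prior
                Integer.parseInt(args[6]));    // max_missing
    }
}
```

Validating the argument count up front gives the user the usage line instead of an ArrayIndexOutOfBoundsException.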
    • classify

      public int[] classify()
      Classifies the test data by Bayesian updating. The procedure is as follows. First, PC-MB is run on the training data to estimate an MB CPDAG. Bidirected edges are removed, and an MB DAG G is selected from the CPDAG that remains. Second, a Bayes model B is estimated using G and the training data. Third, for each case in the test data, the marginal for the target variable in B is calculated, conditioning on the values that the test case assigns to the other variables in B; these are reported as classifications. Estimation of B is done using a Dirichlet estimator with a symmetric prior, using the given alpha value. Updating is done using a row-summing exact updater.
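      As an illustration of the estimation and classification steps, here is a minimal sketch of a symmetric-Dirichlet estimate for one row of a conditional probability table, using the posterior mean (n_k + a) / (N + K * a), where n_k is the count of category k, N the total count, K the number of categories, and a the symmetric pseudocount. It also picks the most probable category of a target marginal. Whether Tetrad's Dirichlet estimator computes exactly this formula, and whether a classification is the argmax of the updated marginal, are assumptions; SymmetricDirichletSketch is a hypothetical name.

```java
// Hypothetical sketch, NOT Tetrad's implementation.
final class SymmetricDirichletSketch {

    /**
     * Posterior-mean estimate of one CPT row under a symmetric Dirichlet prior:
     * P(X = k | parents) = (n_k + a) / (N + K * a).
     * Returns NaN entries if both the counts and a are all zero.
     */
    static double[] estimateRow(int[] counts, double a) {
        if (a < 0) throw new IllegalArgumentException("pseudocount must be >= 0");
        int total = 0;
        for (int c : counts) total += c;
        double denom = total + counts.length * a;
        double[] probs = new double[counts.length];
        for (int k = 0; k < counts.length; k++) {
            probs[k] = (counts[k] + a) / denom;
        }
        return probs;
    }

    /** Index of the largest probability (ties go to the lowest index). */
    static int argMax(double[] probs) {
        int best = 0;
        for (int k = 1; k < probs.length; k++) {
            if (probs[k] > probs[best]) best = k;
        }
        return best;
    }
}
```

For counts {3, 1, 0} and a = 1 the row is {4/7, 2/7, 1/7}: the pseudocount keeps the unseen category from receiving probability zero.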

      One consequence of using the row-summing exact updater is that classification is fast except for test cases with many missing values: for such a case, the number of rows that must be summed over grows exponentially with the number of missing values in that case. This is the purpose of the maximum-missing-values parameter; a value of about 5 is a good default. Any test case with more missing values than this is skipped.
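      The cost argument can be made concrete with a small sketch: the rows to sum for one test case number the product of the category counts of its missing variables (2^m for m missing binary variables), and a case is skipped when its missing count exceeds the limit. The missing-value code -99 and the class MissingValueCostSketch are hypothetical choices for illustration, not taken from this page.

```java
// Hypothetical sketch, NOT Tetrad's implementation.
final class MissingValueCostSketch {

    /** Hypothetical code used here to mark a missing discrete value. */
    static final int MISSING = -99;

    /** Number of missing entries in one test case. */
    static int countMissing(int[] testCase) {
        int m = 0;
        for (int v : testCase) {
            if (v == MISSING) m++;
        }
        return m;
    }

    /** Joint configurations of the missing variables that must be summed over. */
    static long rowsToSum(int[] testCase, int[] numCategories) {
        long rows = 1;
        for (int i = 0; i < testCase.length; i++) {
            if (testCase[i] == MISSING) rows *= numCategories[i];
        }
        return rows;
    }

    /** A case is skipped when it has more missing values than maxMissing. */
    static boolean shouldSkip(int[] testCase, int maxMissing) {
        return countMissing(testCase) > maxMissing;
    }
}
```

With ten missing binary variables the sum already spans 1024 rows, which is why a small cap such as 5 keeps per-case work bounded.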

      Specified by:
      classify in interface ClassifierDiscrete
      Returns:
      The classifications.
    • crossTabulation

      public int[][] crossTabulation()

      Returns the cross-tabulation of actual versus predicted target categories.

      Specified by:
      crossTabulation in interface ClassifierDiscrete
      Returns:
      the cross-tabulation from the classify method. The classify method must be run first.
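      One way such a table can be built is sketched below, assuming rows index the actual target category, columns index the predicted category, and skipped test cases are marked with a negative prediction. The row/column orientation, the skip marker, and the class name CrossTabSketch are assumptions; this page does not specify how crossTabulation lays out its result.

```java
// Hypothetical sketch, NOT Tetrad's implementation.
final class CrossTabSketch {

    /**
     * Builds table[actual][predicted] counts.
     * Cases whose actual or predicted value is negative (e.g. skipped) are ignored.
     */
    static int[][] crossTab(int[] actual, int[] predicted, int numCategories) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have equal length");
        }
        int[][] table = new int[numCategories][numCategories];
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] < 0 || predicted[i] < 0) continue; // skipped case
            table[actual[i]][predicted[i]]++;
        }
        return table;
    }
}
```

Diagonal cells count correct classifications; off-diagonal cells show which categories are confused with which.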
    • getPercentCorrect

      public double getPercentCorrect()

      Getter for the field percentCorrect, the percentage of test cases classified correctly.

      Specified by:
      getPercentCorrect in interface ClassifierDiscrete
      Returns:
      the percent correct from the classify method. The classify method must be run first.
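      A minimal sketch of deriving a percent-correct figure from a cross-tabulation is shown below, assuming the value is on a 0 to 100 scale and is computed only over the cases that appear in the table (that is, excluding skipped cases). Both assumptions and the class name PercentCorrectSketch are illustrative, not taken from this page.

```java
// Hypothetical sketch, NOT Tetrad's implementation.
final class PercentCorrectSketch {

    /** 100 * (sum of diagonal) / (sum of all cells); NaN for an empty table. */
    static double percentCorrect(int[][] table) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < table.length; i++) {
            for (int j = 0; j < table[i].length; j++) {
                total += table[i][j];
                if (i == j) correct += table[i][j];
            }
        }
        return total == 0 ? Double.NaN : 100.0 * correct / total;
    }
}
```

For the table {{1, 1}, {0, 2}}, three of four classified cases lie on the diagonal, giving 75.0.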