Package edu.cmu.tetrad.classify
Class ClassifierMbDiscrete
java.lang.Object
edu.cmu.tetrad.classify.ClassifierMbDiscrete
- All Implemented Interfaces:
ClassifierDiscrete
Performs a Bayesian classification of a test set based on a given training set. PC-MB is used to select a Markov
blanket DAG of the target; this DAG is used to estimate a Bayes model using the training data. The Bayes model is
then updated for each case in the test data to produce classifications.
- Author:
- Frank Wimberly, josephramsey
Constructor Summary
- ClassifierMbDiscrete(String trainPath, String testPath, String targetString, String alphaString, String depthString, String priorString, String maxMissingString): Constructs a new ClassifierMbDiscrete object using the given training and test data, target variable, alpha value, search depth, prior, and maximum number of missing values.
Method Summary
- int[] classify(): Classifies the test data by Bayesian updating.
- int[][] crossTabulation(): Returns the cross-tabulation computed by the classify method.
- double getPercentCorrect(): Getter for the field percentCorrect.
- static void main(String[] args): Runs MbClassify using command-line arguments.
Constructor Details
ClassifierMbDiscrete
public ClassifierMbDiscrete(String trainPath, String testPath, String targetString, String alphaString, String depthString, String priorString, String maxMissingString)
Constructs a new ClassifierMbDiscrete object using the given training and test data, target variable, alpha value, search depth, prior, and maximum number of missing values.
- Parameters:
trainPath - the path to the training data file
testPath - the path to the test data file
targetString - the name of the target variable
alphaString - the alpha value for the Dirichlet estimator
depthString - the depth for the PC-MB search
priorString - the prior for the Dirichlet estimator
maxMissingString - the maximum number of missing values allowed in a test case
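Because the constructor receives every setting as a String, each value has to be converted to a number at some point. The sketch below shows one way to do that with validation. The class MbClassifierParams, its fields, and the validation rules (including treating a depth of -1 as "no limit") are hypothetical assumptions for illustration, not part of the Tetrad API.

```java
/**
 * Hypothetical holder for typed versions of the String constructor arguments.
 * Names and validation rules are illustrative assumptions, not Tetrad API.
 */
final class MbClassifierParams {
    final double alpha;
    final int depth;      // assumption: -1 means no depth limit for the search
    final double prior;
    final int maxMissing;

    private MbClassifierParams(double alpha, int depth, double prior, int maxMissing) {
        this.alpha = alpha;
        this.depth = depth;
        this.prior = prior;
        this.maxMissing = maxMissing;
    }

    /** Parses the numeric String arguments; throws IllegalArgumentException on bad input. */
    static MbClassifierParams parse(String alphaString, String depthString,
                                    String priorString, String maxMissingString) {
        // NumberFormatException, thrown by the parse calls, extends IllegalArgumentException.
        double alpha = Double.parseDouble(alphaString.trim());
        int depth = Integer.parseInt(depthString.trim());
        double prior = Double.parseDouble(priorString.trim());
        int maxMissing = Integer.parseInt(maxMissingString.trim());

        // The negated comparisons also reject NaN.
        if (!(alpha > 0.0)) {
            throw new IllegalArgumentException("alpha must be positive: " + alphaString);
        }
        if (depth < -1) {
            throw new IllegalArgumentException("depth must be -1 (unlimited) or >= 0: " + depthString);
        }
        if (!(prior >= 0.0)) {
            throw new IllegalArgumentException("prior must be non-negative: " + priorString);
        }
        if (maxMissing < 0) {
            throw new IllegalArgumentException("maxMissing must be >= 0: " + maxMissingString);
        }
        return new MbClassifierParams(alpha, depth, prior, maxMissing);
    }
}
```

Failing fast on malformed strings keeps a bad command-line value from surfacing later as a confusing error deep inside the search or the estimator.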
Method Details
main
public static void main(String[] args)
Runs MbClassify using command-line arguments. The syntax is:
java MbClassify train.dat test.dat target alpha depth dirichlet_prior max_missing
- Parameters:
args - train.dat test.dat target alpha depth dirichlet_prior max_missing
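The positional arguments line up with the constructor parameters. The sketch below names them so the mapping is explicit. The class MbClassifyArgs and the key names are hypothetical, and the seven-argument order is an assumption that merges the syntax line with the parameter list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: maps MbClassify's positional arguments to names. */
final class MbClassifyArgs {
    // Assumed order: train.dat test.dat target alpha depth dirichlet_prior max_missing
    static final String[] NAMES = {
        "train", "test", "target", "alpha", "depth", "prior", "maxMissing"
    };

    /** Returns the arguments keyed by name, in positional order. */
    static Map<String, String> parse(String[] args) {
        if (args.length != NAMES.length) {
            throw new IllegalArgumentException("Usage: java MbClassify train.dat test.dat"
                    + " target alpha depth dirichlet_prior max_missing");
        }
        Map<String, String> named = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            named.put(NAMES[i], args[i]);
        }
        return named;
    }
}
```

Under this assumed order, the named values could be passed in the same sequence to the ClassifierMbDiscrete constructor.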
classify
public int[] classify() throws InterruptedException
Classifies the test data by Bayesian updating. The procedure is as follows. First, PC-MB is run on the training data to estimate an MB CPDAG. Bidirected edges are removed, and an MB DAG G is selected from the CPDAG that remains. Second, a Bayes model B is estimated using G and the training data. Third, for each case in the test data, the marginal for the target variable in B is calculated, conditioning on the values of the other variables in B in the test data; these are reported as classifications. B is estimated using a Dirichlet estimator with a symmetric prior and the given alpha value. Updating is done using a row-summing exact updater.

One consequence of using the row-summing exact updater is that classification is fast except for cases with many missing values. For such cases, the number of rows that must be summed over grows exponentially in the number of missing values in that case, which is why the maximum number of missing values is a parameter. A reasonable default is 5. Any test case with more missing values than this is skipped.
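The estimation step can be illustrated for a single row of a conditional probability table. Assuming the estimator reports the posterior mean under a symmetric Dirichlet prior, every cell receives the same pseudocount alpha, so the estimate for category k is (n_k + alpha) / (N + K * alpha), where n_k is the count for k, N is the row total, and K is the number of categories. The class and method names are hypothetical, and whether Tetrad's estimator computes exactly this quantity is an assumption.

```java
/** Hypothetical sketch of a posterior-mean CPT row under a symmetric Dirichlet prior. */
final class SymmetricDirichlet {
    /**
     * @param counts counts[k] = number of training cases with child value k
     *               under one fixed configuration of the child's parents
     * @param alpha  symmetric pseudocount added to every cell (must be positive)
     * @return estimated P(child = k | parent configuration) for each k
     */
    static double[] posteriorMean(int[] counts, double alpha) {
        if (!(alpha > 0.0)) {
            throw new IllegalArgumentException("alpha must be positive: " + alpha);
        }
        double total = 0.0;
        for (int c : counts) {
            total += c;
        }
        // Denominator: observed cases plus one pseudocount per category.
        double denom = total + alpha * counts.length;
        double[] probs = new double[counts.length];
        for (int k = 0; k < counts.length; k++) {
            probs[k] = (counts[k] + alpha) / denom;
        }
        return probs;
    }
}
```

For counts {3, 1} and alpha = 1 this gives 4/6 and 2/6. The pseudocount keeps categories never seen in training away from probability zero, which matters when a test case contains a combination absent from the training data.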
- Specified by:
classify in interface ClassifierDiscrete
- Returns:
- The classifications.
- Throws:
InterruptedException
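The row-summing update described for classify() can be sketched as brute-force enumeration. The target and every variable missing from the test case are enumerated, so the number of joint evaluations is the product of their cardinalities, for example 2 * 2^m for a binary target with m missing binary variables. This sketch assumes the joint distribution of B is available as a function from a complete assignment to a probability, uses -1 as a hypothetical missing-value marker, and treats the most probable target value as the classification. These representations are assumptions, not Tetrad's data structures.

```java
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of classification by exact row summing over a discrete joint. */
final class RowSummingSketch {
    static final int MISSING = -1; // hypothetical marker for a missing value

    /**
     * Returns P(target = t | observed values) for each t, or null when the case has
     * more than maxMissing missing non-target values and is skipped.
     */
    static double[] targetMarginal(ToDoubleFunction<int[]> joint, int[] cardinalities,
                                   int target, int[] testCase, int maxMissing) {
        int n = cardinalities.length;
        int missing = 0;
        for (int v = 0; v < n; v++) {
            if (v != target && testCase[v] == MISSING) missing++;
        }
        if (missing > maxMissing) return null; // too many rows to sum: skip this case

        // Free variables: the target first, then every missing variable.
        int[] free = new int[missing + 1];
        int f = 0;
        free[f++] = target;
        for (int v = 0; v < n; v++) {
            if (v != target && testCase[v] == MISSING) free[f++] = v;
        }

        int[] row = testCase.clone();
        for (int v : free) row[v] = 0;
        double[] marginal = new double[cardinalities[target]];
        while (true) {
            marginal[row[target]] += joint.applyAsDouble(row);
            // Advance an odometer over the free variables; stop after the last row.
            int i = 0;
            while (i < free.length) {
                int v = free[i];
                row[v]++;
                if (row[v] < cardinalities[v]) break;
                row[v] = 0;
                i++;
            }
            if (i == free.length) break;
        }

        double total = 0.0;
        for (double p : marginal) total += p;
        if (total == 0.0) throw new IllegalArgumentException("evidence has zero probability");
        for (int t = 0; t < marginal.length; t++) marginal[t] /= total;
        return marginal;
    }

    /** Most probable target value, or -1 (hypothetical convention) if the case is skipped. */
    static int classifyCase(ToDoubleFunction<int[]> joint, int[] cardinalities,
                            int target, int[] testCase, int maxMissing) {
        double[] marginal = targetMarginal(joint, cardinalities, target, testCase, maxMissing);
        if (marginal == null) return -1;
        int best = 0;
        for (int t = 1; t < marginal.length; t++) {
            if (marginal[t] > marginal[best]) best = t;
        }
        return best;
    }
}
```

Enumerating the full joint is exponential even in the observed variables of a real network; the sketch only isolates the part that depends on missing values, which is the cost the maximum-missing parameter is meant to bound.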
crossTabulation
public int[][] crossTabulation()
Returns the cross-tabulation computed by the classify method.
- Specified by:
crossTabulation in interface ClassifierDiscrete
- Returns:
- the cross-tabulation from the classify method. The classify method must be run first.
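A cross-tabulation counts how often each actual target value co-occurs with each predicted value. The sketch below assumes rows index the actual category and columns the predicted one, and that skipped cases carry a hypothetical prediction of -1 and are left out. The orientation and the skip convention are assumptions; the class name is hypothetical.

```java
/** Hypothetical sketch: rows = actual category, columns = predicted category. */
final class CrossTabSketch {
    static int[][] crossTabulate(int[] actual, int[] predicted, int numCategories) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted differ in length");
        }
        int[][] table = new int[numCategories][numCategories];
        for (int i = 0; i < actual.length; i++) {
            // Assumed convention: a negative value marks a skipped case or missing actual value.
            if (actual[i] < 0 || predicted[i] < 0) continue;
            table[actual[i]][predicted[i]]++;
        }
        return table;
    }
}
```

With actual {0, 0, 1, 1, 1} and predicted {0, 1, 1, 1, -1}, the table is {{1, 1}, {0, 2}}: the last case is skipped, and the diagonal holds the correct classifications.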
getPercentCorrect
public double getPercentCorrect()
Getter for the field percentCorrect, which is set by the classify method.
- Specified by:
getPercentCorrect in interface ClassifierDiscrete
- Returns:
- the percent correct from the classify method. The classify method must be run first.
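If the percentage is derived from a cross-tabulation like the one returned by crossTabulation(), it is the diagonal total divided by the table total, scaled to 0 to 100. The sketch below assumes that scale and that skipped cases are absent from the table; both are assumptions, and the class name is hypothetical.

```java
/** Hypothetical sketch: percent correct from a square actual-by-predicted table. */
final class PercentCorrectSketch {
    static double percentCorrect(int[][] crossTab) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < crossTab.length; i++) {
            for (int j = 0; j < crossTab[i].length; j++) {
                total += crossTab[i][j];
                if (i == j) correct += crossTab[i][j]; // diagonal = correct classifications
            }
        }
        // No classified cases: report NaN rather than a misleading 0.
        return total == 0 ? Double.NaN : 100.0 * correct / total;
    }
}
```

For the table {{1, 1}, {0, 2}}, three of four classified cases are correct, giving 75.0.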