Package edu.cmu.tetrad.data
Class Discretizer
java.lang.Object
edu.cmu.tetrad.data.Discretizer
Discretizes individual columns of discrete or continuous data. Continuous data is discretized by specifying a list of
 n - 1 cutoffs for n values in the discretized data, with optional string labels for these values. Discrete data is
 discretized by specifying a mapping from old value names to new value names, the idea being that old values may be
 merged.
- Author:
- josephramsey, Tyler Gibson
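The merging of discrete values described above can be pictured with a small standalone sketch. It does not use the Tetrad API: the class `CategoryMergeSketch`, the method `remap`, and the category names are hypothetical, and leaving unmapped values unchanged is an assumption of this example rather than documented Discretizer behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Standalone illustration of merging category names; not part of Tetrad. */
final class CategoryMergeSketch {

    /** Replaces each old category name with the new name it is mapped to. */
    static List<String> remap(List<String> column, Map<String, String> oldToNew) {
        List<String> out = new ArrayList<>(column.size());
        for (String value : column) {
            // Assumption of this sketch: values without a mapping stay unchanged.
            out.add(oldToNew.getOrDefault(value, value));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> oldToNew = new LinkedHashMap<>();
        oldToNew.put("low", "low");
        oldToNew.put("medium", "high"); // "medium" is merged into "high"
        oldToNew.put("high", "high");
        // prints [low, high, high, high]
        System.out.println(remap(List.of("low", "medium", "high", "medium"), oldToNew));
    }
}
```

Mapping two old names to the same new name is what merges them: the three-valued column above becomes a two-valued one.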
- 
Nested Class Summary
- 
Constructor Summary
- Discretizer(DataSet dataSet): Constructs a new discretizer that discretizes every variable as binary, using evenly distributed values.
- Discretizer(DataSet dataSet, Map<Node, DiscretizationSpec> specs)
- 
Method Summary
- static Discretizer.Discretization discretize(double[] _data, double[] cutoffs, String variableName, List<String> categories): Discretizes the continuous data in the given column using the specified cutoffs and category names.
- void equalCounts(Node node, int numCategories): Sets the given node to be discretized into the given number of categories, using evenly distributed values.
- void equalIntervals(Node node, int numCategories): Sets the given node to be discretized into the given number of categories, using evenly spaced intervals.
- static double[] getEqualFrequencyBreakPoints(double[] _data, int numberOfCategories)
- void setVariablesCopied(boolean unselectedVariabledCopied)
- 
Constructor Details
Discretizer
Constructs a new discretizer that discretizes every variable as binary, using evenly distributed values.
- 
Discretizer
 
- 
- 
Method Details
getEqualFrequencyBreakPoints
public static double[] getEqualFrequencyBreakPoints(double[] _data, int numberOfCategories)
- 
discretize
public static Discretizer.Discretization discretize(double[] _data, double[] cutoffs, String variableName, List<String> categories)
Discretizes the continuous data in the given column using the specified cutoffs and category names. The following scheme is used: if cutoffs[i - 1] < v <= cutoffs[i], then v is mapped to category i, where cutoffs[-1] is taken to be negative infinity. Since there are only c - 1 cutoffs for c categories, any value above the last cutoff falls into the last category, c - 1. If category names are supplied, the discrete column returned will use these category names.
- Parameters:
- cutoffs - The cutoffs used to discretize the data. Should have length c - 1, where c is the number of categories in the discretized data.
- variableName - The name of the returned variable.
- categories - An optional list of category names; may be null. If non-null, it must have length c, where c is the number of categories in the discretized data, and the discrete column returned will use these names. Default category names are used in place of any names that are null.
- Returns:
- The discretized column.
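The cutoff scheme can be sketched in a few lines of standalone Java. This is an illustration of the rule as documented, not Tetrad's implementation: the class `CutoffRuleSketch` and its methods are hypothetical, the cutoffs are assumed to be sorted in ascending order, missing values (NaN) are not handled, and using the category index as the default name is an assumption made only for this example.

```java
import java.util.Arrays;

/** Standalone illustration of the documented cutoff rule; not part of Tetrad. */
final class CutoffRuleSketch {

    /** Returns the smallest i with v <= cutoffs[i], or cutoffs.length if there is none. */
    static int categoryIndex(double v, double[] cutoffs) {
        for (int i = 0; i < cutoffs.length; i++) {
            if (v <= cutoffs[i]) {
                return i; // cutoffs[i - 1] < v <= cutoffs[i]
            }
        }
        return cutoffs.length; // above the last cutoff: last category
    }

    /** Maps each value to a category name; names may be null or contain nulls. */
    static String[] discretize(double[] data, double[] cutoffs, String[] names) {
        String[] out = new String[data.length];
        for (int r = 0; r < data.length; r++) {
            int c = categoryIndex(data[r], cutoffs);
            boolean named = names != null && names[c] != null;
            // Assumption: the index itself serves as the default category name.
            out[r] = named ? names[c] : String.valueOf(c);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {0.5, 1.0, 1.7, 3.2};
        double[] cutoffs = {1.0, 2.0}; // two cutoffs, three categories
        String[] names = {"low", null, "high"};
        // prints [low, low, 1, high]
        System.out.println(Arrays.toString(discretize(data, cutoffs, names)));
    }
}
```

Note that a value equal to a cutoff (1.0 here) lands in the lower category, because the upper bound of each interval is inclusive.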
 
- 
equalCounts
Sets the given node to be discretized into the given number of categories, using evenly distributed values.
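One way to read "evenly distributed values" is equal-frequency binning, where each category receives roughly the same number of data points. The sketch below shows that idea in standalone Java. It is not Tetrad's algorithm: the class `EqualFrequencySketch` and the method `equalFrequencyCutoffs` are hypothetical, and the index arithmetic and the treatment of ties are choices made for this example.

```java
import java.util.Arrays;

/** Standalone illustration of equal-frequency cutoffs; not part of Tetrad. */
final class EqualFrequencySketch {

    /** Returns numCategories - 1 cutoffs that split the sorted data into equal-sized runs. */
    static double[] equalFrequencyCutoffs(double[] data, int numCategories) {
        if (numCategories < 1 || data.length < numCategories) {
            throw new IllegalArgumentException("need at least one value per category");
        }
        double[] sorted = data.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        double[] cutoffs = new double[numCategories - 1];
        for (int k = 1; k < numCategories; k++) {
            // Last element of the k-th run; inclusive upper bound of category k - 1.
            int index = (int) ((long) k * sorted.length / numCategories) - 1;
            cutoffs[k - 1] = sorted[index];
        }
        return cutoffs;
    }

    public static void main(String[] args) {
        double[] data = {6, 1, 4, 2, 5, 3};
        // prints [2.0, 4.0]
        System.out.println(Arrays.toString(equalFrequencyCutoffs(data, 3)));
    }
}
```

With the cutoffs 2.0 and 4.0 and an inclusive upper bound, the six values fall into three categories of two values each. Heavily tied data can still produce unequal counts under this scheme.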
- 
equalIntervals
Sets the given node to be discretized into the given number of categories, using evenly spaced intervals.
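Evenly spaced intervals are commonly obtained by dividing the range between the smallest and largest value into bins of equal width. The standalone sketch below illustrates that reading. It is not taken from Tetrad: the class `EqualWidthSketch` and the method `equalWidthCutoffs` are hypothetical, and the data is assumed to contain no NaN values.

```java
import java.util.Arrays;

/** Standalone illustration of equal-width cutoffs; not part of Tetrad. */
final class EqualWidthSketch {

    /** Returns numCategories - 1 cutoffs spaced evenly between the minimum and maximum. */
    static double[] equalWidthCutoffs(double[] data, int numCategories) {
        if (data.length == 0 || numCategories < 1) {
            throw new IllegalArgumentException("need data and at least one category");
        }
        double min = data[0];
        double max = data[0];
        for (double v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numCategories;
        double[] cutoffs = new double[numCategories - 1];
        for (int k = 1; k < numCategories; k++) {
            cutoffs[k - 1] = min + k * width; // upper bound of category k - 1
        }
        return cutoffs;
    }

    public static void main(String[] args) {
        double[] data = {0.0, 3.0, 10.0};
        // prints [2.5, 5.0, 7.5]
        System.out.println(Arrays.toString(equalWidthCutoffs(data, 4)));
    }
}
```

Unlike equal-frequency cutoffs, these depend only on the minimum and maximum, so skewed data can leave some categories nearly empty.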
- 
setVariablesCopied
public void setVariablesCopied(boolean unselectedVariabledCopied)
- 
discretize
- Returns:
- Discretized dataset.