Package edu.cmu.tetrad.data
Class Discretizer
java.lang.Object
edu.cmu.tetrad.data.Discretizer
Discretizes individual columns of discrete or continuous data. Continuous data is discretized by specifying a list of
n - 1 cutoffs for n values in the discretized data, with optional string labels for these values. Discrete data is
discretized by specifying a mapping from old value names to new value names, the idea being that old values may be
merged.
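As an illustration of the discrete case described above, the following minimal sketch relabels a column of value names through an old-to-new map, so that two old values collapse into one. It is a standalone example, not Tetrad code: the class CategoryMergeSketch and its remap method are hypothetical, and leaving unmapped values unchanged is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tetrad: relabels discrete values
// through a mapping from old value names to new value names.
final class CategoryMergeSketch {

    static List<String> remap(List<String> values, Map<String, String> oldToNew) {
        List<String> out = new ArrayList<>(values.size());
        for (String v : values) {
            // Assumption of this sketch: a value with no entry keeps its old name.
            out.add(oldToNew.getOrDefault(v, v));
        }
        return out;
    }
}
```

Mapping both "low" and "medium" to "low-or-medium" turns the column low, medium, high, medium into low-or-medium, low-or-medium, high, low-or-medium.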
- Version:
- $Id: $Id
- Author:
- josephramsey, Tyler Gibson
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class
    A discretization specification for a continuous variable.
-
Constructor Summary
Constructors
Constructor    Description
Discretizer(DataSet dataSet)
    Constructs a new discretizer that discretizes every variable as binary, using evenly distributed values.
Discretizer(DataSet dataSet, Map<Node, DiscretizationSpec> specs)
    Constructor for Discretizer.
-
Method Summary
Modifier and Type    Method    Description
discretize
    discretize.
static Discretizer.Discretization discretize(double[] _data, double[] cutoffs, String variableName, List<String> categories)
    Discretizes the continuous data in the given column using the specified cutoffs and category names.
void equalCounts(Node node, int numCategories)
    Sets the given node to be discretized into the given number of categories using evenly distributed values.
void equalIntervals(Node node, int numCategories)
    Sets the given node to be discretized into the given number of categories using evenly spaced intervals.
static double[] getEqualFrequencyBreakPoints(double[] _data, int numberOfCategories)
    getEqualFrequencyBreakPoints.
void setVariablesCopied(boolean unselectedVariabledCopied)
    Setter for the field variablesCopied.
-
Constructor Details
-
Discretizer
-
Discretizer
-
-
Method Details
-
getEqualFrequencyBreakPoints
public static double[] getEqualFrequencyBreakPoints(double[] _data, int numberOfCategories)
getEqualFrequencyBreakPoints.
- Parameters:
_data
- an array of double values
numberOfCategories
- an int, the number of categories
- Returns:
- an array of double breakpoints
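This page does not say how the breakpoints are chosen. As an illustration only, the sketch below shows one common way to obtain equal-frequency breakpoints: sort a copy of the data and read off the values at evenly spaced ranks. The class EqualFrequencySketch and its breakPoints method are hypothetical, and Tetrad's own implementation may pick different ranks or treat ties differently.

```java
import java.util.Arrays;

// Illustration only; not Tetrad's implementation.
final class EqualFrequencySketch {

    // Returns numberOfCategories - 1 cutoffs so that, under a "v <= cutoff"
    // rule, each category receives roughly the same number of values.
    static double[] breakPoints(double[] data, int numberOfCategories) {
        if (numberOfCategories < 2 || data.length < numberOfCategories) {
            throw new IllegalArgumentException(
                    "need at least 2 categories and at least one value per category");
        }
        // Sort a copy so the caller's array is left untouched.
        double[] sorted = Arrays.copyOf(data, data.length);
        Arrays.sort(sorted);

        double[] cutoffs = new double[numberOfCategories - 1];
        for (int i = 1; i < numberOfCategories; i++) {
            // Last element that belongs to category i - 1.
            int rank = (int) ((long) i * sorted.length / numberOfCategories) - 1;
            cutoffs[i - 1] = sorted[rank];
        }
        return cutoffs;
    }
}
```

For the values 1 through 8 and four categories, this sketch yields the breakpoints 2.0, 4.0 and 6.0, which places two values in each category.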
-
discretize
public static Discretizer.Discretization discretize(double[] _data, double[] cutoffs, String variableName, List<String> categories)
Discretizes the continuous data in the given column using the specified cutoffs and category names. The following scheme is used: if cutoffs[i - 1] < v <= cutoffs[i] (where cutoffs[-1] is negative infinity and the bound above the last cutoff is positive infinity), then v is mapped to category i. If category names are supplied, the discrete column returned will use these category names.
- Parameters:
_data
- an array of double values to discretize
cutoffs
- The cutoffs used to discretize the data. Should have length c - 1, where c is the number of categories in the discretized data.
variableName
- the name of the returned variable.
categories
- An optional list of category names; may be null. If this is supplied, the discrete column returned will use these category names. If this is non-null, it must have length c, where c is the number of categories for the discretized data. If any category names are null, default category names will be used for those.
- Returns:
- The discretized column.
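The cutoff scheme above can be restated as a short standalone sketch. It is not Tetrad's code: the class CutoffRuleSketch and its methods are hypothetical, the cutoffs are assumed to be sorted in ascending order, and the fallback label (the category index as text) is an assumption, since this page does not say what the default category names look like.

```java
import java.util.List;

// Illustration only: a standalone restatement of the cutoff rule.
final class CutoffRuleSketch {

    // Returns the index i such that cutoffs[i - 1] < v <= cutoffs[i], treating
    // the bound below the first cutoff as negative infinity. Values above the
    // last cutoff get index cutoffs.length. Assumes cutoffs is sorted ascending.
    // NaN is not handled specially here and falls into the last category.
    static int categoryOf(double v, double[] cutoffs) {
        for (int i = 0; i < cutoffs.length; i++) {
            if (v <= cutoffs[i]) {
                return i;
            }
        }
        return cutoffs.length;
    }

    // Applies categoryOf to every value in the column.
    static int[] discretize(double[] data, double[] cutoffs) {
        int[] out = new int[data.length];
        for (int j = 0; j < data.length; j++) {
            out[j] = categoryOf(data[j], cutoffs);
        }
        return out;
    }

    // Uses the supplied name when present; otherwise falls back to the index
    // as text. The fallback format is an assumption of this sketch.
    static String label(int category, List<String> categories) {
        if (categories != null && categories.get(category) != null) {
            return categories.get(category);
        }
        return Integer.toString(category);
    }
}
```

With cutoffs 0.0 and 10.0 there are three categories: the values -5.0 and 0.0 map to category 0, the values 0.1 and 10.0 map to category 1, and 10.5 maps to category 2. A value equal to a cutoff always falls into the lower category.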
-
equalCounts
-
equalIntervals
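No detail is given here for equalIntervals beyond the summary description ("evenly spaced intervals"). As a rough illustration of what evenly spaced cutoffs usually mean, the sketch below divides the observed range of a column into intervals of equal width. The class EqualIntervalSketch and its breakPoints method are hypothetical, and taking the range from the observed minimum and maximum is an assumption; Tetrad's implementation may differ.

```java
// Illustration only; not Tetrad's implementation.
final class EqualIntervalSketch {

    // Returns numCategories - 1 cutoffs that split [min, max] of the data
    // into numCategories intervals of equal width.
    static double[] breakPoints(double[] data, int numCategories) {
        if (data.length == 0 || numCategories < 2) {
            throw new IllegalArgumentException(
                    "need at least one value and at least 2 categories");
        }
        double min = data[0];
        double max = data[0];
        for (double v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numCategories;

        double[] cutoffs = new double[numCategories - 1];
        for (int i = 0; i < cutoffs.length; i++) {
            cutoffs[i] = min + (i + 1) * width;
        }
        return cutoffs;
    }
}
```

For a column whose values range from 0.0 to 10.0 and four categories, the sketch returns the cutoffs 2.5, 5.0 and 7.5. Unlike equal-frequency cutoffs, these depend only on the range of the data, so the resulting categories can hold very different numbers of values.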
-
setVariablesCopied
public void setVariablesCopied(boolean unselectedVariabledCopied)
Setter for the field variablesCopied.
- Parameters:
unselectedVariabledCopied
- a boolean
-
discretize
-