Package edu.cmu.tetrad.data
Class Discretizer
java.lang.Object
edu.cmu.tetrad.data.Discretizer
Discretizes individual columns of discrete or continuous data. Continuous data is discretized by specifying a list of
 n - 1 cutoffs for n values in the discretized data, with optional string labels for these values. Discrete data is
 discretized by specifying a mapping from old value names to new value names, the idea being that old values may be
 merged.
- Author:
- josephramsey, Tyler Gibson
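The value merging described above for discrete data can be pictured with a small standalone sketch. It does not use the Tetrad API: the class `RemapSketch` and its `remap` method are hypothetical names, and leaving unmapped values unchanged is an assumption of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class RemapSketch {
    // Replaces each old value name with its new name from the mapping.
    // Several old names may share one new name, which merges those values.
    // Assumption: a value with no entry in the mapping keeps its old name.
    static List<String> remap(List<String> values, Map<String, String> mapping) {
        List<String> out = new ArrayList<>(values.size());
        for (String value : values) {
            out.add(mapping.getOrDefault(value, value));
        }
        return out;
    }
}
```

For instance, mapping both "low" and "mid" to "small" turns the column low, mid, high into small, small, high, reducing three values to two.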
- 
Nested Class Summary
Nested Classes
- static class: A discretization specification for a continuous variable.
- 
Constructor Summary
Constructors
- Discretizer(DataSet dataSet): Constructs a new discretizer that discretizes every variable as binary, using evenly distributed values.
- Discretizer(DataSet dataSet, Map<Node, DiscretizationSpec> specs): Constructor for Discretizer.
- 
Method Summary
- discretize: Returns the discretized dataset.
- static Discretizer.Discretization discretize(double[] _data, double[] cutoffs, String variableName, List<String> categories): Discretizes the continuous data in the given column using the specified cutoffs and category names.
- void equalCounts(Node node, int numCategories): Sets the given node to be discretized into the given number of categories using evenly distributed values.
- void equalIntervals(Node node, int numCategories): Sets the given node to be discretized into the given number of categories using evenly spaced intervals.
- static double[] getEqualFrequencyBreakPoints(double[] _data, int numberOfCategories): getEqualFrequencyBreakPoints.
- void setVariablesCopied(boolean unselectedVariabledCopied): Setter for the field variablesCopied.
- 
Constructor Details
- 
Method Details
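getEqualFrequencyBreakPoints
public static double[] getEqualFrequencyBreakPoints(double[] _data, int numberOfCategories)
Computes break points that split the data into the given number of categories of approximately equal frequency.
- Parameters:
- _data- the continuous data values, as a double array
- numberOfCategories- the number of categories, an int
- Returns:
- the break points, as a double array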
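As a rough illustration of equal-frequency break points, the sketch below sorts the data and places one break point at each bin boundary. It is not the Tetrad implementation: the class `EqualFrequencySketch`, the choice of k - 1 break points, and the midpoint placement are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class EqualFrequencySketch {
    // Returns k - 1 break points so that each of the k bins holds roughly
    // n / k of the sorted values. Assumption: each break point is the midpoint
    // between the last value of one bin and the first value of the next.
    static double[] breakPoints(double[] data, int k) {
        if (k < 1 || data.length < k) {
            throw new IllegalArgumentException("need at least one value per category");
        }
        double[] sorted = data.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        double[] breaks = new double[k - 1];
        for (int i = 1; i < k; i++) {
            // Index of the first sorted value that belongs to bin i.
            int first = (int) ((long) i * sorted.length / k);
            breaks[i - 1] = (sorted[first - 1] + sorted[first]) / 2.0;
        }
        return breaks;
    }
}
```

With the values 6, 1, 4, 2, 5, 3 and three categories, this sketch yields the break points 2.5 and 4.5, so each bin receives two values. Heavily tied data can still produce unequal bins under this scheme.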
 
- 
discretize
public static Discretizer.Discretization discretize(double[] _data, double[] cutoffs, String variableName, List<String> categories)
Discretizes the continuous data in the given column using the specified cutoffs and category names. The following scheme is used: if cutoffs[i - 1] < v <= cutoffs[i], then v is mapped to category i, where cutoffs[-1] is taken as negative infinity and, by the same convention, cutoffs[c - 1] as positive infinity, so values above the last cutoff fall in the last category. If category names are supplied, the discrete column returned will use these category names.
- Parameters:
- _data- the continuous data to discretize, as a double array
- cutoffs- the cutoffs used to discretize the data. Should have length c - 1, where c is the number of categories in the discretized data.
- variableName- the name of the returned variable.
- categories- an optional list of category names; may be null. If non-null, it must have length c, where c is the number of categories for the discretized data. If any category names are null, default category names will be used for those.
- Returns:
- the discretized column.
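The cutoff scheme described above can be sketched in plain Java. This follows the documented rule only and has not been checked against the Tetrad source; the class `CutoffSketch` and the method `toCategories` are hypothetical names, and the sketch assumes the cutoffs are sorted in ascending order and ignores missing values and category labels.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class CutoffSketch {
    // Maps each value v to the smallest index i with v <= cutoffs[i].
    // Values above every cutoff go to the last category, cutoffs.length.
    // Assumption: cutoffs are sorted ascending; NaN is not treated specially.
    static int[] toCategories(double[] data, double[] cutoffs) {
        int[] out = new int[data.length];
        for (int k = 0; k < data.length; k++) {
            int category = cutoffs.length; // default: above the last cutoff
            for (int i = 0; i < cutoffs.length; i++) {
                if (data[k] <= cutoffs[i]) {
                    category = i;
                    break;
                }
            }
            out[k] = category;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {0.5, 1.0, 1.5, 2.0, 9.0};
        double[] cutoffs = {1.0, 2.0}; // 2 cutoffs give 3 categories
        // prints [0, 0, 1, 1, 2]
        System.out.println(Arrays.toString(toCategories(data, cutoffs)));
    }
}
```

Note that a value equal to a cutoff lands in the lower category, which matches the `<=` in the documented rule.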
 
- 
equalCounts
public void equalCounts(Node node, int numCategories)
Sets the given node to be discretized into the given number of categories using evenly distributed values.
- Parameters:
- node- a Node object
- numCategories- the number of categories, an int
 
- 
equalIntervals
public void equalIntervals(Node node, int numCategories)
Sets the given node to be discretized into the given number of categories using evenly spaced intervals.
- Parameters:
- node- a Node object
- numCategories- the number of categories, an int
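One plausible reading of "evenly spaced intervals" is equal-width binning, where the range from the minimum to the maximum is cut into pieces of the same width. The sketch below shows that reading only. The class `EqualWidthSketch` and its `cutoffs` method are hypothetical, and whether Tetrad places its cutoffs exactly this way is an assumption.

```java
// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class EqualWidthSketch {
    // Returns k - 1 cutoffs that divide [min, max] into k intervals of equal width.
    // Assumption: the range is taken from the observed minimum and maximum.
    static double[] cutoffs(double[] data, int k) {
        if (k < 1 || data.length == 0) {
            throw new IllegalArgumentException("need data and at least one category");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / k;
        double[] result = new double[k - 1];
        for (int i = 1; i < k; i++) {
            result[i - 1] = min + i * width; // upper edge of interval i - 1
        }
        return result;
    }
}
```

For the values 0, 10, 4 and four categories, the width is 2.5 and the cutoffs are 2.5, 5.0, 7.5. Unlike equal counts, the resulting bins may hold very different numbers of values when the data is skewed.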
 
- 
setVariablesCopied
public void setVariablesCopied(boolean unselectedVariabledCopied)
Setter for the field variablesCopied.
- Parameters:
- unselectedVariabledCopied- a boolean
 
- 
discretize
Discretizes the dataset.
- Returns:
- the discretized dataset.
 
 
-