Class NumberObjectDataSet
- All Implemented Interfaces:
- DataModel, DataSet, KnowledgeTransferable, VariableSource, TetradSerializable, Serializable
ContinuousVariable or a DiscreteVariable.
 This class violates object orientation in that the underlying data matrix is
 retrievable using the getDoubleData() method. This is allowed so that
 external calculations may be performed on large datasets without having to
 allocate extra memory. If this matrix needs to be modified externally, please
 consider making a copy of it first, using the TetradMatrix copy() method.
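The copy-before-modify advice can be illustrated with a plain double[][] standing in for the shared matrix. This is only a sketch: SharedMatrixSketch, rawData() and deepCopy() are hypothetical names, not part of Tetrad, and the real class works with a TetradMatrix rather than a Java array.

```java
import java.util.Arrays;

// Hypothetical stand-in for a data set that exposes its backing matrix.
class SharedMatrixSketch {
    private final double[][] data;

    SharedMatrixSketch(double[][] data) {
        this.data = data;
    }

    // Returns the backing array itself (no copy), so no extra memory is allocated.
    double[][] rawData() {
        return data;
    }

    // Deep copy, so that the caller can modify the result without touching the data set.
    static double[][] deepCopy(double[][] source) {
        double[][] copy = new double[source.length][];
        for (int i = 0; i < source.length; i++) {
            copy[i] = Arrays.copyOf(source[i], source[i].length);
        }
        return copy;
    }

    public static void main(String[] args) {
        SharedMatrixSketch set = new SharedMatrixSketch(new double[][]{{1.0, 2.0}, {3.0, 4.0}});
        double[][] scratch = deepCopy(set.rawData());
        scratch[0][0] = 99.0; // modifies only the copy
        System.out.println(set.rawData()[0][0]);
    }
}
```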
 The data set may be given a name; this name is not used internally.
The data set has a list of variables associated with it, as described above. This list is coordinated with the stored data, in that data for the i'th variable will be in the i'th column.
 A subset of variables in the data set may be designated as selected. This
 selection set is stored with the data set and may be manipulated using the
 select and deselect methods.
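A minimal sketch of how a selection set might be kept alongside a column-ordered variable list. The class SelectionSketch and its use of plain strings for variables are assumptions made for illustration; the real class works with Node objects.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.IntStream;

// Hypothetical model: variable i of the list corresponds to column i of the data.
class SelectionSketch {
    private final List<String> variables = new ArrayList<>();
    private final Set<String> selected = new HashSet<>();

    SelectionSketch(List<String> names) {
        variables.addAll(names);
    }

    // Marks the named variable as selected or deselected.
    void setSelected(String name, boolean isSelected) {
        if (!variables.contains(name)) {
            throw new IllegalArgumentException("Unknown variable: " + name);
        }
        if (isSelected) {
            selected.add(name);
        } else {
            selected.remove(name);
        }
    }

    // Marks all variables as deselected.
    void clearSelection() {
        selected.clear();
    }

    // Column indices of the selected variables, in column order.
    int[] selectedIndices() {
        return IntStream.range(0, variables.size())
                .filter(i -> selected.contains(variables.get(i)))
                .toArray();
    }
}
```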
 
 Knowledge may be associated with the data set, using the
 setKnowledge method. This knowledge is not used internally by
 the data set, but it may be retrieved and used by algorithms.
 
This data set replaces an earlier Minitab-style DataSet class. The reasons for replacement are as follows.
- COLT matrices are optimized for double 2D matrix calculations in ways that Java-style double[][] matrices are not.
- The COLT library comes with a wide range of linear algebra methods that are better tested and more flexible than the linear algebra methods used previously in Tetrad.
- Views of COLT matrices can often be used in places where copies of data sets were being created.
- The only place where data sets were being manipulated for honest reasons was in the interface; everywhere else, it turns out to have been sensible to calculate a list of variables and a sample size in advance and allocate memory for a data set with these dimensions. For very large data sets, it makes more sense to disallow memory-hogging manipulations than to throw out-of-memory errors.
- Author:
- Joseph Ramsey
Constructor Summary
- 
Method Summary
- void addVariable(int index, Node variable): Adds the given variable to the dataset, increasing the number of columns by one, moving columns i >= index to column i + 1, and inserting a column of missing values at column i.
- void addVariable(Node variable): Adds the given variable to the data set, increasing the number of columns by one, moving columns i >= index to column i + 1, and inserting a column of missing values at column i.
- void changeVariable(Node from, Node to): Changes the variable for the given column from 'from' to 'to'.
- void clearSelection(): Marks all variables as deselected.
- copy()
- void ensureColumns(int columns, List<String> excludedVariableNames): Ensures that the dataset has at least the given number of columns, adding continuous variables with unique names until that is true.
- void ensureRows(int rows): Ensures that the dataset has at least the given number of rows, adding rows if necessary to make that the case.
- boolean equals(Object obj)
- boolean existsMissingValue(): Returns true if and only if this data set contains at least one missing value.
- int getColumn
- getCorrelationMatrix: If this is a continuous data set, returns the correlation matrix.
- getCovarianceMatrix: If this is a continuous data set, returns the covariance matrix.
- double getDouble(int row, int column)
- int getInt(int row, int column)
- getName(): Gets the name of the data set.
- getNumberFormat: The number format of the dataset.
- int getNumColumns()
- int getNumRows()
- getObject(int row, int col)
- int[] getSelectedIndices()
- getVariable(int col)
- getVariable(String varName)
- boolean isContinuous()
- boolean isDiscrete()
- boolean isMixed()
- boolean isSelected(Node variable)
- like()
- void permuteRows(): Randomly permutes the rows of the dataset.
- void removeCols(int[] cols): Removes the given columns from the data set.
- void removeColumn(int index): Removes the column for the variable at the given index, reducing the number of columns by one.
- void removeColumn(Node variable): Removes the column for the given variable from the dataset, reducing the number of columns by one.
- void removeRows(int[] selectedRows): Removes the given rows from the data set.
- static NumberObjectDataSet serializableInstance: Generates a simple exemplar of this class to test serialization.
- void setDouble(int row, int column, double value): Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
- void setInt(int row, int column, int value): Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
- void setKnowledge(Knowledge knowledge): Sets knowledge to be associated with this data set.
- void setName: Sets the name of the data set.
- void setNumberFormat: The number formatter used to print out continuous values.
- void setObject: Sets the value at the given (row, column) to the given value.
- void setOutputDelimiter(Character character): Sets the character ('\t', ' ', ',', for instance) that is used to delimit tokens when the data set is printed out using the toString() method.
- void setSelected(Node variable, boolean selected): Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
- subsetColumns(int[] indices)
- subsetColumns(List<Node> vars): Creates and returns a dataset consisting of those variables in the list vars.
- subsetRows(int[] rows)
- subsetRowsColumns(int[] rows, int[] columns)
- toString(): Renders the data model as a String.
- 
Constructor Details
NumberObjectDataSet
 
- 
- 
Method Details
getColumnToTooltip
- Specified by:
- getColumnToTooltip in interface DataSet
 
- 
serializableInstance
Generates a simple exemplar of this class to test serialization.
- 
getName
Gets the name of the data set.
- 
setName
Sets the name of the data set.
- 
getNumColumns
public int getNumColumns()
- Specified by:
- getNumColumns in interface DataSet
- Returns:
- the number of variables in the data set.
 
- 
getNumRows
public int getNumRows()
- Specified by:
- getNumRows in interface DataSet
- Returns:
- the number of rows in the rectangular data set, which is the maximum of the number of rows in the list of wrapped columns.
 
- 
setInt
public void setInt(int row, int column, int value)
Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
- 
setDouble
public void setDouble(int row, int column, double value)
Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
- 
getObject
- Specified by:
- getObject in interface DataSet
- Parameters:
- row - The index of the case.
- col - The index of the variable.
- Returns:
- the value at the given row and column as an Object. The type returned is deliberately vague, allowing for variables of any type. Primitives will be returned as corresponding wrapping objects (for example, doubles as Doubles).
 
- 
setObject
Description copied from interface: DataSet
Sets the value at the given (row, column) to the given value.
- 
getSelectedIndices
public int[] getSelectedIndices()
- Specified by:
- getSelectedIndices in interface DataSet
- Returns:
- the indices of the currently selected variables.
 
- 
getSelectedVariables
- Returns:
- the set of currently selected variables.
 
- 
addVariable
Adds the given variable to the data set, increasing the number of columns by one, moving columns i >= index to column i + 1, and inserting a column of missing values at column i.
- Specified by:
- addVariable in interface DataSet
- Throws:
- IllegalArgumentException - if the variable already exists in the dataset.
 
- 
addVariable
Adds the given variable to the dataset, increasing the number of columns by one, moving columns i >= index to column i + 1, and inserting a column of missing values at column i.
- Specified by:
- addVariable in interface DataSet
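The column shift described for addVariable can be sketched on a plain double[][]. This is an illustration under assumptions, not the Tetrad implementation: InsertColumnSketch is a hypothetical class, and Double.NaN is used here as the stand-in for a missing value.

```java
import java.util.Arrays;

class InsertColumnSketch {
    // Returns a new matrix in which columns i >= index have moved to i + 1
    // and the column at index holds NaN (assumed marker for "missing").
    static double[][] insertMissingColumn(double[][] data, int index) {
        double[][] result = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            int cols = data[r].length;
            if (index < 0 || index > cols) {
                throw new IllegalArgumentException("Index out of range: " + index);
            }
            double[] row = new double[cols + 1];
            System.arraycopy(data[r], 0, row, 0, index);                    // columns before index stay put
            row[index] = Double.NaN;                                        // the new, all-missing column
            System.arraycopy(data[r], index, row, index + 1, cols - index); // the rest shift right by one
            result[r] = row;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] widened = insertMissingColumn(new double[][]{{1.0, 2.0}, {3.0, 4.0}}, 1);
        System.out.println(Arrays.deepToString(widened));
    }
}
```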
 
- 
getVariable
- Specified by:
- getVariable in interface DataSet
- Returns:
- the variable at the given column.
 
- 
getColumn
- 
changeVariable
Changes the variable for the given column from 'from' to 'to'. Supported currently only for discrete variables.
- Specified by:
- changeVariable in interface DataSet
- Throws:
- IllegalArgumentException - if the given change is not supported.
 
- 
getVariable
- Specified by:
- getVariable in interface DataModel
- Specified by:
- getVariable in interface DataSet
- Returns:
- the variable with the given name.
 
- 
getVariables
- Specified by:
- getVariables in interface DataSet
- Specified by:
- getVariables in interface VariableSource
- Returns:
- (a copy of) the List of Variables for the data set, in the order of their columns.
 
- 
getKnowledge
- Specified by:
- getKnowledge in interface KnowledgeTransferable
- Returns:
- a copy of the knowledge associated with this data set. (Cannot be null.)
 
- 
setKnowledge
Sets knowledge to be associated with this data set. May not be null.
- Specified by:
- setKnowledge in interface KnowledgeTransferable
 
- 
getVariableNames
- Specified by:
- getVariableNames in interface DataSet
- Specified by:
- getVariableNames in interface VariableSource
- Returns:
- (a copy of) the List of variable names for the data set, in the order of their columns.
 
- 
setSelected
Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
- Specified by:
- setSelected in interface DataSet
 
- 
clearSelection
public void clearSelection()
Marks all variables as deselected.
- Specified by:
- clearSelection in interface DataSet
 
- 
ensureRows
public void ensureRows(int rows)
Ensures that the dataset has at least the given number of rows, adding rows if necessary to make that the case. The new rows will be filled with missing values.
- Specified by:
- ensureRows in interface DataSet
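A rough sketch of the "grow to at least this many rows" behavior on a plain array. EnsureRowsSketch is a hypothetical helper, Double.NaN is an assumed missing-value marker, and the column count is passed explicitly only because a bare double[][] does not carry it.

```java
import java.util.Arrays;

class EnsureRowsSketch {
    // Returns data unchanged if it already has enough rows; otherwise returns a
    // taller array whose added rows are filled with NaN (assumed "missing").
    static double[][] ensureRows(double[][] data, int rows, int columns) {
        if (data.length >= rows) {
            return data;
        }
        double[][] grown = Arrays.copyOf(data, rows); // existing rows are kept, new slots are null
        for (int r = data.length; r < rows; r++) {
            grown[r] = new double[columns];
            Arrays.fill(grown[r], Double.NaN);
        }
        return grown;
    }

    public static void main(String[] args) {
        double[][] grown = ensureRows(new double[][]{{1.0, 2.0}}, 3, 2);
        System.out.println(Arrays.deepToString(grown));
    }
}
```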
 
- 
ensureColumns
Ensures that the dataset has at least the given number of columns, adding continuous variables with unique names until that is true. The new columns will be filled with missing values.
- Specified by:
- ensureColumns in interface DataSet
 
- 
existsMissingValue
public boolean existsMissingValue()
Description copied from interface: DataSet
Returns true if and only if this data set contains at least one missing value.
- Specified by:
- existsMissingValue in interface DataSet
 
- 
isSelected
- Specified by:
- isSelected in interface DataSet
- Returns:
- true iff the given column has been marked as selected.
 
- 
removeColumn
public void removeColumn(int index)
Removes the column for the variable at the given index, reducing the number of columns by one.
- Specified by:
- removeColumn in interface DataSet
 
- 
removeColumn
Removes the column for the given variable from the dataset, reducing the number of columns by one.
- Specified by:
- removeColumn in interface DataSet
 
- 
subsetColumns
Creates and returns a dataset consisting of those variables in the list vars. Vars must be a subset of the variables of this DataSet. The ordering of the elements of vars will be the same as in the list of variables in this DataSet.
- Specified by:
- subsetColumns in interface DataSet
 
- 
isContinuous
public boolean isContinuous()
- Specified by:
- isContinuous in interface DataModel
- Specified by:
- isContinuous in interface DataSet
- Returns:
- true iff this is a continuous data set--that is, if every column in it is continuous. (By implication, empty datasets are both discrete and continuous.)
 
- 
isDiscrete
public boolean isDiscrete()
- Specified by:
- isDiscrete in interface DataModel
- Specified by:
- isDiscrete in interface DataSet
- Returns:
- true iff this is a discrete data set--that is, if every column in it is discrete. (By implication, empty datasets are both discrete and continuous.)
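The remark that an empty dataset is both discrete and continuous follows from an "every column is ..." check being vacuously true when there are no columns. A minimal sketch, with a hypothetical Kind enum standing in for the variable types:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ColumnTypeSketch {
    // Hypothetical stand-in for the two supported variable types.
    enum Kind { CONTINUOUS, DISCRETE }

    // True iff every column is continuous; allMatch returns true on an empty list.
    static boolean isContinuous(List<Kind> columns) {
        return columns.stream().allMatch(k -> k == Kind.CONTINUOUS);
    }

    // True iff every column is discrete; allMatch returns true on an empty list.
    static boolean isDiscrete(List<Kind> columns) {
        return columns.stream().allMatch(k -> k == Kind.DISCRETE);
    }

    public static void main(String[] args) {
        List<Kind> empty = Collections.emptyList();
        List<Kind> both = Arrays.asList(Kind.CONTINUOUS, Kind.DISCRETE);
        System.out.println(isContinuous(empty) + " " + isDiscrete(empty));
        System.out.println(isContinuous(both) + " " + isDiscrete(both));
    }
}
```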
 
- 
isMixed
public boolean isMixed()
- 
getCorrelationMatrix
Description copied from interface: DataSet
If this is a continuous data set, returns the correlation matrix.
- Specified by:
- getCorrelationMatrix in interface DataSet
- Returns:
- the correlation matrix for this dataset. Defers to Statistic.covariance() in the COLT matrix library, so it inherits the handling of missing values from that library--that is, any off-diagonal correlation involving a column with a missing value is Double.NaN, although all of the on-diagonal elements are 1.0. If that's not the desired behavior, missing values can be removed or imputed first.
 
- 
getCovarianceMatrix
Description copied from interface: DataSet
If this is a continuous data set, returns the covariance matrix.
- Specified by:
- getCovarianceMatrix in interface DataSet
- Returns:
- the covariance matrix for this dataset. Defers to Statistic.covariance() in the COLT matrix library, so it inherits the handling of missing values from that library--that is, any covariance involving a column with a missing value is Double.NaN. If that's not the desired behavior, missing values can be removed or imputed first.
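Why a single missing value turns a covariance into NaN, and what "removing missing values first" could look like, can be sketched in plain Java. This does not call COLT; CovarianceSketch is a hypothetical class, missing values are assumed to be Double.NaN, and the n - 1 divisor is this sketch's choice and may differ from the library's convention.

```java
import java.util.ArrayList;
import java.util.List;

class CovarianceSketch {
    // Sample covariance of two equally long columns; any NaN propagates to the result.
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Columns must have the same length.");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }

    // Drops every row in which either value is NaN, then computes the covariance.
    static double covarianceCompleteCases(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Columns must have the same length.");
        }
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < x.length; i++) {
            if (!Double.isNaN(x[i]) && !Double.isNaN(y[i])) {
                kept.add(new double[]{x[i], y[i]});
            }
        }
        double[] cx = new double[kept.size()];
        double[] cy = new double[kept.size()];
        for (int i = 0; i < kept.size(); i++) {
            cx[i] = kept.get(i)[0];
            cy[i] = kept.get(i)[1];
        }
        return covariance(cx, cy);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, Double.NaN};
        double[] y = {2.0, 4.0, 6.0, 8.0};
        System.out.println(covariance(x, y));
        System.out.println(covarianceCompleteCases(x, y));
    }
}
```

Dropping incomplete rows is only one option; imputation, as the description mentions, is another, and which is appropriate depends on the analysis.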
 
- 
getInt
public int getInt(int row, int column)
- 
getDouble
public double getDouble(int row, int column)
- 
toString
Description copied from interface: DataModel
Renders the data model as a String.
- Specified by:
- toString in interface DataModel
- Specified by:
- toString in interface DataSet
- Overrides:
- toString in class Object
- Returns:
- a string, suitable for printing, of the dataset. Lines are separated by '\n', tokens in the line by whatever character is set in the setOutputDelimiter() method. The list of variables is printed first, followed by one line for each case. This method should probably not be used for saving to files. If that's your goal, use the DataSavers class instead.
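The layout described for toString (a header line of variable names, then one line per case, tokens separated by the chosen delimiter) can be sketched as follows. RenderSketch is a hypothetical helper; the real method also applies the data set's number format, which this sketch omits, and whether the real output ends with a trailing newline is not something this sketch tries to match.

```java
class RenderSketch {
    // Renders a header line followed by one line per row, with '\n' between lines.
    static String render(String[] names, double[][] data, char delimiter) {
        StringBuilder out = new StringBuilder();
        out.append(String.join(String.valueOf(delimiter), names)).append('\n');
        for (double[] row : data) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    out.append(delimiter);
                }
                out.append(row[c]);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = render(new String[]{"X", "Y"}, new double[][]{{1.5, 2.0}, {3.0, 4.5}}, '\t');
        System.out.print(text);
    }
}
```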
- 
getDoubleData
- Specified by:
- getDoubleData in interface DataSet
- Returns:
- a copy of the underlying COLT TetradMatrix matrix, containing all of the data in this dataset, discrete data included. Discrete data will be represented by ints cast to doubles. Rows in this matrix are cases, and columns are variables. The list of variables, in the order in which they occur in the matrix, is given by getVariables().
- Throws:
- IllegalStateException - if this is not a continuous data set.
- 
subsetColumns
- Specified by:
- subsetColumns in interface DataSet
- Returns:
- a new data set in which the column at indices[i] is placed at index i, for i = 0 to indices.length - 1. (Moved over from Purify.)
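The index mapping in the return description (column indices[i] of the original becomes column i of the result) can be sketched on a plain array. SubsetColumnsSketch is a hypothetical helper operating on double[][], not the Tetrad method itself.

```java
import java.util.Arrays;

class SubsetColumnsSketch {
    // Builds a new matrix whose column i is the original column indices[i].
    static double[][] subsetColumns(double[][] data, int[] indices) {
        double[][] result = new double[data.length][indices.length];
        for (int r = 0; r < data.length; r++) {
            for (int i = 0; i < indices.length; i++) {
                result[r][i] = data[r][indices[i]];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        // Picks the third column, then the first, in that order.
        System.out.println(Arrays.deepToString(subsetColumns(data, new int[]{2, 0})));
    }
}
```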
 
- 
subsetRows
- Specified by:
- subsetRows in interface DataSet
- Returns:
- a new data set in which the row at indices[i] is placed at index i, for i = 0 to indices.length - 1. (View instead?)
 
- 
subsetRowsColumns
- Specified by:
- subsetRowsColumns in interface DataSet
 
- 
removeCols
public void removeCols(int[] cols)
Removes the given columns from the data set.
- Specified by:
- removeCols in interface DataSet
 
- 
removeRows
public void removeRows(int[] selectedRows)
Removes the given rows from the data set.
- Specified by:
- removeRows in interface DataSet
 
- 
equals
- Specified by:
- equals in interface DataSet
- Overrides:
- equals in class Object
- Returns:
- true iff obj is a continuous RectangularDataSet with corresponding variables of the same name and corresponding data values equal, when rendered using the number format at NumberFormatUtil.getInstance().getNumberFormat().
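Comparing values "when rendered using the number format" means two doubles that differ only beyond the formatted precision count as equal. A small sketch of that idea using java.text.DecimalFormat; the pattern "0.0000" and the class FormattedEqualitySketch are assumptions for illustration, not the format Tetrad actually installs.

```java
import java.text.DecimalFormat;
import java.text.NumberFormat;

class FormattedEqualitySketch {
    // Two values are treated as equal when their formatted strings are identical.
    static boolean equalWhenFormatted(double a, double b, NumberFormat format) {
        return format.format(a).equals(format.format(b));
    }

    public static void main(String[] args) {
        NumberFormat fourPlaces = new DecimalFormat("0.0000"); // assumed pattern
        System.out.println(equalWhenFormatted(1.00001, 1.00002, fourPlaces));
        System.out.println(equalWhenFormatted(1.0, 1.001, fourPlaces));
    }
}
```

One consequence of this design is that equality depends on the format in force at comparison time, so changing the number format can change whether two data sets compare equal.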
 
- 
copy
- 
like
- 
setNumberFormat
Description copied from interface: DataSet
The number formatter used to print out continuous values.
- Specified by:
- setNumberFormat in interface DataSet
 
- 
setOutputDelimiter
Sets the character ('\t', ' ', ',', for instance) that is used to delimit tokens when the data set is printed out using the toString() method.
- Specified by:
- setOutputDelimiter in interface DataSet
- 
permuteRows
public void permuteRows()
Randomly permutes the rows of the dataset.
- Specified by:
- permuteRows in interface DataSet
 
- 
getNumberFormat
Description copied from interface: DataSet
The number format of the dataset.
- Specified by:
- getNumberFormat in interface DataSet
- Returns:
- the number format, which by default is the one at NumberFormatUtil.getInstance().getNumberFormat(), but can be set by the user if desired.
 
-