Class BoxDataSet
- All Implemented Interfaces:
DataModel, DataSet, KnowledgeTransferable, VariableSource, TetradSerializable, Serializable
Each variable in the data set must be either a ContinuousVariable or a DiscreteVariable. This
class violates object orientation in that the underlying data matrix is
retrievable using the getDoubleData() method. This is allowed so that
external calculations may be performed on large datasets without having to
allocate extra memory. If this matrix needs to be modified externally, please
consider making a copy of it first, using the TetradMatrix copy() method.
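The copy-before-modify advice can be illustrated with plain arrays. This is a minimal sketch, not Tetrad code: a double[][] stands in for the matrix, and deepCopy is a hypothetical helper playing the role that copy() plays for a TetradMatrix.

```java
import java.util.Arrays;

final class DefensiveCopySketch {
    /** Returns a row-by-row copy so that edits to the result never reach the source. */
    static double[][] deepCopy(double[][] source) {
        double[][] copy = new double[source.length][];
        for (int i = 0; i < source.length; i++) {
            copy[i] = Arrays.copyOf(source[i], source[i].length);
        }
        return copy;
    }

    public static void main(String[] args) {
        double[][] shared = {{1.0, 2.0}, {3.0, 4.0}};   // stand-in for the exposed matrix
        double[][] scratch = deepCopy(shared);
        scratch[0][0] = -1.0;                            // only the copy is modified
        System.out.println(Arrays.deepToString(shared));
    }
}
```

Working on the copy costs extra memory, which is exactly the trade-off the class description mentions, so it is only worth doing when the matrix really has to be changed.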
The data set may be given a name; this name is not used internally.
The data set has a list of variables associated with it, as described above. This list is coordinated with the stored data, in that data for the i'th variable will be in the i'th column.
A subset of variables in the data set may be designated as selected. This
selection set is stored with the data set and may be manipulated using the
select and deselect methods.
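One way to picture the selection set is as a set of variables kept next to the ordered variable list. The sketch below is an assumption-laden illustration, not the class's implementation: variables are plain strings, the class name SelectionSketch is hypothetical, and indices are assumed to be reported in column order.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.IntStream;

final class SelectionSketch {
    private final List<String> variables;                 // column order
    private final Set<String> selected = new HashSet<>(); // stored with the data set

    SelectionSketch(List<String> variables) {
        this.variables = List.copyOf(variables);
    }

    /** Selects the variable if isSelected is true, deselects it otherwise. */
    void setSelected(String variable, boolean isSelected) {
        if (!variables.contains(variable)) {
            throw new IllegalArgumentException("Unknown variable: " + variable);
        }
        if (isSelected) {
            selected.add(variable);
        } else {
            selected.remove(variable);
        }
    }

    boolean isSelected(String variable) {
        return selected.contains(variable);
    }

    void clearSelection() {
        selected.clear();
    }

    /** Column indices of the selected variables, in column order. */
    int[] getSelectedIndices() {
        return IntStream.range(0, variables.size())
                .filter(i -> selected.contains(variables.get(i)))
                .toArray();
    }

    public static void main(String[] args) {
        SelectionSketch s = new SelectionSketch(List.of("X", "Y", "Z"));
        s.setSelected("Z", true);
        s.setSelected("X", true);
        System.out.println(java.util.Arrays.toString(s.getSelectedIndices()));
    }
}
```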
A multiplicity m_i may be associated with each case c_i in the dataset, which is interpreted to mean that c_i occurs m_i times in the dataset.
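The meaning of a multiplicity can be made concrete by expanding the cases. The following is a rough sketch under stated assumptions: cases are rows of a double[][], multiplicities are a parallel int[], and the helper name expand is hypothetical rather than part of the API.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiplicitySketch {
    /** Repeats case i exactly multiplicities[i] times, preserving case order. */
    static List<double[]> expand(double[][] cases, int[] multiplicities) {
        if (cases.length != multiplicities.length) {
            throw new IllegalArgumentException("One multiplicity per case is required.");
        }
        List<double[]> expanded = new ArrayList<>();
        for (int i = 0; i < cases.length; i++) {
            if (multiplicities[i] < 0) {
                throw new IllegalArgumentException("Negative multiplicity at case " + i);
            }
            for (int k = 0; k < multiplicities[i]; k++) {
                expanded.add(cases[i].clone());   // each occurrence is its own row
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        double[][] cases = {{1.0, 2.0}, {3.0, 4.0}};
        int[] m = {2, 3};
        System.out.println(expand(cases, m).size() + " rows after expansion");
    }
}
```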
Knowledge may be associated with the data set, using the
setKnowledge method. This knowledge is not used internally to
the data set, but it may be retrieved by algorithms and used.
- Author:
- Joseph Ramsey
-
Constructor Summary
Constructors
Constructor | Description
BoxDataSet(BoxDataSet dataSet) | Makes a copy of the given data set.
BoxDataSet(DataBox dataBox, List<Node> variables) |
-
Method Summary
Modifier and Type | Method | Description
void | addVariable(int index, Node variable) | Adds the given variable to the dataset, increasing the number of columns by one, moving columns i >= index to column i + 1, and inserting a column of missing values at column i.
void | addVariable(Node variable) | Adds the given variable to the data set, increasing the number of columns by one, moving columns i >= index to column i + 1, and inserting a column of missing values at column i.
void | changeVariable(Node from, Node to) | Changes the variable for the given column from 'from' to 'to'.
void | clearSelection() | Marks all variables as deselected.
 | copy() |
void | ensureColumns(int columns, List<String> excludedVariableNames) | Ensures that the dataset has at least the given number of columns, adding continuous variables with unique names until that is true.
void | ensureRows(int rows) | Ensures that the dataset has at least the given number of rows, adding rows if necessary to make that the case.
boolean | equals |
boolean | existsMissingValue() | Returns true if and only if this data set contains at least one missing value.
int | getColumn |
 | getColumnToTooltip |
 | getCorrelationMatrix | If this is a continuous data set, returns the correlation matrix.
 | getCovarianceMatrix | If this is a continuous data set, returns the covariance matrix.
 | getDataBox |
double | getDouble(int row, int column) |
 | getDoubleData |
int | getInt(int row, int column) |
 | getKnowledge |
 | getName() | Gets the name of the data set.
 | getNumberFormat | The number format of the dataset.
int | getNumColumns() |
int | getNumRows() |
 | getObject(int row, int col) |
int[] | getSelectedIndices() |
 | getVariable(int col) |
 | getVariable(String varName) |
 | getVariableNames |
 | getVariables |
boolean | isContinuous() |
boolean | isDiscrete() |
boolean | isMixed() |
boolean | isSelected(Node variable) |
 | like() |
void | permuteRows() | Randomly permutes the rows of the dataset.
void | removeCols(int[] cols) | Removes the given columns from the data set.
void | removeColumn(int index) | Removes the column for the variable at the given index, reducing the number of columns by one.
void | removeColumn(Node variable) | Removes the column for the given variable from the dataset, reducing the number of columns by one.
void | removeRows(int[] selectedRows) | Removes the given rows from the data set.
static BoxDataSet | serializableInstance | Generates a simple exemplar of this class to test serialization.
void | setDouble(int row, int column, double value) | Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
void | setInt(int row, int column, int value) | Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
void | setKnowledge(Knowledge knowledge) | Sets knowledge to be associated with this data set.
void | setName | Sets the name of the data set.
void | setNumberFormat | The number formatter used to print out continuous values.
void | setObject | Sets the value at the given (row, column) to the given value.
void | setOutputDelimiter(Character character) | Sets the character ('\t', ' ', ',', for instance) that is used to delimit tokens when the data set is printed out using the toString() method.
void | setSelected(Node variable, boolean selected) | Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
 | subsetColumns(int[] indices) |
 | subsetColumns(List<Node> vars) | Creates and returns a dataset consisting of those variables in the list vars.
 | subsetRows(int[] rows) |
 | subsetRowsColumns(int[] rows, int[] columns) |
 | toString() | Renders the data model as a String.
-
Constructor Details
-
BoxDataSet
-
BoxDataSet
Makes a copy of the given data set.
-
-
Method Details
-
getColumnToTooltip
- Specified by:
getColumnToTooltip in interface DataSet
-
serializableInstance
Generates a simple exemplar of this class to test serialization.
-
getName
Gets the name of the data set.
-
setName
Sets the name of the data set.
-
getNumColumns
public int getNumColumns()
- Specified by:
getNumColumns in interface DataSet
- Returns:
- the number of variables in the data set.
-
getNumRows
public int getNumRows()
- Specified by:
getNumRows in interface DataSet
- Returns:
- the number of rows in the rectangular data set, which is the maximum of the number of rows in the list of wrapped columns.
-
setInt
public void setInt(int row, int column, int value)
Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
-
setDouble
public void setDouble(int row, int column, double value)
Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
-
getObject
- Specified by:
getObject in interface DataSet
- Parameters:
row - The index of the case.
col - The index of the variable.
- Returns:
- the value at the given row and column as an Object. The type returned is deliberately vague, allowing for variables of any type. Primitives will be returned as corresponding wrapping objects (for example, doubles as Doubles).
-
setObject
Description copied from interface: DataSet
Sets the value at the given (row, column) to the given value.
-
getSelectedIndices
public int[] getSelectedIndices()
- Specified by:
getSelectedIndices in interface DataSet
- Returns:
- the indices of the currently selected variables.
-
addVariable
Adds the given variable to the data set, increasing the number of columns by one, moving columns i >= index to column i + 1, and inserting a column of missing values at column i.
- Specified by:
addVariable in interface DataSet
- Throws:
IllegalArgumentException - if the variable already exists in the dataset.
-
addVariable
Adds the given variable to the dataset, increasing the number of columns by one, moving each column i >= index to column i + 1, and inserting a column of missing values at column index.
- Specified by:
addVariable in interface DataSet
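The column shift described for the indexed addVariable can be sketched on a plain array. This is an illustration only, not the BoxDataSet implementation: the helper insertMissingColumn is hypothetical, and Double.NaN is assumed here as the missing-value marker.

```java
final class InsertColumnSketch {
    /** Returns a copy of data with a column of missing values inserted at index. */
    static double[][] insertMissingColumn(double[][] data, int index) {
        double[][] result = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            int cols = data[r].length;
            if (index < 0 || index > cols) {
                throw new IndexOutOfBoundsException("index: " + index);
            }
            double[] row = new double[cols + 1];
            System.arraycopy(data[r], 0, row, 0, index);                      // columns before index stay put
            row[index] = Double.NaN;                                          // assumed missing-value marker
            System.arraycopy(data[r], index, row, index + 1, cols - index);   // column i moves to i + 1
            result[r] = row;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0, 3.0}};
        System.out.println(java.util.Arrays.toString(insertMissingColumn(data, 1)[0]));
    }
}
```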
-
getVariable
- Specified by:
getVariable in interface DataSet
- Returns:
- the variable at the given column.
-
getColumn
-
changeVariable
Changes the variable for the given column from 'from' to 'to'. Supported currently only for discrete variables.
- Specified by:
changeVariable in interface DataSet
- Throws:
IllegalArgumentException - if the given change is not supported.
-
getVariable
- Specified by:
getVariable in interface DataModel
- Specified by:
getVariable in interface DataSet
- Returns:
- the variable with the given name.
-
getVariables
- Specified by:
getVariables in interface DataSet
- Specified by:
getVariables in interface VariableSource
- Returns:
- (a copy of) the List of Variables for the data set, in the order of their columns.
-
getKnowledge
- Specified by:
getKnowledge in interface KnowledgeTransferable
- Returns:
- a copy of the knowledge associated with this data set. (Cannot be null.)
-
setKnowledge
Sets knowledge to be associated with this data set. May not be null.
- Specified by:
setKnowledge in interface KnowledgeTransferable
-
getVariableNames
- Specified by:
getVariableNames in interface DataSet
- Specified by:
getVariableNames in interface VariableSource
- Returns:
- (a copy of) the List of variable names for the data set, in the order of their columns.
-
setSelected
Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
- Specified by:
setSelected in interface DataSet
-
clearSelection
public void clearSelection()
Marks all variables as deselected.
- Specified by:
clearSelection in interface DataSet
-
ensureRows
public void ensureRows(int rows)
Ensures that the dataset has at least the given number of rows, adding rows if necessary to make that the case. The new rows will be filled with missing values.
- Specified by:
ensureRows in interface DataSet
-
ensureColumns
Ensures that the dataset has at least the given number of columns, adding continuous variables with unique names until that is true. The new columns will be filled with missing values.
- Specified by:
ensureColumns in interface DataSet
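The "unique names" requirement of ensureColumns can be sketched as follows. The naming scheme ("X1", "X2", ...) and the helper itself are assumptions made for illustration; the documentation does not say how names are actually generated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EnsureColumnsSketch {
    /**
     * Returns the variable names extended with fresh names until there are at
     * least `columns` of them. Names already in use or excluded are skipped.
     */
    static List<String> ensureColumns(List<String> names, int columns, List<String> excluded) {
        List<String> result = new ArrayList<>(names);
        Set<String> unavailable = new HashSet<>(names);
        unavailable.addAll(excluded);
        int counter = 0;
        while (result.size() < columns) {
            String candidate = "X" + (++counter);   // hypothetical naming scheme
            if (unavailable.add(candidate)) {       // add() returns false if the name is taken
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ensureColumns(List.of("X1", "A"), 4, List.of("X2")));
    }
}
```

In the real class each added column would also be filled with missing values; the sketch tracks names only.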
-
existsMissingValue
public boolean existsMissingValue()
Description copied from interface: DataSet
Returns true if and only if this data set contains at least one missing value.
- Specified by:
existsMissingValue in interface DataSet
-
isSelected
- Specified by:
isSelected in interface DataSet
- Returns:
- true iff the given column has been marked as selected.
-
removeColumn
public void removeColumn(int index)
Removes the column for the variable at the given index, reducing the number of columns by one.
- Specified by:
removeColumn in interface DataSet
-
removeColumn
Removes the column for the given variable from the dataset, reducing the number of columns by one.
- Specified by:
removeColumn in interface DataSet
-
subsetColumns
Creates and returns a dataset consisting of those variables in the list vars. Vars must be a subset of the variables of this DataSet. The ordering of the elements of vars will be the same as in the list of variables in this DataSet.
- Specified by:
subsetColumns in interface DataSet
-
isContinuous
public boolean isContinuous()
- Specified by:
isContinuous in interface DataModel
- Specified by:
isContinuous in interface DataSet
- Returns:
- true iff this is a continuous data set--that is, if every column in it is continuous. (By implication, empty datasets are both discrete and continuous.)
-
isDiscrete
public boolean isDiscrete()
- Specified by:
isDiscrete in interface DataModel
- Specified by:
isDiscrete in interface DataSet
- Returns:
- true iff this is a discrete data set--that is, if every column in it is discrete. (By implication, empty datasets are both discrete and continuous.)
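The remark that empty datasets are both discrete and continuous follows from "every column" being vacuously true when there are no columns. A small sketch, using a hypothetical Kind enum in place of the real variable classes:

```java
import java.util.List;

final class VacuousTypeSketch {
    enum Kind { CONTINUOUS, DISCRETE }   // stand-in for the two variable types

    /** True iff every column is continuous; vacuously true for no columns. */
    static boolean isContinuous(List<Kind> columns) {
        return columns.stream().allMatch(k -> k == Kind.CONTINUOUS);
    }

    /** True iff every column is discrete; vacuously true for no columns. */
    static boolean isDiscrete(List<Kind> columns) {
        return columns.stream().allMatch(k -> k == Kind.DISCRETE);
    }

    public static void main(String[] args) {
        List<Kind> empty = List.of();
        System.out.println(isContinuous(empty) + " " + isDiscrete(empty));
    }
}
```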
-
isMixed
public boolean isMixed()
-
getCorrelationMatrix
Description copied from interface: DataSet
If this is a continuous data set, returns the correlation matrix.
- Specified by:
getCorrelationMatrix in interface DataSet
- Returns:
- the correlation matrix for this dataset. Defers to
Statistic.covariance() in the COLT matrix library, so it inherits the handling of missing values from that library--that is, any off-diagonal correlation involving a column with a missing value is Double.NaN, although all of the on-diagonal elements are 1.0. If that's not the desired behavior, missing values can be removed or imputed first.
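The missing-value behavior described above can be reproduced with ordinary floating-point arithmetic, since any NaN in a column propagates through the sums. The sketch below is a hand-rolled Pearson correlation written for illustration; it is not the COLT code, and it assumes missing values are stored as Double.NaN and that the diagonal is set to 1.0 directly.

```java
final class CorrelationNaNSketch {
    /** Pearson correlation of two equally long columns; a NaN entry yields NaN. */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Columns must have equal length.");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Correlation matrix over columns; diagonal entries are fixed at 1.0. */
    static double[][] correlationMatrix(double[][] columns) {
        int p = columns.length;
        double[][] r = new double[p][p];
        for (int i = 0; i < p; i++) {
            r[i][i] = 1.0;
            for (int j = i + 1; j < p; j++) {
                r[i][j] = r[j][i] = correlation(columns[i], columns[j]);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] columns = {{1, 2, 3}, {2, 4, 6}, {1, Double.NaN, 3}};
        System.out.println(java.util.Arrays.deepToString(correlationMatrix(columns)));
    }
}
```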
-
getCovarianceMatrix
Description copied from interface: DataSet
If this is a continuous data set, returns the covariance matrix.
- Specified by:
getCovarianceMatrix in interface DataSet
- Returns:
- the covariance matrix for this dataset. Defers to
Statistic.covariance() in the COLT matrix library, so it inherits the handling of missing values from that library--that is, any covariance involving a column with a missing value is Double.NaN. If that's not the desired behavior, missing values can be removed or imputed first.
-
getInt
public int getInt(int row, int column)
-
getDouble
public double getDouble(int row, int column)
-
toString
Description copied from interface: DataModel
Renders the data model as a String.
- Specified by:
toString in interface DataModel
- Specified by:
toString in interface DataSet
- Overrides:
toString in class Object
- Returns:
- a string, suitable for printing, of the dataset. Lines are
separated by '\n', tokens in the line by whatever character is set in the
setOutputDelimiter() method. The list of variables is printed first, followed by one line for each case. This method should probably not be used for saving to files. If that's your goal, use the DataSavers class instead.
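The output layout described for toString (a header line of variables, then one delimited line per case) might look roughly like this. The helper render is hypothetical, values are formatted with Double.toString rather than the data set's number format, and the trailing newline is an assumption.

```java
import java.util.List;
import java.util.StringJoiner;

final class DelimitedRenderSketch {
    /** Renders a header line followed by one line per case, tokens split by delimiter. */
    static String render(List<String> variables, double[][] rows, char delimiter) {
        String sep = String.valueOf(delimiter);
        StringBuilder out = new StringBuilder();
        out.append(String.join(sep, variables)).append('\n');   // variables first
        for (double[] row : rows) {
            StringJoiner line = new StringJoiner(sep);
            for (double value : row) {
                line.add(Double.toString(value));
            }
            out.append(line).append('\n');                       // one line per case
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of("X", "Y"), new double[][]{{1.0, 2.5}, {3.0, 4.0}}, ','));
    }
}
```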
-
getDoubleData
- Specified by:
getDoubleData in interface DataSet
- Returns:
- a copy of the underlying COLT TetradMatrix matrix, containing all
of the data in this dataset, discrete data included. Discrete data will
be represented by ints cast to doubles. Rows in this matrix are cases,
and columns are variables. The list of variables, in the order in which
they occur in the matrix, is given by getVariables().
If isMultipliersCollapsed() returns false, multipliers in the dataset are first expanded before returning the matrix, so the number of rows in the returned matrix may not be the same as the number of rows in this dataset.
- Throws:
IllegalStateException - if this is not a continuous data set.
-
subsetColumns
- Specified by:
subsetColumns in interface DataSet
- Returns:
- a new data set in which the column at indices[i] is placed at index i, for i = 0 to indices.length - 1. (Moved over from Purify.)
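The index mapping in the return description (the column at indices[i] ends up at position i) can be sketched on a plain array. This is an illustrative helper, not the library's code; it also shows that the indices may reorder columns.

```java
final class SubsetColumnsSketch {
    /** Returns a new matrix whose column i is column indices[i] of data. */
    static double[][] subsetColumns(double[][] data, int[] indices) {
        double[][] result = new double[data.length][indices.length];
        for (int r = 0; r < data.length; r++) {
            for (int i = 0; i < indices.length; i++) {
                result[r][i] = data[r][indices[i]];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{10, 20, 30}, {40, 50, 60}};
        System.out.println(java.util.Arrays.deepToString(subsetColumns(data, new int[]{2, 0})));
    }
}
```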
-
subsetRows
- Specified by:
subsetRows in interface DataSet
- Returns:
- a new data set in which the row at indices[i] is placed at index i, for i = 0 to indices.length - 1. (View instead?)
-
subsetRowsColumns
- Specified by:
subsetRowsColumns in interface DataSet
-
removeCols
public void removeCols(int[] cols)
Removes the given columns from the data set.
- Specified by:
removeCols in interface DataSet
-
removeRows
public void removeRows(int[] selectedRows)
Removes the given rows from the data set.
- Specified by:
removeRows in interface DataSet
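Row removal by index can be pictured as keeping every row whose index is not listed. The sketch below returns a new array instead of modifying in place, which is a simplification; the helper name and the array representation are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RemoveRowsSketch {
    /** Returns the rows of data whose indices are not in rowsToRemove, in order. */
    static double[][] removeRows(double[][] data, int[] rowsToRemove) {
        Set<Integer> doomed = new HashSet<>();
        for (int r : rowsToRemove) {
            doomed.add(r);
        }
        List<double[]> kept = new ArrayList<>();
        for (int r = 0; r < data.length; r++) {
            if (!doomed.contains(r)) {
                kept.add(data[r]);
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = {{1}, {2}, {3}, {4}};
        System.out.println(java.util.Arrays.deepToString(removeRows(data, new int[]{1, 3})));
    }
}
```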
-
equals
-
copy
-
like
-
setNumberFormat
Description copied from interface: DataSet
The number formatter used to print out continuous values.
- Specified by:
setNumberFormat in interface DataSet
-
setOutputDelimiter
Sets the character ('\t', ' ', ',', for instance) that is used to delimit tokens when the data set is printed out using the toString() method.
- Specified by:
setOutputDelimiter in interface DataSet
-
permuteRows
public void permuteRows()
Randomly permutes the rows of the dataset.
- Specified by:
permuteRows in interface DataSet
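A random row permutation is commonly done with a Fisher-Yates shuffle. The sketch below applies that technique to a plain array; it is an assumption that this resembles what permuteRows does internally, and the seeded Random is passed in only to make the example repeatable.

```java
import java.util.Random;

final class PermuteRowsSketch {
    /** Shuffles the rows of data in place using a Fisher-Yates shuffle. */
    static void permuteRows(double[][] data, Random random) {
        for (int i = data.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);   // uniform in [0, i]
            double[] tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;                   // whole rows move, so cases stay intact
        }
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {1, 10}, {2, 20}, {3, 30}};
        permuteRows(data, new Random(42L));
        System.out.println(java.util.Arrays.deepToString(data));
    }
}
```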
-
getNumberFormat
Description copied from interface: DataSet
The number format of the dataset.
- Specified by:
getNumberFormat in interface DataSet
- Returns:
- the number format, which by default is the one at
NumberFormatUtil.getInstance().getNumberFormat(), but can be set by the user if desired.
-
getDataBox
-