Package edu.cmu.tetrad.data
Interface DataSet
- All Superinterfaces:
DataModel, KnowledgeTransferable, Serializable, TetradSerializable, VariableSource
- All Known Implementing Classes:
BoxDataSet, NumberObjectDataSet
Represents a rectangular data set: a table with a given number of columns and rows, in which every column has the same length.
- Author:
- josephramsey
-
Field Summary
- static final long serialVersionUID: Constant serialVersionUID=23L
-
Method Summary
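- void addVariable(int index, Node variable): Adds the given variable at the given index.
- void addVariable(Node variable): Adds the given variable to the data set.
- void changeVariable(Node from, Node to): Changes the variable for the given column from 'from' to 'to'.
- void clearSelection(): Marks all variables as deselected.
- DataSet copy(): Returns a copy of this dataset.
- void ensureColumns(int columns, List<String> excludedVariableNames): Ensures that the dataset has at least 'columns' columns.
- void ensureRows(int rows): Ensures that the dataset has at least 'rows' rows.
- boolean equals(Object o): Checks if the given object is equal to this dataset.
- boolean existsMissingValue(): Returns true if and only if the data set contains at least one missing value.
- int getColumn(Node variable): Returns the column index of the given variable.
- Map<String, String> getColumnToTooltip(): Returns the map of column names to tooltips.
- Matrix getCorrelationMatrix(): If this is a continuous data set, returns the correlation matrix.
- Matrix getCovarianceMatrix(): If this is a continuous data set, returns the covariance matrix.
- double getDouble(int row, int column): Returns the value at the given (row, column) as a double.
- Matrix getDoubleData(): Returns the underlying data matrix of a continuous data set.
- int getInt(int row, int column): Returns the value at the given (row, column) as an int.
- String getName(): Returns the name of the data set.
- NumberFormat getNumberFormat(): Returns the number format of the dataset.
- int getNumColumns(): Returns the number of columns in the data set.
- int getNumRows(): Returns the number of rows in the data set.
- Object getObject(int row, int col): Returns the value at the given (row, column) as an Object.
- int[] getSelectedIndices(): Returns the indices of the currently selected variables.
- Node getVariable(int column): Returns the variable at the given column.
- Node getVariable(String name): Returns the variable with the given name, or null if no such variable exists.
- List<String> getVariableNames(): Returns the variable names, in column order.
- List<Node> getVariables(): Returns the variables, in column order.
- boolean isContinuous(): Returns true if the data set contains at least one column and all columns are continuous.
- boolean isDiscrete(): Returns true if the data set contains at least one column and all columns are discrete.
- boolean isMixed(): Returns whether this is a mixed data set.
- boolean isSelected(Node variable): Returns true if the given column has been marked as selected.
- DataSet like(): Returns a dataset with the same dimensions as this dataset, but with no data.
- void permuteRows(): Randomizes the rows of the data set.
- void removeCols(int[] selectedCols): Removes the given columns from the data set.
- void removeColumn(int index): Removes the variable (and data) at the given index.
- void removeColumn(Node variable): Removes the given variable, along with all of its data.
- void removeRows(int[] selectedRows): Removes the given rows from the data set.
- void setDouble(int row, int column, double value): Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
- void setInt(int row, int col, int value): Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
- void setNumberFormat(NumberFormat nf): Sets the number formatter used to print out continuous values.
- void setObject(int row, int col, Object value): Sets the value at the given (row, column) to the given value.
- void setOutputDelimiter(Character character): Sets the character used as a delimiter when the dataset is output.
- void setSelected(Node variable, boolean selected): Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
- DataSet subsetColumns(int[] columns): Returns a new data set consisting of the given columns.
- DataSet subsetColumns(List<Node> vars): Creates and returns a dataset consisting of those variables in the list vars.
- DataSet subsetRows(int[] rows): Returns a new data set consisting of the given rows.
- DataSet subsetRowsColumns(int[] rows, int[] columns): Generates a subset of the current DataSet by selecting specified rows and columns.
- String toString(): Returns a string representation of the data set.
Methods inherited from interface edu.cmu.tetrad.data.KnowledgeTransferable
getKnowledge, setKnowledge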
-
Field Details
-
serialVersionUID
static final long serialVersionUID
Constant serialVersionUID=23L
-
Method Details
-
addVariable
void addVariable(Node variable)
Adds the given variable to the data set.
- Parameters:
variable - The variable to add.
- Throws:
IllegalArgumentException - if the variable is neither continuous nor discrete.
-
addVariable
void addVariable(int index, Node variable)
Adds the given variable at the given index.
- Parameters:
index - The index at which to add the variable.
variable - The variable to add.
-
changeVariable
void changeVariable(Node from, Node to)
Changes the variable for the given column from 'from' to 'to'. Currently supported only for discrete variables.
- Parameters:
from - The variable to change.
to - The variable to change to.
- Throws:
IllegalArgumentException - if the given change is not supported.
-
clearSelection
void clearSelection()
Marks all variables as deselected.
-
ensureColumns
void ensureColumns(int columns, List<String> excludedVariableNames)
Ensures that the dataset has at least 'columns' columns. Used when pasting data into the dataset. New columns are never given names from the excludedVariableNames list, so the calling class can assign those names later without conflicts.
- Parameters:
columns - The number of columns to ensure.
excludedVariableNames - The names of variables that should not be used for new columns.
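The role of excludedVariableNames can be illustrated with a small, self-contained Java sketch that does not depend on Tetrad. The class ColumnNamer, its method newColumnNames, and the "X1", "X2", ... naming scheme are hypothetical; the sketch only shows one way new column names could skip both existing names and reserved ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Tetrad.
final class ColumnNamer {
    private ColumnNamer() {}

    // Returns names for the columns that must be appended so that a data set whose
    // current column names are 'existing' reaches at least 'targetColumns' columns.
    // The "X1", "X2", ... naming scheme is an assumption made for this sketch.
    static List<String> newColumnNames(List<String> existing, int targetColumns,
                                       List<String> excluded) {
        Set<String> taken = new HashSet<>(existing);
        taken.addAll(excluded); // reserved for the caller, so never handed out here
        List<String> added = new ArrayList<>();
        int counter = 1;
        while (existing.size() + added.size() < targetColumns) {
            String candidate = "X" + counter++;
            if (taken.add(candidate)) { // add() returns false if the name is already taken
                added.add(candidate);
            }
        }
        return added;
    }
}
```

For example, with existing names [X1], a target of 3 columns, and X2 excluded, the sketch returns [X3, X4], leaving X2 free for the caller to assign later. If the data set already has enough columns, nothing is added.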
-
existsMissingValue
boolean existsMissingValue()
- Returns:
- true if and only if this data set contains at least one missing value.
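A minimal sketch of this check follows. It assumes, purely for illustration and not as a statement about Tetrad's storage, that missing continuous values are represented as Double.NaN in a plain double[][] matrix. MissingValueCheck is a hypothetical name.

```java
// Hypothetical sketch for illustration; not Tetrad's implementation.
final class MissingValueCheck {
    private MissingValueCheck() {}

    // Returns true as soon as one NaN (the assumed missing-value marker) is found.
    static boolean existsMissingValue(double[][] data) {
        for (double[] row : data) {
            for (double value : row) {
                if (Double.isNaN(value)) {
                    return true; // stop at the first missing value
                }
            }
        }
        return false;
    }
}
```

Returning at the first missing value keeps the scan cheap when a gap appears early; a data set with no missing values still requires one full pass over every cell.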
-
ensureRows
void ensureRows(int rows)
Ensures that the dataset has at least 'rows' rows. Used when pasting data into the dataset.
- Parameters:
rows - The number of rows to ensure.
-
getColumn
int getColumn(Node variable)
- Parameters:
variable - The variable to look up.
- Returns:
- the column index of the given variable.
-
getCorrelationMatrix
Matrix getCorrelationMatrix()
If this is a continuous data set, returns the correlation matrix.
- Returns:
- the correlation matrix.
- Throws:
IllegalStateException - if this is not a continuous data set.
-
getCovarianceMatrix
Matrix getCovarianceMatrix()
If this is a continuous data set, returns the covariance matrix.
- Returns:
- the covariance matrix.
- Throws:
IllegalStateException - if this is not a continuous data set.
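The relationship between the two matrices can be sketched in plain Java, independently of Tetrad's Matrix class. CovCorr is a hypothetical helper, and the n - 1 (sample) divisor in covariance is an assumption. The correlation matrix is the same under either an n or an n - 1 divisor, because the divisor cancels in cov[i][j] / sqrt(cov[i][i] * cov[j][j]).

```java
// Hypothetical sketch for illustration; not Tetrad's implementation.
final class CovCorr {
    private CovCorr() {}

    // Sample covariance of the columns of 'data' (rows are cases).
    // The n - 1 divisor is an assumption of this sketch.
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[] means = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < p; j++) {
            means[j] /= n;
        }
        double[][] cov = new double[p][p];
        for (double[] row : data) {
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) {
                    cov[i][j] += (row[i] - means[i]) * (row[j] - means[j]);
                }
            }
        }
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                cov[i][j] /= (n - 1);
            }
        }
        return cov;
    }

    // Rescales a covariance matrix: corr[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j]).
    static double[][] correlation(double[][] cov) {
        int p = cov.length;
        double[][] corr = new double[p][p];
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                corr[i][j] = cov[i][j] / Math.sqrt(cov[i][i] * cov[j][j]);
            }
        }
        return corr;
    }
}
```

For the three cases {1, 2}, {2, 4}, {3, 6}, the sketch gives variances 1 and 4, covariance 2, and therefore a correlation of 1 between the two perfectly linearly related columns.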
-
getDouble
double getDouble(int row, int column)
- Parameters:
row - The index of the case.
column - The index of the variable.
- Returns:
- the value at the given row and column as a double. For discrete data, returns the integer value cast to a double.
-
getDoubleData
Matrix getDoubleData()
- Returns:
- the underlying data matrix as a Matrix.
- Throws:
IllegalStateException - if this is not a continuous data set.
-
getInt
int getInt(int row, int column)
- Parameters:
row - The index of the case.
column - The index of the variable.
- Returns:
- the value at the given row and column as an int, rounding if necessary. For discrete variables, this is the category index of the datum in that column. Returns DiscreteVariable.MISSING_VALUE for missing values.
-
getName
String getName()
Returns the name of the data set.
-
getNumColumns
int getNumColumns()
- Returns:
- the number of columns in the data set.
-
getNumRows
int getNumRows()
- Returns:
- the number of rows in the data set.
-
getObject
Object getObject(int row, int col)
- Parameters:
row - The index of the case.
col - The index of the variable.
- Returns:
- the value at the given row and column as an Object. The return type is deliberately general, allowing for variables of any type. Primitives are returned as their corresponding wrapper objects (for example, doubles as Doubles).
-
getSelectedIndices
int[] getSelectedIndices()
- Returns:
- the indices of the currently selected variables.
-
getVariable
Node getVariable(int column)
- Parameters:
column - The index of the variable.
- Returns:
- the variable at the given column.
-
getVariable
Node getVariable(String name)
- Specified by:
getVariable in interface DataModel
- Parameters:
name - The name of the variable to look up.
- Returns:
- the variable with the given name, or null if no such variable exists.
-
getVariableNames
List<String> getVariableNames()
- Specified by:
getVariableNames in interface VariableSource
- Returns:
- (a copy of) the list of variable names for the data set, in the order of their columns.
-
getVariables
List<Node> getVariables()
- Specified by:
getVariables in interface VariableSource
- Returns:
- (a copy of) the list of variables for the data set, in the order of their columns.
-
isContinuous
boolean isContinuous()
- Specified by:
isContinuous in interface DataModel
- Returns:
- true if this is a continuous data set, that is, if it contains at least one column and all the columns are continuous.
-
isDiscrete
boolean isDiscrete()
- Specified by:
isDiscrete in interface DataModel
- Returns:
- true if this is a discrete data set, that is, if it contains at least one column and all the columns are discrete.
-
isMixed
boolean isMixed()
-
isSelected
boolean isSelected(Node variable)
- Parameters:
variable - The variable to check.
- Returns:
- true iff the given column has been marked as selected.
-
removeColumn
void removeColumn(int index)
Removes the variable (and data) at the given index.
- Parameters:
index - The index of the variable to remove.
-
removeColumn
void removeColumn(Node variable)
Removes the given variable, along with all of its data.
- Parameters:
variable - The variable to remove.
-
removeCols
void removeCols(int[] selectedCols)
Removes the given columns from the data set.
- Parameters:
selectedCols - The indices of the columns to remove.
-
removeRows
void removeRows(int[] selectedRows)
Removes the given rows from the data set.
- Parameters:
selectedRows - The indices of the rows to remove.
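Removing several rows one at a time would shift the indices of later rows after each deletion. A minimal sketch that avoids this problem first collects the indices to drop and then keeps the remaining rows in their original order. It assumes rows are stored in a plain double[][]; RowRemoval is a hypothetical name and not Tetrad's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch for illustration; not Tetrad's implementation.
final class RowRemoval {
    private RowRemoval() {}

    // Returns the rows whose indices are not in selectedRows, in their original order.
    static double[][] removeRows(double[][] data, int[] selectedRows) {
        Set<Integer> toRemove = new HashSet<>();
        for (int r : selectedRows) {
            toRemove.add(r); // order and duplicates in selectedRows do not matter
        }
        List<double[]> kept = new ArrayList<>();
        for (int r = 0; r < data.length; r++) {
            if (!toRemove.contains(r)) {
                kept.add(data[r]);
            }
        }
        return kept.toArray(new double[0][]);
    }
}
```

Because every index refers to the original row positions, removing rows {2, 0} from four rows leaves the original rows 1 and 3, regardless of the order in which the indices are listed.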
-
setDouble
void setDouble(int row, int column, double value)
Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
- Parameters:
row - The index of the case.
column - The index of the variable.
value - The value to set.
-
setInt
void setInt(int row, int col, int value)
Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
- Parameters:
row - The index of the case.
col - The index of the variable.
value - The value to set.
-
setObject
void setObject(int row, int col, Object value)
Sets the value at the given (row, column) to the given value.
- Parameters:
row - The index of the case.
col - The index of the variable.
value - The value to set.
-
setSelected
void setSelected(Node variable, boolean selected)
Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
- Parameters:
variable - The variable to select or deselect.
selected - True to select the variable, false to deselect it.
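The selection methods (setSelected, isSelected, clearSelection, and getSelectedIndices) can be modeled with a set of selected variables. In this hypothetical sketch, SelectionModel uses String in place of Tetrad's Node, and returning the indices in column order is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.IntStream;

// Hypothetical model of column selection for illustration; String stands in for Node.
final class SelectionModel {
    private final List<String> variables;
    private final Set<String> selected = new HashSet<>();

    SelectionModel(List<String> variables) {
        this.variables = new ArrayList<>(variables);
    }

    // Selects the variable if 'select' is true, deselects it otherwise.
    void setSelected(String variable, boolean select) {
        if (select) {
            selected.add(variable);
        } else {
            selected.remove(variable);
        }
    }

    boolean isSelected(String variable) {
        return selected.contains(variable);
    }

    void clearSelection() {
        selected.clear();
    }

    // Column indices of the selected variables, in column order (an assumption of this sketch).
    int[] getSelectedIndices() {
        return IntStream.range(0, variables.size())
                .filter(i -> selected.contains(variables.get(i)))
                .toArray();
    }
}
```

Storing the selection as a set keeps setSelected idempotent: selecting a variable twice has the same effect as selecting it once.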
-
subsetRowsColumns
DataSet subsetRowsColumns(int[] rows, int[] columns)
Generates a subset of the current DataSet by selecting specified rows and columns.
- Parameters:
rows - an array of row indices to include in the subset
columns - an array of column indices to include in the subset
- Returns:
- a new DataSet object containing only the specified rows and columns
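Applying the index rule documented for subsetRows and subsetColumns (the entry at indices[i] is placed at index i) to both dimensions gives the sketch below, written on a plain double[][]. Treating subsetRowsColumns as this combination is an assumption, and Subsets is a hypothetical name.

```java
// Hypothetical sketch for illustration; not Tetrad's implementation.
final class Subsets {
    private Subsets() {}

    // result[i][j] = data[rows[i]][columns[j]]; in this sketch indices may be reordered or repeated.
    static double[][] subsetRowsColumns(double[][] data, int[] rows, int[] columns) {
        double[][] result = new double[rows.length][columns.length];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < columns.length; j++) {
                result[i][j] = data[rows[i]][columns[j]];
            }
        }
        return result;
    }
}
```

For example, with data {{1, 2, 3}, {4, 5, 6}}, rows {1, 0}, and columns {2, 0}, the result is {{6, 4}, {3, 1}}.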
-
subsetColumns
DataSet subsetColumns(List<Node> vars)
Creates and returns a dataset consisting of the variables in the list 'vars', which must be a subset of the variables of this DataSet. The ordering of the elements of 'vars' will be the same as in the list of variables in this DataSet.
- Parameters:
vars - The variables to include in the new data set.
- Returns:
- a new data set consisting of the variables in the list 'vars'.
-
subsetColumns
DataSet subsetColumns(int[] columns)
- Parameters:
columns - The indices of the columns to include in the new data set.
- Returns:
- a new data set in which the column at columns[i] is placed at index i, for i = 0 to columns.length - 1.
-
subsetRows
DataSet subsetRows(int[] rows)
- Parameters:
rows - The indices of the rows to include in the new data set.
- Returns:
- a new data set in which the row at rows[i] is placed at index i, for i = 0 to rows.length - 1.
-
toString
String toString()
-
getNumberFormat
NumberFormat getNumberFormat()
- Returns:
- the number format of the dataset.
-
setNumberFormat
void setNumberFormat(NumberFormat nf)
Sets the number formatter used to print out continuous values.
- Parameters:
nf - The number formatter used to print out continuous values.
-
setOutputDelimiter
void setOutputDelimiter(Character character)
Sets the character used as a delimiter when the dataset is output.
- Parameters:
character - The character used as a delimiter when the dataset is output.
-
permuteRows
void permuteRows()
Randomizes the rows of the data set.
-
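The documentation does not say how the rows are randomized. One standard way to produce a uniformly random row order is the Fisher-Yates shuffle, sketched below on a plain double[][]. RowPermuter is a hypothetical name, and passing an explicit java.util.Random (so that a seed can make the order reproducible) is a design choice of this sketch, not Tetrad's signature.

```java
import java.util.Random;

// Hypothetical sketch for illustration; not Tetrad's implementation.
final class RowPermuter {
    private RowPermuter() {}

    // Fisher-Yates: each of the n! row orders is equally likely, given a uniform random source.
    static void permuteRows(double[][] data, Random random) {
        for (int i = data.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1); // 0 <= j <= i
            double[] tmp = data[i];        // swap row references; cell values are not copied
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

Swapping whole row references keeps each case intact: the values within a row stay together, and only the order of the rows changes.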
getColumnToTooltip
Map<String, String> getColumnToTooltip()
Returns the map of column names to tooltips.
-
equals
boolean equals(Object o)
Checks if the given object is equal to this dataset.
-
copy
DataSet copy()
Returns a copy of this dataset.
-
like
DataSet like()
Returns a dataset with the same dimensions as this dataset, but with no data.
- Returns:
- a dataset with the same dimensions as this dataset, but with no data.
-