Interface DataSet

All Superinterfaces:
DataModel, KnowledgeTransferable, Serializable, TetradSerializable, VariableSource
All Known Implementing Classes:
BoxDataSet, NumberObjectDataSet

public interface DataSet extends DataModel
Represents a rectangular data set, that is, a data set with a fixed number of columns and a fixed number of rows, in which every column has the same length.
Author:
josephramsey
  • Field Details

    • serialVersionUID

      static final long serialVersionUID
      Constant serialVersionUID=23L
  • Method Details

    • addVariable

      void addVariable(Node variable)
      Adds the given variable to the data set.
      Parameters:
      variable - The variable to add.
      Throws:
      IllegalArgumentException - if the variable is neither continuous nor discrete.
    • addVariable

      void addVariable(int index, Node variable)
      Adds the given variable at the given index.
      Parameters:
      index - The index at which to add the variable.
      variable - The variable to add.
    • changeVariable

      void changeVariable(Node from, Node to)
      Replaces the variable from with the variable to in the column that from currently occupies. Currently supported only for discrete variables.
      Parameters:
      from - The variable to change.
      to - The variable to change to.
      Throws:
      IllegalArgumentException - if the given change is not supported.
    • clearSelection

      void clearSelection()
      Marks all variables as deselected.
    • ensureColumns

      void ensureColumns(int columns, List<String> excludedVariableNames)
      Ensures that the dataset has at least the given number of columns, adding new columns if necessary. Used for pasting data into the dataset. Names in the excludedVariableNames list are never assigned to newly created columns, so that the calling class can set those names later without causing conflicts.
      Parameters:
      columns - The number of columns to ensure.
      excludedVariableNames - The names of variables that should not be used for new columns.
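      The name-exclusion rule can be illustrated with a standalone sketch that does not use Tetrad. The class EnsureColumnsSketch, the method newColumnNames, and the "X1, X2, ..." naming scheme are assumptions made for illustration; the library may generate names differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EnsureColumnsSketch {

    // Returns names for the columns that must be added so that at least
    // `columns` columns exist. Names already in use and names listed in
    // `excluded` are skipped. The "X" + counter scheme is hypothetical.
    static List<String> newColumnNames(List<String> existing, int columns, List<String> excluded) {
        Set<String> taken = new HashSet<>(existing);
        taken.addAll(excluded);
        List<String> added = new ArrayList<>();
        int counter = 0;
        while (existing.size() + added.size() < columns) {
            String candidate = "X" + (++counter);
            // Set.add returns false when the name is already taken.
            if (taken.add(candidate)) {
                added.add(candidate);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        // Two columns exist, four are required, and "X2" is reserved by the caller.
        List<String> names = newColumnNames(List.of("X1", "Age"), 4, List.of("X2"));
        System.out.println(names); // prints [X3, X4]
    }
}
```

      Because "X2" is excluded, the caller remains free to assign it afterwards without colliding with a generated name.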
    • existsMissingValue

      boolean existsMissingValue()

      Checks whether this data set contains any missing values.

      Returns:
      true if and only if this data set contains at least one missing value.
    • ensureRows

      void ensureRows(int rows)
      Ensures that the dataset has at least the given number of rows, adding new rows if necessary. Used for pasting data into the dataset.
      Parameters:
      rows - The number of rows to ensure.
    • getColumn

      int getColumn(Node variable)

      Returns the column index of the given variable.

      Parameters:
      variable - The variable to check.
      Returns:
      the column index of the given variable.
    • getCorrelationMatrix

      Matrix getCorrelationMatrix()
      If this is a continuous data set, returns the correlation matrix.
      Returns:
      the correlation matrix.
      Throws:
      IllegalStateException - if this is not a continuous data set.
    • getCovarianceMatrix

      Matrix getCovarianceMatrix()
      If this is a continuous data set, returns the covariance matrix.
      Returns:
      the covariance matrix.
      Throws:
      IllegalStateException - if this is not a continuous data set.
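      For readers unfamiliar with the two matrices, the following standalone sketch computes both from a plain double[row][column] array. It does not use Tetrad; the class CovarianceSketch and its methods are hypothetical, and the n - 1 denominator is an assumption, since this page does not state which normalization the library uses.

```java
final class CovarianceSketch {

    // Sample covariance matrix of data[row][column], using n - 1 in the
    // denominator (an assumption). Requires at least two rows.
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                mean[j] += row[j] / n;
            }
        }
        double[][] cov = new double[p][p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
                }
            }
        }
        return cov;
    }

    // Correlation matrix: each covariance divided by the product of the
    // two standard deviations. Requires nonzero variances.
    static double[][] correlation(double[][] cov) {
        int p = cov.length;
        double[][] corr = new double[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                corr[j][k] = cov[j][k] / Math.sqrt(cov[j][j] * cov[k][k]);
            }
        }
        return corr;
    }

    public static void main(String[] args) {
        // The second column is exactly twice the first.
        double[][] data = {{1, 2}, {2, 4}, {3, 6}};
        double[][] cov = covariance(data);
        double[][] corr = correlation(cov);
        System.out.println("cov[0][1] = " + cov[0][1] + ", corr[0][1] = " + corr[0][1]);
    }
}
```

      Both quantities are only defined for numeric columns, which is consistent with the methods above throwing IllegalStateException for data sets that are not continuous.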
    • getDouble

      double getDouble(int row, int column)

      Returns the value at the given row and column as a double.

      Parameters:
      row - The index of the case.
      column - The index of the variable.
      Returns:
      the value at the given row and column as a double. For discrete data, returns the integer value cast to a double.
    • getDoubleData

      Matrix getDoubleData()

      Returns the underlying data matrix.

      Returns:
      the underlying data matrix as a Matrix.
      Throws:
      IllegalStateException - if this is not a continuous data set.
    • getInt

      int getInt(int row, int column)

      Returns the value at the given row and column as an int.

      Parameters:
      row - The index of the case.
      column - The index of the variable.
      Returns:
      the value at the given row and column as an int, rounding if necessary. For discrete variables, this returns the category index of the datum for the variable at that column. Returns DiscreteVariable.MISSING_VALUE for missing values.
    • getName

      String getName()

      Returns the name of the data set.

      Specified by:
      getName in interface DataModel
      Returns:
      the name of the data set.
    • getNumColumns

      int getNumColumns()

      Returns the number of columns in the data set.

      Returns:
      the number of columns in the data set.
    • getNumRows

      int getNumRows()

      Returns the number of rows in the data set.

      Returns:
      the number of rows in the data set.
    • getObject

      Object getObject(int row, int col)

      Returns the value at the given row and column as an Object.

      Parameters:
      row - The index of the case.
      col - The index of the variable.
      Returns:
      the value at the given row and column as an Object. The type returned is deliberately vague, allowing for variables of any type. Primitives will be returned as corresponding wrapping objects (for example, doubles as Doubles).
    • getSelectedIndices

      int[] getSelectedIndices()

      Returns the indices of the currently selected variables.

      Returns:
      the column indices of the currently selected variables.
    • getVariable

      Node getVariable(int column)

      Returns the variable at the given column.

      Parameters:
      column - The index of the variable.
      Returns:
      the variable at the given column.
    • getVariable

      Node getVariable(String name)

      Returns the variable with the given name.

      Specified by:
      getVariable in interface DataModel
      Parameters:
      name - The name of the variable to look up.
      Returns:
      the variable with the given name, or null if no such variable exists.
    • getVariableNames

      List<String> getVariableNames()

      Returns the names of the variables in the data set.

      Specified by:
      getVariableNames in interface VariableSource
      Returns:
      (a copy of) the List of variable names for the data set, in the order of their columns.
    • getVariables

      List<Node> getVariables()

      Returns the variables of the data set.

      Specified by:
      getVariables in interface VariableSource
      Returns:
      (a copy of) the List of Variables for the data set, in the order of their columns.
    • isContinuous

      boolean isContinuous()

      Checks whether this is a continuous data set.

      Specified by:
      isContinuous in interface DataModel
      Returns:
      true if this is a continuous data set--that is, if it contains at least one column and all the columns are continuous.
    • isDiscrete

      boolean isDiscrete()

      Checks whether this is a discrete data set.

      Specified by:
      isDiscrete in interface DataModel
      Returns:
      true if this is a discrete data set--that is, if it contains at least one column and all the columns are discrete.
    • isMixed

      boolean isMixed()

      Checks whether this is a mixed data set.

      Specified by:
      isMixed in interface DataModel
      Returns:
      true if this is a mixed data set, that is, if it contains at least one continuous column and at least one discrete column.
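      The conditions given for isContinuous, isDiscrete, and isMixed can be summarized in a standalone sketch. It does not use Tetrad; the class DataKindSketch and the enum Kind are hypothetical stand-ins for a data set's column types, and the sketch assumes every column is either continuous or discrete.

```java
import java.util.List;

final class DataKindSketch {

    // Hypothetical stand-in for the type of the variable in a column.
    enum Kind { CONTINUOUS, DISCRETE }

    // At least one column, and every column is continuous.
    static boolean isContinuous(List<Kind> columns) {
        return !columns.isEmpty() && columns.stream().allMatch(k -> k == Kind.CONTINUOUS);
    }

    // At least one column, and every column is discrete.
    static boolean isDiscrete(List<Kind> columns) {
        return !columns.isEmpty() && columns.stream().allMatch(k -> k == Kind.DISCRETE);
    }

    // At least one continuous column and at least one discrete column.
    static boolean isMixed(List<Kind> columns) {
        return columns.contains(Kind.CONTINUOUS) && columns.contains(Kind.DISCRETE);
    }

    public static void main(String[] args) {
        List<Kind> columns = List.of(Kind.CONTINUOUS, Kind.DISCRETE, Kind.CONTINUOUS);
        System.out.println(isContinuous(columns) + " " + isDiscrete(columns) + " " + isMixed(columns));
    }
}
```

      Under these definitions the three predicates are mutually exclusive, and all three are false for a data set with no columns.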
    • isSelected

      boolean isSelected(Node variable)

      Checks whether the given variable is selected.

      Parameters:
      variable - The variable to check.
      Returns:
      true if and only if the given variable has been marked as selected.
    • removeColumn

      void removeColumn(int index)
      Removes the variable (and data) at the given index.
      Parameters:
      index - The index of the variable to remove.
    • removeColumn

      void removeColumn(Node variable)
      Removes the given variable, along with all of its data.
      Parameters:
      variable - The variable to remove.
    • removeCols

      void removeCols(int[] selectedCols)
      Removes the given columns from the data set.
      Parameters:
      selectedCols - The indices of the columns to remove.
    • removeRows

      void removeRows(int[] selectedRows)
      Removes the given rows from the data set.
      Parameters:
      selectedRows - The indices of the rows to remove.
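      Removing several rows in one call avoids the index shifting that occurs when rows are deleted one at a time. The standalone sketch below shows the idea on a plain double[row][column] array. It does not use Tetrad; the class RemoveRowsSketch is hypothetical, it returns a new array rather than modifying data in place as the void method above does, and ignoring duplicate or out-of-range indices is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

final class RemoveRowsSketch {

    // Returns a copy of `data` without the rows whose indices appear in
    // `selectedRows`. All indices refer to positions in the original array.
    static double[][] removeRows(double[][] data, int[] selectedRows) {
        Set<Integer> drop = new HashSet<>();
        for (int r : selectedRows) {
            drop.add(r);
        }
        // Count the surviving rows first so the result can be sized exactly.
        int kept = 0;
        for (int i = 0; i < data.length; i++) {
            if (!drop.contains(i)) {
                kept++;
            }
        }
        double[][] out = new double[kept][];
        int next = 0;
        for (int i = 0; i < data.length; i++) {
            if (!drop.contains(i)) {
                out[next++] = data[i].clone();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{0}, {1}, {2}, {3}};
        double[][] result = removeRows(data, new int[]{1, 3});
        System.out.println(result.length + " rows remain");
    }
}
```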
    • setDouble

      void setDouble(int row, int column, double value)
      Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
      Parameters:
      row - The index of the case.
      column - The index of the variable.
      value - The value to set.
    • setInt

      void setInt(int row, int col, int value)
      Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
      Parameters:
      row - The index of the case.
      col - The index of the variable.
      value - The value to set.
    • setObject

      void setObject(int row, int col, Object value)
      Sets the value at the given (row, column) to the given value.
      Parameters:
      row - The index of the case.
      col - The index of the variable.
      value - The value to set.
    • setSelected

      void setSelected(Node variable, boolean selected)
      Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
      Parameters:
      variable - The variable to select or deselect.
      selected - True to select the variable, false to deselect it.
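      The selection methods (setSelected, isSelected, clearSelection, getSelectedIndices) behave like a set of flags, one per column. A minimal standalone sketch of that bookkeeping follows. It does not use Tetrad; the class SelectionSketch is hypothetical, it identifies variables by column index rather than by Node, and java.util.BitSet is an assumed storage choice.

```java
import java.util.BitSet;

final class SelectionSketch {

    // One bit per column; a set bit means the column is selected.
    private final BitSet selected = new BitSet();

    void setSelected(int column, boolean isSelected) {
        selected.set(column, isSelected);
    }

    boolean isSelected(int column) {
        return selected.get(column);
    }

    // Marks all columns as deselected.
    void clearSelection() {
        selected.clear();
    }

    // Indices of the selected columns, in ascending order.
    int[] getSelectedIndices() {
        return selected.stream().toArray();
    }

    public static void main(String[] args) {
        SelectionSketch selection = new SelectionSketch();
        selection.setSelected(3, true);
        selection.setSelected(0, true);
        System.out.println(java.util.Arrays.toString(selection.getSelectedIndices())); // prints [0, 3]
    }
}
```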
    • subsetRowsColumns

      DataSet subsetRowsColumns(int[] rows, int[] columns)

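      Creates and returns a data set restricted to the given rows and columns.

      Parameters:
      rows - The indices of the rows to include in the new data set.
      columns - The indices of the columns to include in the new data set.
      Returns:
      a new data set consisting of the given rows and columns.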
    • subsetColumns

      DataSet subsetColumns(List<Node> vars)
      Creates and returns a dataset consisting of those variables in the list vars. Vars must be a subset of the variables of this DataSet. The ordering of the elements of vars will be the same as in the list of variables in this DataSet.
      Parameters:
      vars - The variables to include in the new data set.
      Returns:
      a new data set consisting of the variables in the list vars.
    • subsetColumns

      DataSet subsetColumns(int[] columns)

      Creates and returns a data set consisting of the columns at the given indices.

      Parameters:
      columns - The indices of the columns to include in the new data set.
      Returns:
      a new data set in which the column at index columns[i] of this data set is placed at index i, for i = 0 to columns.length - 1.
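      The index mapping in the Returns clause means that output column i holds the source column named by the i-th entry of the index array, so the array controls both which columns are kept and their order. A standalone sketch on a plain double[row][column] array follows; it does not use Tetrad, and the class SubsetColumnsSketch is hypothetical.

```java
final class SubsetColumnsSketch {

    // Returns a new array whose column i is a copy of source column columns[i].
    static double[][] subsetColumns(double[][] data, int[] columns) {
        double[][] out = new double[data.length][columns.length];
        for (int r = 0; r < data.length; r++) {
            for (int i = 0; i < columns.length; i++) {
                out[r][i] = data[r][columns[i]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2, 3}, {4, 5, 6}};
        // Keep columns 2 and 0, in that order.
        double[][] subset = subsetColumns(data, new int[]{2, 0});
        System.out.println(java.util.Arrays.deepToString(subset)); // prints [[3.0, 1.0], [6.0, 4.0]]
    }
}
```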
    • subsetRows

      DataSet subsetRows(int[] rows)

      Creates and returns a data set consisting of the rows at the given indices.

      Parameters:
      rows - The indices of the rows to include in the new data set.
      Returns:
      a new data set in which the row at index rows[i] of this data set is placed at index i, for i = 0 to rows.length - 1.
    • toString

      String toString()

      Returns a string representation of this dataset.

      Specified by:
      toString in interface DataModel
      Overrides:
      toString in class Object
      Returns:
      a string representation of this dataset.
    • getNumberFormat

      NumberFormat getNumberFormat()
      Returns the number format of the dataset.
      Returns:
      The number format of the dataset.
    • setNumberFormat

      void setNumberFormat(NumberFormat nf)
      Sets the number formatter used to print out continuous values.
      Parameters:
      nf - The number formatter used to print out continuous values.
    • setOutputDelimiter

      void setOutputDelimiter(Character character)
      Sets the character used as a delimiter when the dataset is output.
      Parameters:
      character - The character used as a delimiter when the dataset is output.
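      How a number format and an output delimiter might combine when a row of continuous values is printed can be sketched in plain Java. This does not use Tetrad; the class OutputFormatSketch and the method formatRow are hypothetical, and the actual output layout of a data set may differ.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.util.Locale;
import java.util.StringJoiner;

final class OutputFormatSketch {

    // Formats each value with `nf` and joins the results with `delimiter`.
    static String formatRow(double[] row, NumberFormat nf, Character delimiter) {
        StringJoiner joiner = new StringJoiner(delimiter.toString());
        for (double value : row) {
            joiner.add(nf.format(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Two decimal places, with a fixed locale so the decimal separator is '.'.
        NumberFormat nf = new DecimalFormat("0.00", DecimalFormatSymbols.getInstance(Locale.US));
        System.out.println(formatRow(new double[]{1.5, 2.25, 3}, nf, ',')); // prints 1.50,2.25,3.00
    }
}
```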
    • permuteRows

      void permuteRows()
      Randomly permutes the order of the rows of the data set.
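      One common way to put rows in random order is an in-place Fisher-Yates shuffle, sketched below on a plain double[row][column] array. It does not use Tetrad; the class PermuteRowsSketch is hypothetical, and both the choice of algorithm and the explicit Random argument are assumptions, since this page does not say how the library performs the permutation.

```java
import java.util.Random;

final class PermuteRowsSketch {

    // In-place Fisher-Yates shuffle: each row is swapped with a randomly
    // chosen row at or before its position, working from the last row down.
    static void permuteRows(double[][] data, Random random) {
        for (int i = data.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            double[] tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }

    public static void main(String[] args) {
        double[][] data = {{0}, {1}, {2}, {3}, {4}};
        permuteRows(data, new Random(42)); // fixed seed for repeatability
        System.out.println(java.util.Arrays.deepToString(data));
    }
}
```

      Whole rows are swapped, so the values within each case stay together and only the order of the cases changes.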
    • getColumnToTooltip

      Map<String,String> getColumnToTooltip()
      Returns the map of column names to tooltips.
      Returns:
      The map of column names to tooltips.
    • equals

      boolean equals(Object o)
      Checks if the given object is equal to this dataset.
      Overrides:
      equals in class Object
      Parameters:
      o - The object to check.
      Returns:
      True if the given object is equal to this dataset.
    • copy

      DataSet copy()
      Returns a copy of this dataset.
      Specified by:
      copy in interface DataModel
      Returns:
      A copy of this dataset.
    • like

      DataSet like()
      Returns a dataset with the same dimensions as this dataset, but with no data.
      Returns:
      a dataset with the same dimensions as this dataset, but with no data.