Interface DataSet

All Superinterfaces:
DataModel, KnowledgeTransferable, Serializable, TetradSerializable, VariableSource
All Known Implementing Classes:
BoxDataSet, NumberObjectDataSet

public interface DataSet extends DataModel
Represents a rectangular data set: one with a fixed number of columns and a fixed number of rows, in which every column has the same length.
Author:
josephramsey
  • Field Details

  • Method Details

    • addVariable

      void addVariable(Node variable)
      Adds the given variable to the data set.
      Throws:
      IllegalArgumentException - if the variable is neither continuous nor discrete.
    • addVariable

      void addVariable(int index, Node variable)
      Adds the given variable at the given index.
    • changeVariable

      void changeVariable(Node from, Node to)
      Changes the variable for the given column from the variable from to the variable to. Currently supported only for discrete variables.
      Throws:
      IllegalArgumentException - if the given change is not supported.
    • clearSelection

      void clearSelection()
      Marks all variables as deselected.
    • ensureColumns

      void ensureColumns(int columns, List<String> excludedVariableNames)
      Ensures that the dataset has at least the given number of columns. Used for pasting data into the dataset. Newly created columns may not use any name in the excludedVariableNames list, so that the calling class can assign those names later without conflicts.
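      The naming rule described for ensureColumns can be sketched as follows. This is a minimal illustration, not the library's implementation: the class EnsureColumnsSketch, the use of a plain list of names in place of a data set, and the "X1", "X2", ... naming scheme are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the Tetrad implementation: models only the
// column names of a data set.
class EnsureColumnsSketch {

    // Appends generated names until the result holds at least `columns`
    // entries, skipping any candidate already in use or listed in `excluded`.
    // The "X1", "X2", ... naming scheme is an assumption for illustration.
    static List<String> ensureColumns(List<String> names, int columns, List<String> excluded) {
        List<String> result = new ArrayList<>(names);
        int counter = 0;
        while (result.size() < columns) {
            String candidate = "X" + (++counter);
            if (!result.contains(candidate) && !excluded.contains(candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "X1" is taken and "X2" is reserved, so the new columns get other names.
        System.out.println(ensureColumns(List.of("X1", "Age"), 4, List.of("X2")));
    }
}
```

      Reserving names up front lets a caller paste data first and rename the new columns afterwards without colliding with a generated name.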
    • existsMissingValue

      boolean existsMissingValue()
      Returns true if and only if this data set contains at least one missing value.
    • ensureRows

      void ensureRows(int rows)
      Ensures that the dataset has at least the given number of rows. Used for pasting data into the dataset.
    • getColumn

      int getColumn(Node variable)
      Returns:
      the column index of the given variable.
    • getCorrelationMatrix

      Matrix getCorrelationMatrix()
      If this is a continuous data set, returns the correlation matrix.
      Throws:
      IllegalStateException - if this is not a continuous data set.
    • getCovarianceMatrix

      Matrix getCovarianceMatrix()
      If this is a continuous data set, returns the covariance matrix.
      Throws:
      IllegalStateException - if this is not a continuous data set.
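      For readers unfamiliar with the two matrices, the sketch below computes them from a plain double[][] of continuous data. It is an illustration only: the class CovarianceSketch is hypothetical, it does not use the library's Matrix type, and dividing by n - 1 (the sample covariance) is an assumption about the convention, which this page does not state.

```java
import java.util.Arrays;

// Hypothetical stand-in using plain arrays; not the Tetrad Matrix class.
class CovarianceSketch {

    // Sample covariance matrix of data[row][column], dividing by n - 1.
    // Whether the library divides by n or n - 1 is an assumption here.
    static double[][] covariance(double[][] data) {
        int n = data.length;
        int p = data[0].length;
        double[] mean = new double[p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;
        }
        double[][] cov = new double[p][p];
        for (double[] row : data) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (n - 1);
                }
            }
        }
        return cov;
    }

    // Correlation matrix: each covariance scaled by the two standard deviations.
    static double[][] correlation(double[][] data) {
        double[][] cov = covariance(data);
        int p = cov.length;
        double[][] corr = new double[p][p];
        for (int j = 0; j < p; j++) {
            for (int k = 0; k < p; k++) {
                corr[j][k] = cov[j][k] / Math.sqrt(cov[j][j] * cov[k][k]);
            }
        }
        return corr;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {2, 4}, {3, 6}};
        System.out.println(Arrays.deepToString(correlation(data)));
    }
}
```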
    • getDouble

      double getDouble(int row, int column)
      Returns:
      the value at the given row and column as a double. For discrete data, returns the integer value cast to a double.
    • getDoubleData

      Matrix getDoubleData()
      Returns:
      the underlying data matrix as a Matrix.
      Throws:
      IllegalStateException - if this is not a continuous data set.
    • getInt

      int getInt(int row, int column)
      Returns:
      the value at the given row and column as an int, rounding if necessary. For discrete variables, this returns the category index of the datum for the variable at that column. Returns DiscreteVariable.MISSING_VALUE for missing values.
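      The conversions described for getDouble and getInt can be pictured with the following sketch. Everything in it is hypothetical: the class ValueAccessSketch, the sentinel value -99 standing in for DiscreteVariable.MISSING_VALUE, the use of NaN to represent a missing stored value, and round-to-nearest as the rounding rule are assumptions, since this page does not specify them.

```java
// Hypothetical stand-in for a single stored value; not the Tetrad classes.
class ValueAccessSketch {

    // Placeholder standing in for DiscreteVariable.MISSING_VALUE; the actual
    // constant's value is not stated in this documentation.
    static final int MISSING_VALUE = -99;

    // Reads a stored double as an int, rounding to the nearest integer.
    // Treating NaN as "missing" is an assumption made for this sketch.
    static int asInt(double stored) {
        if (Double.isNaN(stored)) {
            return MISSING_VALUE;
        }
        return (int) Math.round(stored);
    }

    // Reads a stored category index as a double via a widening cast.
    static double asDouble(int categoryIndex) {
        return (double) categoryIndex;
    }

    public static void main(String[] args) {
        System.out.println(asInt(2.6));
        System.out.println(asInt(Double.NaN));
        System.out.println(asDouble(2));
    }
}
```

      Callers that read discrete columns through the int accessor would typically compare the result against the missing-value constant before using it as a category index.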
    • getName

      String getName()
      Specified by:
      getName in interface DataModel
      Returns:
      the name of the data set.
    • getNumColumns

      int getNumColumns()
      Returns:
      the number of columns in the data set.
    • getNumRows

      int getNumRows()
      Returns:
      the number of rows in the data set.
    • getObject

      Object getObject(int row, int col)
      Parameters:
      row - The index of the case.
      col - The index of the variable.
      Returns:
      the value at the given row and column as an Object. The type returned is deliberately vague, allowing for variables of any type. Primitives will be returned as corresponding wrapping objects (for example, doubles as Doubles).
    • getSelectedIndices

      int[] getSelectedIndices()
      Returns:
      the currently selected variables.
    • getVariable

      Node getVariable(int column)
      Returns:
      the variable at the given column.
    • getVariable

      Node getVariable(String name)
      Specified by:
      getVariable in interface DataModel
      Returns:
      the variable with the given name.
    • getVariableNames

      List<String> getVariableNames()
      Description copied from interface: VariableSource
      Returns the variable names associated with this object.
      Specified by:
      getVariableNames in interface VariableSource
      Returns:
      (a copy of) the List of variable names for the data set, in the order of their columns.
    • getVariables

      List<Node> getVariables()
      Description copied from interface: VariableSource
      Returns the list of variables associated with this object.
      Specified by:
      getVariables in interface VariableSource
      Returns:
      (a copy of) the List of Variables for the data set, in the order of their columns.
    • isContinuous

      boolean isContinuous()
      Specified by:
      isContinuous in interface DataModel
      Returns:
      true if this is a continuous data set--that is, if it contains at least one column and all of the columns are continuous.
    • isDiscrete

      boolean isDiscrete()
      Specified by:
      isDiscrete in interface DataModel
      Returns:
      true if this is a discrete data set--that is, if it contains at least one column and all of the columns are discrete.
    • isMixed

      boolean isMixed()
      Specified by:
      isMixed in interface DataModel
      Returns:
      true if this is a mixed data set--that is, if it contains at least one continuous column and at least one discrete column.
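      The three predicates isContinuous, isDiscrete, and isMixed partition non-empty data sets by their column types. The sketch below restates the documented conditions over a list of column types; the class DataKindSketch and its ColumnType enum are hypothetical and exist only for this example.

```java
import java.util.List;

// Hypothetical stand-in: a data set is reduced to the list of its column types.
class DataKindSketch {

    enum ColumnType { CONTINUOUS, DISCRETE }

    // At least one column, and every column is continuous.
    static boolean isContinuous(List<ColumnType> columns) {
        return !columns.isEmpty()
                && columns.stream().allMatch(c -> c == ColumnType.CONTINUOUS);
    }

    // At least one column, and every column is discrete.
    static boolean isDiscrete(List<ColumnType> columns) {
        return !columns.isEmpty()
                && columns.stream().allMatch(c -> c == ColumnType.DISCRETE);
    }

    // At least one continuous column and at least one discrete column.
    static boolean isMixed(List<ColumnType> columns) {
        return columns.contains(ColumnType.CONTINUOUS)
                && columns.contains(ColumnType.DISCRETE);
    }

    public static void main(String[] args) {
        List<ColumnType> columns = List.of(ColumnType.CONTINUOUS, ColumnType.DISCRETE);
        System.out.println(isContinuous(columns) + " " + isDiscrete(columns) + " " + isMixed(columns));
    }
}
```

      Under these definitions a data set with no columns satisfies none of the three predicates.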
    • isSelected

      boolean isSelected(Node variable)
      Returns:
      true iff the given column has been marked as selected.
    • removeColumn

      void removeColumn(int index)
      Removes the variable (and data) at the given index.
    • removeColumn

      void removeColumn(Node variable)
      Removes the given variable, along with all of its data.
    • removeCols

      void removeCols(int[] selectedCols)
      Removes the given columns from the data set.
    • removeRows

      void removeRows(int[] selectedRows)
      Removes the given rows from the data set.
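      As a rough picture of bulk removal, the sketch below drops a set of rows from a plain double[][]. The class RemoveRowsSketch is hypothetical, and it assumes that all indices refer to positions in the data as it was before any row was removed; this page does not say how the library interprets them. Unlike removeRows, which returns void, the sketch returns a new array.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in using plain arrays; not the Tetrad implementation.
class RemoveRowsSketch {

    // Returns the rows of `data` whose original index is not in `selectedRows`.
    // Indices are assumed to refer to positions before any removal.
    static double[][] removeRows(double[][] data, int[] selectedRows) {
        Set<Integer> drop = new HashSet<>();
        for (int r : selectedRows) {
            drop.add(r);
        }
        List<double[]> kept = new ArrayList<>();
        for (int r = 0; r < data.length; r++) {
            if (!drop.contains(r)) {
                kept.add(data[r]);
            }
        }
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = {{1}, {2}, {3}, {4}};
        System.out.println(removeRows(data, new int[]{0, 2}).length);
    }
}
```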
    • setDouble

      void setDouble(int row, int column, double value)
      Sets the value at the given (row, column) to the given double value, assuming the variable for the column is continuous.
      Parameters:
      row - The index of the case.
      column - The index of the variable.
    • setInt

      void setInt(int row, int col, int value)
      Sets the value at the given (row, column) to the given int value, assuming the variable for the column is discrete.
      Parameters:
      row - The index of the case.
      col - The index of the variable.
    • setObject

      void setObject(int row, int col, Object value)
      Sets the value at the given (row, column) to the given value.
      Parameters:
      row - The index of the case.
      col - The index of the variable.
    • setSelected

      void setSelected(Node variable, boolean selected)
      Marks the given column as selected if 'selected' is true or deselected if 'selected' is false.
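      The selection methods (setSelected, isSelected, clearSelection, getSelectedIndices) behave like a set of flagged columns. The sketch below models that with a BitSet; the class SelectionSketch is hypothetical, and it addresses columns by index where the interface takes a Node, purely to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical stand-in: tracks selection by column index rather than by Node.
class SelectionSketch {

    private final BitSet selected = new BitSet();

    // Marks the column as selected or deselected.
    void setSelected(int column, boolean isSelected) {
        selected.set(column, isSelected);
    }

    boolean isSelected(int column) {
        return selected.get(column);
    }

    // Marks all columns as deselected.
    void clearSelection() {
        selected.clear();
    }

    // Indices of the selected columns, in ascending order.
    int[] getSelectedIndices() {
        return selected.stream().toArray();
    }

    public static void main(String[] args) {
        SelectionSketch selection = new SelectionSketch();
        selection.setSelected(3, true);
        selection.setSelected(1, true);
        System.out.println(Arrays.toString(selection.getSelectedIndices()));
    }
}
```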
    • subsetRowsColumns

      DataSet subsetRowsColumns(int[] rows, int[] columns)
    • subsetColumns

      DataSet subsetColumns(List<Node> vars)
      Creates and returns a dataset consisting of those variables in the list vars. Vars must be a subset of the variables of this DataSet. The ordering of the elements of vars will be the same as in the list of variables in this DataSet.
    • subsetColumns

      DataSet subsetColumns(int[] columns)
      Returns:
      a new data set in which the column at columns[i] is placed at index i, for i = 0 to columns.length - 1. (View instead?)
    • subsetRows

      DataSet subsetRows(int[] rows)
      Returns:
      a new data set in which the row at rows[i] is placed at index i, for i = 0 to rows.length - 1. (View instead?)
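      The index mapping used by the array-based subset methods (position i of the result takes the row or column named by entry i of the index array) can be sketched over a plain double[][]. The class SubsetSketch is hypothetical and copies the data, in line with the "new data set" wording above; it is not the library's implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in using plain arrays; not the Tetrad implementation.
class SubsetSketch {

    // Row i of the result is a copy of row rows[i] of the input.
    static double[][] subsetRows(double[][] data, int[] rows) {
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = data[rows[i]].clone();
        }
        return out;
    }

    // Column i of the result is a copy of column columns[i] of the input.
    static double[][] subsetColumns(double[][] data, int[] columns) {
        double[][] out = new double[data.length][columns.length];
        for (int r = 0; r < data.length; r++) {
            for (int i = 0; i < columns.length; i++) {
                out[r][i] = data[r][columns[i]];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2, 3}, {4, 5, 6}};
        // Selecting columns {2, 0} both filters and reorders.
        System.out.println(Arrays.deepToString(subsetColumns(data, new int[]{2, 0})));
    }
}
```

      Because the index array fixes the output order, the same call can be used to reorder rows or columns as well as to select them.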
    • toString

      String toString()
      Description copied from interface: DataModel
      Renders the data model as a String.
      Specified by:
      toString in interface DataModel
      Overrides:
      toString in class Object
      Returns:
      a string representation of this dataset.
    • getNumberFormat

      NumberFormat getNumberFormat()
      Returns the number format of the dataset.
    • setNumberFormat

      void setNumberFormat(NumberFormat nf)
      Sets the number formatter used to print out continuous values.
    • setOutputDelimiter

      void setOutputDelimiter(Character character)
      Sets the character used as a delimiter when the dataset is output.
    • permuteRows

      void permuteRows()
      Randomizes the rows of the data set.
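      One way to picture a row permutation is a uniform shuffle of whole rows, so that the values within each case stay together. The sketch below does this with Collections.shuffle over a plain double[][]; the class PermuteRowsSketch and the explicit Random parameter are assumptions made for the example, and unlike permuteRows, which returns void, the sketch returns a new array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in using plain arrays; not the Tetrad implementation.
class PermuteRowsSketch {

    // Returns the rows of `data` in a random order. Each row is moved as a
    // unit, so the values within a case are never separated.
    static double[][] permuteRows(double[][] data, Random random) {
        List<double[]> rows = new ArrayList<>(Arrays.asList(data));
        Collections.shuffle(rows, random);
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] data = {{1, 10}, {2, 20}, {3, 30}};
        System.out.println(Arrays.deepToString(permuteRows(data, new Random())));
    }
}
```

      Passing a seeded Random makes the permutation reproducible, which can be useful in tests.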
    • getColumnToTooltip

      Map<String,String> getColumnToTooltip()
    • equals

      boolean equals(Object o)
      Overrides:
      equals in class Object
    • copy

      DataSet copy()
      Specified by:
      copy in interface DataModel
    • like

      DataSet like()