Class NumberObjectDataSet

java.lang.Object
edu.cmu.tetrad.data.NumberObjectDataSet
All Implemented Interfaces:
DataModel, DataSet, KnowledgeTransferable, VariableSource, TetradSerializable, Serializable

public final class NumberObjectDataSet extends Object implements DataSet
Wraps a 2D array of Number objects so that mixed data sets can be stored. The type of each column must be specified by a Variable object, which must be either a ContinuousVariable or a DiscreteVariable. This class violates object orientation in that the underlying data matrix is retrievable using the getDoubleData() method. This is allowed so that external calculations may be performed on large datasets without allocating extra memory. If this matrix needs to be modified externally, consider making a copy of it first using the TetradMatrix copy() method.
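
The following is a minimal, self-contained sketch of the storage idea described above. It is not Tetrad's implementation or API. `MixedColumnTable`, `ColumnType`, `getDouble`, `getInt`, `rawData` and `copyData` are hypothetical names, and the enum stands in for the ContinuousVariable/DiscreteVariable distinction. It shows per-column type declarations over one Number[][] array, plus the trade-off between handing out the backing array (no extra memory, but shared) and copying it before modification.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not Tetrad's API): a table of Number objects whose
 * column types are declared up front, so continuous and discrete columns
 * can share one 2D array.
 */
final class MixedColumnTable {

    /** Hypothetical stand-in for ContinuousVariable vs. DiscreteVariable. */
    enum ColumnType { CONTINUOUS, DISCRETE }

    private final Number[][] data;          // data[row][column]
    private final List<ColumnType> types;   // types.get(j) describes column j

    MixedColumnTable(Number[][] data, List<ColumnType> types) {
        for (Number[] row : data) {
            if (row.length != types.size()) {
                throw new IllegalArgumentException("Each row needs one value per column.");
            }
        }
        this.data = data;                    // stored without copying
        this.types = List.copyOf(types);
    }

    /** Continuous columns are read as doubles. */
    double getDouble(int row, int col) {
        requireType(col, ColumnType.CONTINUOUS);
        return data[row][col].doubleValue();
    }

    /** Discrete columns hold category indices, read as ints. */
    int getInt(int row, int col) {
        requireType(col, ColumnType.DISCRETE);
        return data[row][col].intValue();
    }

    /** Returns the backing array itself; writes through it change this table. */
    Number[][] rawData() {
        return data;
    }

    /**
     * Copies every row array. The elements are shared, which is safe
     * assuming they are immutable boxes such as Double and Integer.
     */
    Number[][] copyData() {
        Number[][] copy = new Number[data.length][];
        for (int i = 0; i < data.length; i++) {
            copy[i] = Arrays.copyOf(data[i], data[i].length);
        }
        return copy;
    }

    private void requireType(int col, ColumnType expected) {
        if (types.get(col) != expected) {
            throw new IllegalStateException("Column " + col + " is " + types.get(col));
        }
    }
}
```

Editing the result of `copyData()` leaves the table untouched, while writing through `rawData()` changes it. That is the same reason the description above recommends copying the real data matrix before modifying it externally.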

The data set may be given a name; this name is not used internally.

The data set has a list of variables associated with it, as described above. This list is coordinated with the stored data: data for the i-th variable is in the i-th column.
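
As a rough illustration of the variable-to-column correspondence, the sketch below uses the position of a variable in the list as its column index. It is hypothetical. Tetrad identifies variables by Node objects, whereas this sketch uses plain String names, and `ColumnLookup`, `columnOf` and `value` are invented for the example.

```java
import java.util.List;

/** Hypothetical sketch: the i-th variable in the list names the i-th column. */
final class ColumnLookup {

    private final List<String> variableNames; // position i <-> column i
    private final double[][] data;            // data[row][column]

    ColumnLookup(List<String> variableNames, double[][] data) {
        for (double[] row : data) {
            if (row.length != variableNames.size()) {
                throw new IllegalArgumentException("Row width must equal the number of variables.");
            }
        }
        this.variableNames = List.copyOf(variableNames);
        this.data = data;
    }

    /** Column index of a variable, or -1 if it is not in the data set. */
    int columnOf(String name) {
        return variableNames.indexOf(name);
    }

    /** Value of the named variable in the given row. */
    double value(int row, String name) {
        int col = columnOf(name);
        if (col < 0) {
            throw new IllegalArgumentException("Unknown variable: " + name);
        }
        return data[row][col];
    }
}
```

Because the list and the columns share one index space, adding or removing a variable must also add or remove the matching column, or every later lookup goes wrong.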

A subset of variables in the data set may be designated as selected. This selection set is stored with the data set and may be manipulated using the select and deselect methods.
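
A selection set like the one described above can be sketched as a set kept alongside the variable list. This is an assumption-laden illustration, not Tetrad's code. The `select`/`deselect` names echo the description, but their signatures here, along with `VariableSelection`, `clearSelection`, `isSelected` and `selectedVariables`, are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch: a selection set stored with the variable list. */
final class VariableSelection {

    private final List<String> variables;          // data-set (column) order
    private final Set<String> selected = new LinkedHashSet<>();

    VariableSelection(List<String> variables) {
        this.variables = List.copyOf(variables);
    }

    /** Marks a variable as selected; it must belong to the data set. */
    void select(String variable) {
        if (!variables.contains(variable)) {
            throw new IllegalArgumentException("Not in data set: " + variable);
        }
        selected.add(variable);
    }

    /** Removes a variable from the selection (no-op if not selected). */
    void deselect(String variable) {
        selected.remove(variable);
    }

    void clearSelection() {
        selected.clear();
    }

    boolean isSelected(String variable) {
        return selected.contains(variable);
    }

    /** Selected variables, reported in column order rather than selection order. */
    List<String> selectedVariables() {
        return variables.stream().filter(selected::contains).collect(Collectors.toList());
    }
}
```

Reporting selected variables in column order keeps the selection consistent with the variable list, regardless of the order in which `select` was called.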


Knowledge may be associated with the data set using the setKnowledge method. This knowledge is not used internally by the data set, but it may be retrieved and used by algorithms.

This data set replaces an earlier Minitab-style DataSet class. The reasons for replacement are as follows.

  • COLT matrices are optimized for double 2D matrix calculations in ways that Java-style double[][] matrices are not.
  • The COLT library comes with a wide range of linear algebra methods that are better tested and more flexible than the linear algebra methods used previously in Tetrad.
  • Views of COLT matrices can often be used in places where copies of data sets were being created.
  • The only place where data sets were being manipulated for legitimate reasons was in the interface. Everywhere else, it turned out to be sensible to calculate a list of variables and a sample size in advance and allocate memory for a data set with those dimensions. For very large data sets, it makes more sense to disallow memory-hogging manipulations than to throw out-of-memory errors.
Author:
josephramsey
See Also: