Class DataTransforms

java.lang.Object
edu.cmu.tetrad.data.DataTransforms

public class DataTransforms extends Object

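Static utility methods for transforming data sets, including logging, standardizing, centering, discretizing, concatenating, resampling, and adding or replacing missing values.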

Author:
josephramsey
  • Method Details

    • logData

      public static DataSet logData(DataSet dataSet, double a, boolean isUnlog, int base)
      Logs the data or, if isUnlog is true, unlogs it.
      Parameters:
      dataSet - a DataSet object
      a - a double
      isUnlog - whether to unlog the data instead of logging it
      base - an int, the base of the logarithm
      Returns:
      a DataSet object
    • standardizeData

      public static List<DataSet> standardizeData(List<DataSet> dataSets)

      Standardizes each data set in the given list.

      Parameters:
      dataSets - a list of DataSet objects
      Returns:
      a list of the standardized DataSet objects
    • standardizeData

      public static DataSet standardizeData(DataSet dataSet)

      Standardizes the given data set.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
    • center

      public static List<DataSet> center(List<DataSet> dataList)

      Centers each data set in the given list.

      Parameters:
      dataList - a list of DataSet objects
      Returns:
      a list of the centered DataSet objects
    • discretize

      public static DataSet discretize(DataSet dataSet, int numCategories, boolean variablesCopied)

      Discretizes the given data set using the given number of categories.

      Parameters:
      dataSet - a DataSet object
      numCategories - an int, the number of categories
      variablesCopied - a boolean
      Returns:
      a DataSet object
    • convertNumericalDiscreteToContinuous

      public static DataSet convertNumericalDiscreteToContinuous(DataSet dataSet) throws NumberFormatException

      Converts discrete variables whose categories are numerical into continuous variables.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
      Throws:
      NumberFormatException - if any.
    • concatenate

      public static DataSet concatenate(DataSet dataSet1, DataSet dataSet2)

      Concatenates the two given data sets into a single data set.

      Parameters:
      dataSet1 - a DataSet object
      dataSet2 - a DataSet object
      Returns:
      a DataSet object
    • concatenate

      public static DataSet concatenate(DataSet... dataSets)

      Concatenates the given data sets into a single data set.

      Parameters:
      dataSets - the DataSet objects to concatenate
      Returns:
      a DataSet object
    • concatenate

      public static DataSet concatenate(List<DataSet> dataSets)

      Concatenates the data sets in the given list into a single data set.

      Parameters:
      dataSets - a list of DataSet objects
      Returns:
      a DataSet object
    • restrictToMeasured

      public static DataSet restrictToMeasured(DataSet fullDataSet)

      Restricts the given data set to its measured variables.

      Parameters:
      fullDataSet - a DataSet object
      Returns:
      a DataSet object
    • getResamplingDataset

      public static DataSet getResamplingDataset(DataSet data, int sampleSize)

      Returns a sample drawn without replacement from the given data set.

      Parameters:
      data - a DataSet object
      sampleSize - an int, the sample size
      Returns:
      a sample without replacement with the given sample size from the given dataset.
    • getResamplingDataset

      public static DataSet getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
      Gets a dataset sampled without replacement, using the given random number generator.
      Parameters:
      data - original dataset
      sampleSize - number of rows to sample
      randomGenerator - random number generator
      Returns:
      the sampled dataset
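
      As a rough illustration of what sampling without replacement means here, the sketch below picks distinct row indices by shuffling. It is not Tetrad's implementation: the class ResampleSketch and the method sampleWithoutReplacement are hypothetical names, java.util.Random stands in for the RandomGenerator parameter, and rejecting a sampleSize larger than the number of rows is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not part of Tetrad.
public class ResampleSketch {

    // Picks sampleSize distinct row indices from 0..numRows-1, so no row repeats.
    public static int[] sampleWithoutReplacement(int numRows, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > numRows) {
            // Assumption: a sample without replacement cannot exceed the data size.
            throw new IllegalArgumentException("sampleSize must be between 0 and numRows");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rng); // random permutation of all row indices
        int[] chosen = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            chosen[i] = indices.get(i); // keep the first sampleSize entries
        }
        return chosen;
    }

    public static void main(String[] args) {
        int[] rows = sampleWithoutReplacement(10, 4, new Random(42L));
        System.out.println(Arrays.toString(rows));
    }
}
```

      Passing a seeded generator, as in main, makes the selection repeatable, which is presumably why this overload accepts a generator.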
    • getBootstrapSample

      public static DataSet getBootstrapSample(DataSet data, int sampleSize)

      Returns a sample drawn with replacement from the given data set.

      Parameters:
      data - a DataSet object
      sampleSize - an int, the sample size
      Returns:
      a sample with replacement with the given sample size from the given dataset.
    • getBootstrapSample

      public static DataSet getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
      Gets a dataset sampled with replacement, using the given random number generator.
      Parameters:
      data - original dataset
      sampleSize - number of rows to sample
      randomGenerator - random number generator
      Returns:
      the sampled dataset
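
      To make the with-replacement behavior concrete, here is a minimal sketch on a plain double[][] rather than a DataSet. It is not Tetrad's implementation: BootstrapSketch and bootstrapSample are hypothetical names, and java.util.Random stands in for the RandomGenerator parameter.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; not part of Tetrad.
public class BootstrapSketch {

    // Draws sampleSize rows from data with replacement, so a row may appear more than once.
    public static double[][] bootstrapSample(double[][] data, int sampleSize, Random rng) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must contain at least one row");
        }
        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            int row = rng.nextInt(data.length); // every draw is uniform over all rows
            sample[i] = data[row].clone();      // copy so the sample does not alias the input
        }
        return sample;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        double[][] sample = bootstrapSample(data, 5, new Random(7L));
        for (double[] row : sample) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

      Because rows are drawn independently, the sample size in this sketch may exceed the number of rows in the input, unlike sampling without replacement.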
    • split

      public static List<DataSet> split(DataSet data, double percentTest)

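      Splits the given data set, using percentTest to determine the size of the test portion.

      Parameters:
      data - a DataSet object
      percentTest - a double
      Returns:
      a list of the resulting DataSet objects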
    • center

      public static DataSet center(DataSet data)
      Subtracts the mean of each column from each datum in that column.
      Parameters:
      data - a DataSet object
      Returns:
      a DataSet object
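
      The column-wise centering described above can be sketched on a plain double[][] (rows by columns). This is an illustration only, not Tetrad's implementation; CenterSketch and its center method are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of Tetrad.
public class CenterSketch {

    // Returns a copy of data (rows x columns) with each column's mean subtracted.
    public static double[][] center(double[][] data) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            for (int i = 0; i < rows; i++) {
                sum += data[i][j];
            }
            double mean = sum / rows;
            for (int i = 0; i < rows; i++) {
                out[i][j] = data[i][j] - mean; // centered value for row i, column j
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}};
        // Column means are 2.0 and 20.0, so the rows become
        // [-1.0, -10.0], [0.0, 0.0], [1.0, 10.0].
        for (double[] row : center(data)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```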
    • shuffleColumns

      public static DataSet shuffleColumns(DataSet dataModel)

      Shuffles the columns of the given data set.

      Parameters:
      dataModel - a DataSet object
      Returns:
      a DataSet object
    • shuffleColumns2

      public static List<DataSet> shuffleColumns2(List<DataSet> dataSets)

      Shuffles the columns of the data sets in the given list.

      Parameters:
      dataSets - a list of DataSet objects
      Returns:
      a list of the shuffled DataSet objects
    • covarianceNonparanormalDrton

      public static ICovarianceMatrix covarianceNonparanormalDrton(DataSet dataSet)

      Computes a nonparanormal covariance matrix for the given data set.

      Parameters:
      dataSet - a DataSet object
      Returns:
      an ICovarianceMatrix object
    • getNonparanormalTransformed

      public static DataSet getNonparanormalTransformed(DataSet dataSet)

      Returns the nonparanormal transform of the given data set.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
    • removeConstantColumns

      public static DataSet removeConstantColumns(DataSet dataSet)

      Removes the constant columns from the given data set.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
    • getConstantColumns

      public static List<Node> getConstantColumns(DataSet dataSet)

      Returns the variables whose columns are constant in the given data set.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a list of Node objects
    • removeRandomColumns

      public static DataSet removeRandomColumns(DataSet dataSet, double aDouble)

      Removes randomly chosen columns from the given data set.

      Parameters:
      dataSet - a DataSet object
      aDouble - a double
      Returns:
      a DataSet object
    • standardizeData

      public static Matrix standardizeData(Matrix data)

      Standardizes the given data matrix.

      Parameters:
      data - a Matrix object
      Returns:
      a Matrix object
    • standardizeData

      public static double[] standardizeData(double[] data)

      Standardizes the given array of values.

      Parameters:
      data - an array of double values
      Returns:
      an array of double values
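
      This page does not define "standardize". Under the conventional reading (subtract the mean, then divide by the standard deviation), the operation could look like the sketch below. Whether Tetrad divides by n or n - 1 is not stated here, so the sample standard deviation (n - 1) used below is an assumption; StandardizeSketch and standardize are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of Tetrad.
public class StandardizeSketch {

    // Returns (x - mean) / sd for each value, using the sample standard deviation.
    // Assumption: n - 1 in the denominator; constant or single-value input yields NaN.
    public static double[] standardize(double[] data) {
        int n = data.length;
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        double mean = sum / n;

        double sumSquares = 0.0;
        for (double x : data) {
            sumSquares += (x - mean) * (x - mean);
        }
        double sd = Math.sqrt(sumSquares / (n - 1));

        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = (data[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        // Mean is 4.0 and the sample standard deviation is 2.0,
        // so this prints [-1.0, 0.0, 1.0].
        System.out.println(Arrays.toString(standardize(new double[]{2.0, 4.0, 6.0})));
    }
}
```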
    • standardizeData

      public static cern.colt.list.DoubleArrayList standardizeData(cern.colt.list.DoubleArrayList data)

      Standardizes the given list of values.

      Parameters:
      data - a DoubleArrayList object
      Returns:
      a DoubleArrayList object
    • center

      public static double[] center(double[] d)

      Centers the given array of values.

      Parameters:
      d - an array of double values
      Returns:
      an array of double values
    • centerData

      public static Matrix centerData(Matrix data)

      Centers the given data matrix.

      Parameters:
      data - a Matrix object
      Returns:
      a Matrix object
    • concatenate

      public static Matrix concatenate(Matrix... dataSets)

      Concatenates the given matrices into a single matrix.

      Parameters:
      dataSets - the Matrix objects to concatenate
      Returns:
      a Matrix object
    • getBootstrapSample

      public static Matrix getBootstrapSample(Matrix data, int sampleSize)

      Returns a sample drawn with replacement from the given matrix.

      Parameters:
      data - a Matrix object
      sampleSize - an int, the sample size
      Returns:
      a sample with replacement with the given sample size from the given dataset.
    • copyColumn

      public static void copyColumn(Node node, DataSet source, DataSet dest)

      Copies the column for the given node from the source data set to the destination data set.

      Parameters:
      node - a Node object
      source - a DataSet object
      dest - a DataSet object
    • addMissingData

      public static DataSet addMissingData(DataSet inData, double[] probs)
      Adds missing data values to cases in accordance with probabilities specified in a double array that has as many elements as there are columns in the input dataset. Hence, if the first element of the array of probabilities is alpha, then the first column will contain a -99 (or other missing value code) in a given case with probability alpha. This method is useful for generating datasets that can be used to test algorithms that handle missing data and/or latent variables. Author: Frank Wimberly
      Parameters:
      inData - The data to which random missing data is to be added.
      probs - The probability of adding missing data to each column.
      Returns:
      The new data set with missing data added.
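
      The per-column masking described above can be sketched on a plain double[][]. This is an illustration only, not Tetrad's implementation: MissingDataSketch and addMissing are hypothetical names, Double.NaN is assumed as the missing value code in place of -99, and the length check on probs is an assumption.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration; not part of Tetrad.
public class MissingDataSketch {

    // Returns a copy of data in which each cell of column j is replaced by NaN
    // with probability probs[j]. The input array is left unchanged.
    public static double[][] addMissing(double[][] data, double[] probs, Random rng) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            if (data[i].length != probs.length) {
                // Assumption: one probability per column is required.
                throw new IllegalArgumentException("probs must have one entry per column");
            }
            out[i] = data[i].clone();
            for (int j = 0; j < probs.length; j++) {
                if (rng.nextDouble() < probs[j]) {
                    out[i][j] = Double.NaN; // assumed missing value marker
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        // Column 0 is never masked (probability 0.0); column 1 is masked about half the time.
        double[][] masked = addMissing(data, new double[]{0.0, 0.5}, new Random(11L));
        for (double[] row : masked) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```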
    • replaceMissingWithRandom

      public static DataSet replaceMissingWithRandom(DataSet inData)

      Replaces the missing values in the given data set with random values.

      Parameters:
      inData - a DataSet object
      Returns:
      a DataSet object