Class DataTransforms

java.lang.Object
edu.cmu.tetrad.data.DataTransforms

public class DataTransforms extends Object
  • Constructor Details

    • DataTransforms

      public DataTransforms()
  • Method Details

    • logData

      public static DataSet logData(DataSet dataSet, double a, boolean isUnlog, int base)
      Log-transforms the data or, if isUnlog is true, reverses the log transform.
    • standardizeData

      public static List<DataSet> standardizeData(List<DataSet> dataSets)
    • standardizeData

      public static DataSet standardizeData(DataSet dataSet)
    • center

      public static List<DataSet> center(List<DataSet> dataList)
    • discretize

      public static DataSet discretize(DataSet dataSet, int numCategories, boolean variablesCopied)
    • convertNumericalDiscreteToContinuous

      public static DataSet convertNumericalDiscreteToContinuous(DataSet dataSet) throws NumberFormatException
      Throws:
      NumberFormatException
    • concatenate

      public static DataSet concatenate(DataSet dataSet1, DataSet dataSet2)
    • concatenate

      public static DataSet concatenate(DataSet... dataSets)
    • concatenate

      public static DataSet concatenate(List<DataSet> dataSets)
    • restrictToMeasured

      public static DataSet restrictToMeasured(DataSet fullDataSet)
    • getResamplingDataset

      public static DataSet getResamplingDataset(DataSet data, int sampleSize)
      Returns:
      a sample of the given size, drawn without replacement from the given dataset.
    • getResamplingDataset

      public static DataSet getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
      Gets a dataset sampled without replacement.
      Parameters:
      data - original dataset
      sampleSize - number of rows to sample
      randomGenerator - random number generator
      Returns:
      the sampled dataset
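      Sampling rows without replacement can be sketched on a plain double[][] as shown here. This is a minimal illustration under stated assumptions, not the implementation behind getResamplingDataset: the class ResamplingSketch and its methods are hypothetical, and java.util.Random stands in for the RandomGenerator parameter so that the example needs only the standard library.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class; not part of edu.cmu.tetrad.data.
final class ResamplingSketch {

    /** Returns sampleSize distinct row indices drawn from 0..numRows-1. */
    static int[] sampleWithoutReplacement(int numRows, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > numRows) {
            throw new IllegalArgumentException("sampleSize must be in [0, numRows]");
        }
        int[] indices = new int[numRows];
        for (int i = 0; i < numRows; i++) {
            indices[i] = i;
        }
        // Partial Fisher-Yates shuffle: only the first sampleSize slots are fixed.
        for (int i = 0; i < sampleSize; i++) {
            int j = i + rng.nextInt(numRows - i);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        return Arrays.copyOf(indices, sampleSize);
    }

    /** Copies the chosen rows into a new array; the input is left untouched. */
    static double[][] selectRows(double[][] data, int[] rows) {
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = data[rows[i]].clone();
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {3, 4}, {5, 6}, {7, 8}};
        int[] rows = sampleWithoutReplacement(data.length, 2, new Random(42));
        System.out.println(Arrays.deepToString(selectRows(data, rows)));
    }
}
```

      Because no row index can be picked twice, the sample size in this sketch cannot exceed the number of rows.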
    • getBootstrapSample

      public static DataSet getBootstrapSample(DataSet data, int sampleSize)
      Returns:
      a sample of the given size, drawn with replacement from the given dataset.
    • getBootstrapSample

      public static DataSet getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
      Gets a dataset sampled with replacement.
      Parameters:
      data - original dataset
      sampleSize - number of rows to sample
      randomGenerator - random number generator
      Returns:
      the sampled dataset
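      Sampling with replacement (a bootstrap sample) differs from the resampling case only in that each draw is independent, so the same row may appear more than once. The sketch below shows this on a plain double[][]. It is an illustration under stated assumptions, not the implementation behind getBootstrapSample: BootstrapSketch and bootstrapRows are hypothetical names, and java.util.Random stands in for the RandomGenerator parameter.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class; not part of edu.cmu.tetrad.data.
final class BootstrapSketch {

    /** Draws sampleSize rows uniformly at random, with replacement. */
    static double[][] bootstrapRows(double[][] data, int sampleSize, Random rng) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must have at least one row");
        }
        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            // Every draw ranges over all rows, so repeats are possible.
            int row = rng.nextInt(data.length);
            sample[i] = data[row].clone();
        }
        return sample;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2}, {3, 4}, {5, 6}};
        // The sample may be larger than the original data.
        double[][] sample = bootstrapRows(data, 5, new Random(7));
        System.out.println(Arrays.deepToString(sample));
    }
}
```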
    • split

      public static List<DataSet> split(DataSet data, double percentTest)
    • center

      public static DataSet center(DataSet data)
      Subtracts the mean of each column from each datum in that column.
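      Column centering can be sketched on a plain double[][] (rows are cases, columns are variables). This is a minimal illustration of the arithmetic only: CenterSketch and centerColumns are hypothetical names, and the sketch assumes a rectangular array with no missing values.

```java
import java.util.Arrays;

// Hypothetical helper class; not part of edu.cmu.tetrad.data.
final class CenterSketch {

    /** Returns a copy of data in which every column has mean zero. */
    static double[][] centerColumns(double[][] data) {
        int rows = data.length;
        if (rows == 0) {
            return new double[0][];
        }
        int cols = data[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            for (int i = 0; i < rows; i++) {
                sum += data[i][j];
            }
            double mean = sum / rows;
            // Subtract the column mean from every entry in that column.
            for (int i = 0; i < rows; i++) {
                out[i][j] = data[i][j] - mean;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 10}, {2, 20}, {3, 30}};
        // prints [[-1.0, -10.0], [0.0, 0.0], [1.0, 10.0]]
        System.out.println(Arrays.deepToString(centerColumns(data)));
    }
}
```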
    • shuffleColumns

      public static DataSet shuffleColumns(DataSet dataModel)
    • shuffleColumns2

      public static List<DataSet> shuffleColumns2(List<DataSet> dataSets)
    • covarianceNonparanormalDrton

      public static ICovarianceMatrix covarianceNonparanormalDrton(DataSet dataSet)
    • getNonparanormalTransformed

      public static DataSet getNonparanormalTransformed(DataSet dataSet)
    • removeConstantColumns

      public static DataSet removeConstantColumns(DataSet dataSet)
    • getConstantColumns

      public static List<Node> getConstantColumns(DataSet dataSet)
    • removeRandomColumns

      public static DataSet removeRandomColumns(DataSet dataSet, double aDouble)
    • standardizeData

      public static Matrix standardizeData(Matrix data)
    • standardizeData

      public static double[] standardizeData(double[] data)
    • standardizeData

      public static cern.colt.list.DoubleArrayList standardizeData(cern.colt.list.DoubleArrayList data)
    • center

      public static double[] center(double[] d)
    • centerData

      public static Matrix centerData(Matrix data)
    • concatenate

      public static Matrix concatenate(Matrix... dataSets)
    • getBootstrapSample

      public static Matrix getBootstrapSample(Matrix data, int sampleSize)
      Returns:
      a sample of the given size, drawn with replacement from the given dataset.
    • copyColumn

      public static void copyColumn(Node node, DataSet source, DataSet dest)
    • addMissingData

      public static DataSet addMissingData(DataSet inData, double[] probs)
      Adds missing data values to cases in accordance with probabilities specified in a double array that has as many elements as there are columns in the input dataset. Hence, if the first element of the array of probabilities is alpha, then the first column will contain a -99 (or other missing value code) in a given case with probability alpha. This method is useful for generating datasets to test algorithms that handle missing data and/or latent variables.
      Author: Frank Wimberly
      Parameters:
      inData - The data to which random missing data is to be added.
      probs - The probability, one per column, of replacing a value in that column with the missing value code.
      Returns:
      The new data set with missing data added.
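      The per-column rule described above can be sketched on a plain double[][]. This is an illustration under stated assumptions, not the implementation behind addMissingData: MissingDataSketch and addMissing are hypothetical names, the missing value code is passed in explicitly as a hypothetical parameter, and java.util.Random supplies the random draws.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class; not part of edu.cmu.tetrad.data.
final class MissingDataSketch {

    /**
     * Returns a copy of data in which each value in column j is replaced by
     * missingCode with probability probs[j], independently for every case.
     */
    static double[][] addMissing(double[][] data, double[] probs,
                                 double missingCode, Random rng) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            if (data[i].length != probs.length) {
                throw new IllegalArgumentException("probs needs one entry per column");
            }
            out[i] = data[i].clone();
            for (int j = 0; j < probs.length; j++) {
                // nextDouble() lies in [0, 1), so probability 0 never fires
                // and probability 1 always fires.
                if (rng.nextDouble() < probs[j]) {
                    out[i][j] = missingCode;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[] probs = {0.0, 0.5, 1.0};
        double[][] withMissing = addMissing(data, probs, -99.0, new Random(1));
        System.out.println(Arrays.deepToString(withMissing));
    }
}
```

      With these probabilities the first column is never altered and the third column is always replaced; only the middle column varies from run to run.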
    • replaceMissingWithRandom

      public static DataSet replaceMissingWithRandom(DataSet inData)