Class DataTransforms

java.lang.Object
edu.cmu.tetrad.data.DataTransforms

public class DataTransforms extends Object

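Static utility methods for transforming data sets, including log transforms, standardizing, centering, scaling, discretizing, concatenating, resampling, and inserting or replacing missing values.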

Author:
josephramsey
  • Method Details

    • logData

      public static DataSet logData(DataSet dataSet, double a, boolean isUnlog, int base)
      Logs or unlogs the data.
      Parameters:
      dataSet - a DataSet object
      a - a double
      isUnlog - true to unlog the data, false to log it
      base - an int giving the logarithm base
      Returns:
      a DataSet object
    • standardizeData

      public static List<DataSet> standardizeData(List<DataSet> dataSets)

      standardizeData.

      Parameters:
      dataSets - a List object
      Returns:
      a List object
    • standardizeData

      public static DataSet standardizeData(DataSet dataSet)

      standardizeData.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
    • center

      public static List<DataSet> center(List<DataSet> dataList)

      Centers each data set in the given list.

      Parameters:
      dataList - a List object
      Returns:
      a List object
    • discretize

      public static DataSet discretize(DataSet dataSet, int numCategories, boolean variablesCopied)

      discretize.

      Parameters:
      dataSet - a DataSet object
      numCategories - an int giving the number of categories
      variablesCopied - a boolean
      Returns:
      a DataSet object
    • convertNumericalDiscreteToContinuous

      public static DataSet convertNumericalDiscreteToContinuous(DataSet dataSet) throws NumberFormatException

      convertNumericalDiscreteToContinuous.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
      Throws:
      NumberFormatException - if any.
    • concatenate

      public static DataSet concatenate(DataSet dataSet1, DataSet dataSet2)

      concatenate.

      Parameters:
      dataSet1 - a DataSet object
      dataSet2 - a DataSet object
      Returns:
      a DataSet object
    • concatenate

      public static DataSet concatenate(DataSet... dataSets)

      concatenate.

      Parameters:
      dataSets - one or more DataSet objects
      Returns:
      a DataSet object
    • concatenate

      public static DataSet concatenate(List<DataSet> dataSets)

      concatenate.

      Parameters:
      dataSets - a List object
      Returns:
      a DataSet object
    • restrictToMeasured

      public static DataSet restrictToMeasured(DataSet fullDataSet)

      restrictToMeasured.

      Parameters:
      fullDataSet - a DataSet object
      Returns:
      a DataSet object
    • getResamplingDataset

      public static DataSet getResamplingDataset(DataSet data, int sampleSize)

      Gets a dataset sampled without replacement.

      Parameters:
      data - a DataSet object
      sampleSize - an int giving the number of rows to sample
      Returns:
      a sample without replacement with the given sample size from the given dataset.
    • getResamplingDataset

      public static DataSet getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
      Get dataset sampled without replacement.
      Parameters:
      data - original dataset
      sampleSize - number of rows to sample
      randomGenerator - random number generator
      Returns:
      dataset
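
      Sampling rows without replacement can be sketched with a partial Fisher-Yates shuffle over row indices. This is an illustration only, not the library's implementation: the class SubsampleSketch and its method are hypothetical, a plain double[][] stands in for DataSet, and java.util.Random stands in for the Apache Commons RandomGenerator named in the signature above.

```java
import java.util.Random;

// Hypothetical sketch: draws sampleSize distinct rows from a row-major matrix.
final class SubsampleSketch {
    static double[][] sampleRowsWithoutReplacement(double[][] data, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > data.length) {
            throw new IllegalArgumentException("sampleSize must be between 0 and the number of rows");
        }
        // Start from the identity permutation of row indices.
        int[] idx = new int[data.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            // Swap a uniformly chosen not-yet-used index into position i.
            int j = i + rng.nextInt(idx.length - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
            sample[i] = data[idx[i]].clone();
        }
        return sample;
    }
}
```

      Passing a seeded generator, for example new Random(42L), makes the draw reproducible, which is presumably the reason the overload above accepts a generator argument.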
    • getBootstrapSample

      public static DataSet getBootstrapSample(DataSet data, int sampleSize)

      Gets a dataset sampled with replacement.

      Parameters:
      data - a DataSet object
      sampleSize - an int giving the number of rows to sample
      Returns:
      a sample with replacement with the given sample size from the given dataset.
    • getBootstrapSample

      public static DataSet getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
      Get dataset sampled with replacement.
      Parameters:
      data - original dataset
      sampleSize - number of rows to sample
      randomGenerator - random number generator
      Returns:
      dataset
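
      Sampling with replacement (a bootstrap sample) draws each output row independently, so the same source row may appear more than once. The sketch below illustrates the idea under stated assumptions: BootstrapSketch is a hypothetical class, a double[][] stands in for DataSet, and java.util.Random stands in for the Apache Commons RandomGenerator.

```java
import java.util.Random;

// Hypothetical sketch: draws sampleSize rows, each chosen uniformly and independently.
final class BootstrapSketch {
    static double[][] bootstrapRows(double[][] data, int sampleSize, Random rng) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must contain at least one row");
        }
        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            int row = rng.nextInt(data.length); // repeats are allowed
            sample[i] = data[row].clone();      // copy so the source stays untouched
        }
        return sample;
    }
}
```

      Unlike sampling without replacement, sampleSize may exceed the number of rows in the source data.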
    • split

      public static List<DataSet> split(DataSet data, double percentTest)

      split.

      Parameters:
      data - a DataSet object
      percentTest - a double
      Returns:
      a List object
    • center

      public static DataSet center(DataSet data)
      Subtracts the mean of each column from each datum in that column.
      Parameters:
      data - a DataSet object
      Returns:
      a DataSet object
    • shuffleColumns

      public static DataSet shuffleColumns(DataSet dataModel)

      shuffleColumns.

      Parameters:
      dataModel - a DataSet object
      Returns:
      a DataSet object
    • shuffleColumns2

      public static List<DataSet> shuffleColumns2(List<DataSet> dataSets)

      shuffleColumns2.

      Parameters:
      dataSets - a List object
      Returns:
      a List object
    • covarianceNonparanormalDrton

      public static ICovarianceMatrix covarianceNonparanormalDrton(DataSet dataSet)

      covarianceNonparanormalDrton.

      Parameters:
      dataSet - a DataSet object
      Returns:
      an ICovarianceMatrix object
    • getNonparanormalTransformed

      public static DataSet getNonparanormalTransformed(DataSet dataSet)

      getNonparanormalTransformed.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
    • removeConstantColumns

      public static DataSet removeConstantColumns(DataSet dataSet)

      removeConstantColumns.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a DataSet object
    • getConstantColumns

      public static List<Node> getConstantColumns(DataSet dataSet)

      getConstantColumns.

      Parameters:
      dataSet - a DataSet object
      Returns:
      a List object
    • removeRandomColumns

      public static DataSet removeRandomColumns(DataSet dataSet, double aDouble)

      removeRandomColumns.

      Parameters:
      dataSet - a DataSet object
      aDouble - a double
      Returns:
      a DataSet object
    • standardizeData

      public static Matrix standardizeData(Matrix data)

      standardizeData.

      Parameters:
      data - a Matrix object
      Returns:
      a Matrix object
    • standardizeData

      public static Matrix standardizeData(Matrix data, List<Node> variables)
      Standardizes the columns of the given data matrix by centering and scaling. For each column representing a continuous variable, the method calculates the mean and standard deviation, subtracts the mean from each value, and divides by the standard deviation. Discrete variables are ignored.
      Parameters:
      data - The input data matrix to be standardized. Each column corresponds to a variable, and each row represents an observation.
      variables - A list of nodes representing the variables in the data. The type of each variable (e.g., continuous or discrete) determines whether the variable will be standardized.
      Returns:
      A new standardized data matrix where each continuous variable has been mean-centered and normalized by its standard deviation.
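
      The centering and scaling described above can be sketched on a plain array. This is not the library's code: StandardizeSketch is a hypothetical class, a boolean[] flag stands in for the List<Node> variable types, the sample standard deviation (dividing by n - 1) is an assumption, and leaving a zero-variance column merely centered is also an assumption.

```java
// Hypothetical sketch: z-scores each continuous column, copies discrete columns as-is.
final class StandardizeSketch {
    static double[][] standardizeColumns(double[][] data, boolean[] continuous) {
        int rows = data.length;
        int cols = rows == 0 ? 0 : data[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            if (!continuous[j]) {
                // Discrete columns are ignored: copy values unchanged.
                for (int i = 0; i < rows; i++) {
                    out[i][j] = data[i][j];
                }
                continue;
            }
            double mean = 0.0;
            for (int i = 0; i < rows; i++) {
                mean += data[i][j];
            }
            mean /= rows;
            double sumSq = 0.0;
            for (int i = 0; i < rows; i++) {
                double c = data[i][j] - mean;
                sumSq += c * c;
            }
            // Assumption: sample standard deviation (n - 1 in the denominator).
            double sd = rows > 1 ? Math.sqrt(sumSq / (rows - 1)) : 0.0;
            for (int i = 0; i < rows; i++) {
                double c = data[i][j] - mean;
                out[i][j] = sd > 0.0 ? c / sd : c; // avoid dividing by zero
            }
        }
        return out;
    }
}
```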
    • standardizeData

      public static double[] standardizeData(double[] data)

      Standardizes the values in the given array.

      Parameters:
      data - an array of double values
      Returns:
      an array of double values
    • center

      public static double[] center(double[] d)
      Centers the values in the given array by subtracting the mean of the array from each element.
      Parameters:
      d - the array of double values to be centered
      Returns:
      a new array where each element is the original value minus the mean of the input array
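
      As a quick illustration of the behavior described above, a minimal sketch follows. The class name CenterSketch is hypothetical, and returning an empty array for empty input is an assumption, since the entry does not say how that case is handled.

```java
// Hypothetical sketch: subtracts the array mean from every element.
final class CenterSketch {
    static double[] center(double[] d) {
        double sum = 0.0;
        for (double v : d) {
            sum += v;
        }
        // Assumption: an empty input yields an empty output.
        double mean = d.length == 0 ? 0.0 : sum / d.length;
        double[] out = new double[d.length];
        for (int i = 0; i < d.length; i++) {
            out[i] = d[i] - mean;
        }
        return out;
    }
}
```

      For the input {1.0, 2.0, 3.0} the mean is 2.0, so the centered result is {-1.0, 0.0, 1.0}; the centered values always sum to zero up to rounding.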
    • centerData

      public static Matrix centerData(Matrix data)

      Centers the data in the given matrix.

      Parameters:
      data - a Matrix object
      Returns:
      a Matrix object
    • concatenate

      public static Matrix concatenate(Matrix... dataSets)

      concatenate.

      Parameters:
      dataSets - one or more Matrix objects
      Returns:
      a Matrix object
    • getBootstrapSample

      public static Matrix getBootstrapSample(Matrix data, int sampleSize)

      Gets a sample of the given data drawn with replacement.

      Parameters:
      data - a Matrix object
      sampleSize - an int giving the number of rows to sample
      Returns:
      a sample with replacement with the given sample size from the given dataset.
    • copyColumn

      public static void copyColumn(Node node, DataSet source, DataSet dest)

      copyColumn.

      Parameters:
      node - a Node object
      source - a DataSet object
      dest - a DataSet object
    • addMissingData

      public static DataSet addMissingData(DataSet inData, double[] probs)
      Adds missing data values to cases in accordance with probabilities specified in a double array, which has as many elements as there are columns in the input dataset. Hence, if the first element of the array of probabilities is alpha, then the first column will contain a -99 (or other missing value code) in a given case with probability alpha. This method is useful for generating datasets to test algorithms that handle missing data and/or latent variables.
      Author: Frank Wimberly
      Parameters:
      inData - The data to which random missing data is to be added.
      probs - The probability of adding missing data to each column.
      Returns:
      The new data set with missing data added.
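
      The per-column masking described above can be sketched as follows. This is an illustration, not the library's code: MissingDataSketch is a hypothetical class, a double[][] stands in for DataSet, the explicit java.util.Random argument is added for reproducibility, and Double.NaN is an assumed missing-value marker in place of the -99 code mentioned above.

```java
import java.util.Random;

// Hypothetical sketch: each cell in column j becomes missing with probability probs[j].
final class MissingDataSketch {
    static double[][] addMissing(double[][] data, double[] probs, Random rng) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            if (data[i].length != probs.length) {
                throw new IllegalArgumentException("probs must have one entry per column");
            }
            out[i] = data[i].clone(); // leave the input untouched
            for (int j = 0; j < probs.length; j++) {
                // nextDouble() is uniform on [0, 1), so this fires with probability probs[j].
                if (rng.nextDouble() < probs[j]) {
                    out[i][j] = Double.NaN; // assumed missing-value marker
                }
            }
        }
        return out;
    }
}
```

      With probs = {0.0, 1.0}, the first column is never masked and the second column is always masked.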
    • replaceMissingWithRandom

      public static DataSet replaceMissingWithRandom(DataSet inData)

      replaceMissingWithRandom.

      Parameters:
      inData - a DataSet object
      Returns:
      a DataSet object
    • scale

      public static DataSet scale(DataSet dataSet, double scaleMin, double scaleMax)
      Scales the continuous variables in the given DataSet to have values in the range [-1, 1].

      For each continuous column, the method computes the maximum of the absolute values of the minimum and maximum of the column, and divides all values in that column by this maximum value. Discrete columns are not affected.

      Parameters:
      dataSet - The DataSet containing variables to be scaled.
      scaleMin - The minimum value to scale to.
      scaleMax - The maximum value to scale to.
      Returns:
      A new DataSet with scaled continuous variables, while discrete variables remain unchanged.
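
      The max-absolute-value scaling described in the second paragraph above can be sketched for a single column. MaxAbsScaleSketch is a hypothetical class, and leaving an all-zero column unchanged is an assumption; the sketch does not use scaleMin or scaleMax, because the description above does not say how they enter the computation.

```java
// Hypothetical sketch: divides a column by max(|min|, |max|) so values land in [-1, 1].
final class MaxAbsScaleSketch {
    static double[] scaleByMaxAbs(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double bound = Math.max(Math.abs(min), Math.abs(max));
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // Assumption: an all-zero column is returned unchanged.
            out[i] = bound > 0.0 ? column[i] / bound : column[i];
        }
        return out;
    }
}
```

      For the column {-4.0, 2.0, 1.0} the bound is 4.0, giving {-1.0, 0.5, 0.25}.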
    • scale

      public static void scale(DataSet dataSet, double scaleMin, double scaleMax, Node node)
      Scales the values of a specified node in the given dataset to a specified range [scaleMin, scaleMax]. This method only processes nodes that are instances of ContinuousVariable.
      Parameters:
      dataSet - the dataset containing the values to be scaled
      scaleMin - the minimum value of the target range
      scaleMax - the maximum value of the target range
      node - the node corresponding to the column in the dataset to be scaled
    • scale

      public static double scale(double value, double dataMin, double dataMax, double scaleMin, double scaleMax)
      Scales a value from one range to another.
      Parameters:
      value - The value to scale
      dataMin - The minimum value of the data range
      dataMax - The maximum value of the data range
      scaleMin - The minimum value of the scale range
      scaleMax - The maximum value of the scale range
      Returns:
      The scaled value
      Throws:
      IllegalArgumentException - if dataMin is equal to dataMax
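
      One common way to map a value from one range to another is linear interpolation; the sketch below assumes that is what is meant here. ScaleValueSketch is a hypothetical class, and the formula is an assumption rather than a statement of the library's implementation.

```java
// Hypothetical sketch: linear map from [dataMin, dataMax] onto [scaleMin, scaleMax].
final class ScaleValueSketch {
    static double scale(double value, double dataMin, double dataMax,
                        double scaleMin, double scaleMax) {
        if (dataMin == dataMax) {
            // A zero-width data range would require dividing by zero.
            throw new IllegalArgumentException("dataMin and dataMax must differ");
        }
        double fraction = (value - dataMin) / (dataMax - dataMin);
        return scaleMin + fraction * (scaleMax - scaleMin);
    }
}
```

      Under this formula, scale(5.0, 0.0, 10.0, -1.0, 1.0) gives 0.0, since 5.0 sits halfway through the data range and 0.0 sits halfway through the target range.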
    • scale

      public static DataSet scale(DataSet dataSet, double[] scales)
      Scales the columns of the provided dataset based on the given scale factors. Only continuous variables in the dataset are scaled. Discrete variables are ignored. The method returns a new dataset with scaled values, leaving the original dataset unmodified.
      Parameters:
      dataSet - the input dataset to be scaled
      scales - an array of scale factors, where each scale corresponds to a column in the dataset
      Returns:
      a new dataset with the continuous columns scaled by the given factors
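
      A minimal sketch of per-column scaling follows, assuming that "scaled by the given factors" means multiplication. ColumnScaleSketch is a hypothetical class, a double[][] stands in for DataSet, and a boolean[] flag stands in for the variable types.

```java
// Hypothetical sketch: multiplies each continuous column by its scale factor.
final class ColumnScaleSketch {
    static double[][] scaleColumns(double[][] data, double[] scales, boolean[] continuous) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            if (data[i].length != scales.length) {
                throw new IllegalArgumentException("scales must have one entry per column");
            }
            out[i] = data[i].clone(); // the original rows are not modified
            for (int j = 0; j < scales.length; j++) {
                if (continuous[j]) {
                    out[i][j] *= scales[j]; // assumption: scaling is multiplication
                }
            }
        }
        return out;
    }
}
```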