Package edu.cmu.tetrad.data
Class DataTransforms
java.lang.Object
edu.cmu.tetrad.data.DataTransforms
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type    Method    Description
static DataSet    addMissingData(DataSet inData, double[] probs)
    Adds missing data values to cases in accordance with probabilities specified in a double array which has as many elements as there are columns in the input dataset.
static double[]    center(double[] d)
static DataSet    center(…)
    Subtracts the mean of each column from each datum in that column.
static Matrix    centerData(Matrix data)
static DataSet    concatenate(DataSet... dataSets)
static DataSet    concatenate(DataSet dataSet1, DataSet dataSet2)
static Matrix    concatenate(Matrix... dataSets)
static DataSet    concatenate(List<DataSet> dataSets)
static DataSet    convertNumericalDiscreteToContinuous(DataSet dataSet)
static void    copyColumn(Node node, DataSet source, DataSet dest)
static ICovarianceMatrix    covarianceNonparanormalDrton(DataSet dataSet)
static DataSet    discretize(DataSet dataSet, int numCategories, boolean variablesCopied)
static DataSet    getBootstrapSample(DataSet data, int sampleSize)
static DataSet    getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
    Get dataset sampled with replacement.
static Matrix    getBootstrapSample(Matrix data, int sampleSize)
getConstantColumns(DataSet dataSet)
static DataSet    getNonparanormalTransformed(DataSet dataSet)
static DataSet    getResamplingDataset(DataSet data, int sampleSize)
static DataSet    getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
    Get dataset sampled without replacement.
static DataSet    logData(…)
    Log or unlog data
static DataSet    removeConstantColumns(DataSet dataSet)
static DataSet    removeRandomColumns(DataSet dataSet, double aDouble)
static DataSet    replaceMissingWithRandom(DataSet inData)
static DataSet    restrictToMeasured(DataSet fullDataSet)
static DataSet    shuffleColumns(DataSet dataModel)
shuffleColumns2(List<DataSet> dataSets)
static double[]    standardizeData(double[] data)
static cern.colt.list.DoubleArrayList    standardizeData(cern.colt.list.DoubleArrayList data)
static DataSet    standardizeData(DataSet dataSet)
static Matrix    standardizeData(Matrix data)
standardizeData(List<DataSet> dataSets)
-
Constructor Details
-
DataTransforms
public DataTransforms()
-
-
Method Details
-
logData
Log or unlog data -
standardizeData
-
standardizeData
-
center
-
discretize
-
convertNumericalDiscreteToContinuous
public static DataSet convertNumericalDiscreteToContinuous(DataSet dataSet) throws NumberFormatException
- Throws:
- NumberFormatException
-
concatenate
-
concatenate
-
concatenate
-
restrictToMeasured
-
getResamplingDataset
- Returns:
- a sample without replacement with the given sample size from the given dataset.
-
getResamplingDataset
public static DataSet getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled without replacement.
- Parameters:
- data - original dataset
- sampleSize - number of rows (data points) in the sample
- randomGenerator - random number generator
- Returns:
- the sampled dataset
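To illustrate what sampling without replacement means here, the following is a minimal sketch on a plain `double[][]` rather than a Tetrad `DataSet`. It is not the library's implementation: the class name `ResamplingSketch`, the method `sampleWithoutReplacement`, and the use of `java.util.Random` in place of the Commons Math `RandomGenerator` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class ResamplingSketch {

    /** Returns sampleSize distinct rows of data, chosen uniformly at random. */
    static double[][] sampleWithoutReplacement(double[][] data, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > data.length) {
            throw new IllegalArgumentException(
                    "sampleSize must be between 0 and the number of rows");
        }

        // Shuffle the row indices, then keep the first sampleSize of them.
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rng);

        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            // Copy each selected row so the sample does not alias the input.
            sample[i] = data[indices.get(i)].clone();
        }
        return sample;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}, {4.0, 40.0}};
        double[][] sample = sampleWithoutReplacement(data, 3, new Random(42L));
        for (double[] row : sample) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because each row index is used at most once, the sample size in this sketch cannot exceed the number of rows in the input.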
-
getBootstrapSample
- Returns:
- a sample with replacement with the given sample size from the given dataset.
-
getBootstrapSample
public static DataSet getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled with replacement.
- Parameters:
- data - original dataset
- sampleSize - number of rows (data points) in the sample
- randomGenerator - random number generator
- Returns:
- the sampled dataset
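For comparison, sampling with replacement (a bootstrap sample) can be sketched as below. Again this works on a plain `double[][]` and is not the Tetrad implementation; `BootstrapSketch`, `bootstrapSample`, and the use of `java.util.Random` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class BootstrapSketch {

    /** Returns sampleSize rows drawn uniformly at random, with replacement. */
    static double[][] bootstrapSample(double[][] data, int sampleSize, Random rng) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must contain at least one row");
        }
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must not be negative");
        }

        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            // Each draw is independent, so the same row may be picked repeatedly.
            int row = rng.nextInt(data.length);
            sample[i] = data[row].clone();
        }
        return sample;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}};
        double[][] sample = bootstrapSample(data, 5, new Random(42L));
        for (double[] row : sample) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Unlike sampling without replacement, the requested sample size may be larger than the number of rows in the input.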
-
split
-
center
Subtracts the mean of each column from each datum in that column. -
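The column-centering operation described above can be sketched on a plain `double[][]` as follows. This is an illustration of the arithmetic only, not the Tetrad code; `CenteringSketch` and `centerColumns` are hypothetical names, and the sketch assumes a rectangular array with no missing values.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class CenteringSketch {

    /** Returns a copy of data in which every column has mean zero. */
    static double[][] centerColumns(double[][] data) {
        int rows = data.length;
        if (rows == 0) {
            return new double[0][];
        }
        int cols = data[0].length;

        double[][] centered = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            // Compute the mean of column j.
            double sum = 0.0;
            for (int i = 0; i < rows; i++) {
                sum += data[i][j];
            }
            double mean = sum / rows;

            // Subtract that mean from every entry in column j.
            for (int i = 0; i < rows; i++) {
                centered[i][j] = data[i][j] - mean;
            }
        }
        return centered;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 10.0}, {2.0, 20.0}, {3.0, 30.0}};
        for (double[] row : centerColumns(data)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```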
shuffleColumns
-
shuffleColumns2
-
covarianceNonparanormalDrton
-
getNonparanormalTransformed
-
removeConstantColumns
-
getConstantColumns
-
removeRandomColumns
-
standardizeData
-
standardizeData
public static double[] standardizeData(double[] data) -
standardizeData
public static cern.colt.list.DoubleArrayList standardizeData(cern.colt.list.DoubleArrayList data) -
center
public static double[] center(double[] d) -
centerData
-
concatenate
-
getBootstrapSample
- Returns:
- a sample with replacement with the given sample size from the given dataset.
-
copyColumn
-
addMissingData
Adds missing data values to cases in accordance with probabilities specified in a double array which has as many elements as there are columns in the input dataset. Hence, if the first element of the array of probabilities is alpha, then the first column will contain a -99 (or other missing value code) in a given case with probability alpha. This method is useful for generating datasets that can be used to test algorithms that handle missing data and/or latent variables.
Author: Frank Wimberly
- Parameters:
- inData - The data to which random missing data is to be added.
- probs - The probability of adding missing data to each column.
- Returns:
- The new data set with missing data added.
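The per-column behaviour described above can be sketched on a plain `double[][]`. This is not the Tetrad implementation: `MissingDataSketch` and `addMissing` are hypothetical names, `java.util.Random` is an assumed source of randomness, and `Double.NaN` is used here as the missing-value marker purely as an assumption (the description mentions -99 or another missing value code).

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not part of edu.cmu.tetrad.data.
final class MissingDataSketch {

    /**
     * Returns a copy of data in which each entry of column j is replaced by
     * Double.NaN with probability probs[j].
     */
    static double[][] addMissing(double[][] data, double[] probs, Random rng) {
        double[][] result = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            if (data[i].length != probs.length) {
                throw new IllegalArgumentException(
                        "probs must have one element per column");
            }
            result[i] = new double[probs.length];
            for (int j = 0; j < probs.length; j++) {
                // nextDouble() is uniform on [0, 1), so this test succeeds
                // with probability probs[j].
                boolean missing = rng.nextDouble() < probs[j];
                result[i][j] = missing ? Double.NaN : data[i][j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        double[] probs = {0.1, 0.5};
        for (double[] row : addMissing(data, probs, new Random(42L))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

A probability of 0.0 leaves a column untouched, and a probability of 1.0 marks every entry in that column as missing.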
-
replaceMissingWithRandom
-