Package edu.cmu.tetrad.data
Class DataTransforms
java.lang.Object
edu.cmu.tetrad.data.DataTransforms
DataTransforms class.
- Version: $Id: $Id
- Author: josephramsey
-
Method Summary
Modifier and Type | Method | Description
static DataSet | addMissingData(DataSet inData, double[] probs) | Adds missing data values to cases in accordance with probabilities specified in a double array which has as many elements as there are columns in the input dataset.
static double[] | center(double[] d) | center.
static DataSet | center | Subtracts the mean of each column from each datum in that column.
 | center | center.
static Matrix | centerData(Matrix data) | centerData.
static DataSet | concatenate(DataSet... dataSets) | concatenate.
static DataSet | concatenate(DataSet dataSet1, DataSet dataSet2) | concatenate.
static Matrix | concatenate(Matrix... dataSets) | concatenate.
static DataSet | concatenate(List<DataSet> dataSets) | concatenate.
static DataSet | convertNumericalDiscreteToContinuous(DataSet dataSet) | convertNumericalDiscreteToContinuous.
static void | copyColumn(Node node, DataSet source, DataSet dest) | copyColumn.
static ICovarianceMatrix | covarianceNonparanormalDrton(DataSet dataSet) | covarianceNonparanormalDrton.
static DataSet | discretize(DataSet dataSet, int numCategories, boolean variablesCopied) | discretize.
static DataSet | getBootstrapSample(DataSet data, int sampleSize) | getBootstrapSample.
static DataSet | getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator) | Get dataset sampled with replacement.
static Matrix | getBootstrapSample(Matrix data, int sampleSize) | getBootstrapSample.
 | getConstantColumns(DataSet dataSet) | getConstantColumns.
static DataSet | getNonparanormalTransformed(DataSet dataSet) | getNonparanormalTransformed.
static DataSet | getResamplingDataset(DataSet data, int sampleSize) | getResamplingDataset.
static DataSet | getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator) | Get dataset sampled without replacement.
static DataSet | logData | Log or unlog data.
static DataSet | removeConstantColumns(DataSet dataSet) | removeConstantColumns.
static DataSet | removeRandomColumns(DataSet dataSet, double aDouble) | removeRandomColumns.
static DataSet | replaceMissingWithRandom(DataSet inData) | replaceMissingWithRandom.
static DataSet | restrictToMeasured(DataSet fullDataSet) | restrictToMeasured.
static DataSet | shuffleColumns(DataSet dataModel) | shuffleColumns.
 | shuffleColumns2(List<DataSet> dataSets) | shuffleColumns2.
 | split | split.
static double[] | standardizeData(double[] data) | standardizeData.
static cern.colt.list.DoubleArrayList | standardizeData(cern.colt.list.DoubleArrayList data) | standardizeData.
static DataSet | standardizeData(DataSet dataSet) | standardizeData.
static Matrix | standardizeData(Matrix data) | standardizeData.
 | standardizeData(List<DataSet> dataSets) | standardizeData.
-
Method Details
-
logData
Log or unlog data.
-
standardizeData
standardizeData.
-
standardizeData
standardizeData.
-
center
center.
-
discretize
discretize.
-
convertNumericalDiscreteToContinuous
public static DataSet convertNumericalDiscreteToContinuous(DataSet dataSet) throws NumberFormatException
convertNumericalDiscreteToContinuous.
- Parameters:
dataSet - a DataSet object
- Returns:
- a DataSet object
- Throws:
NumberFormatException - if any.
-
concatenate
concatenate.
-
concatenate
concatenate.
-
concatenate
concatenate.
-
restrictToMeasured
restrictToMeasured.
-
getResamplingDataset
getResamplingDataset.
- Parameters:
data - a DataSet object
sampleSize - an int
- Returns:
- a sample without replacement with the given sample size from the given dataset.
-
getResamplingDataset
public static DataSet getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled without replacement.
- Parameters:
data - original dataset
sampleSize - number of rows to sample
randomGenerator - random number generator
- Returns:
- the dataset sampled without replacement
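Sampling without replacement means that no row of the original data can appear more than once in the result. The sketch below illustrates the idea on plain arrays. It is not Tetrad's implementation: the class RowSubsample, the method withoutReplacement, the double[][] row layout, and the use of java.util.Random in place of RandomGenerator are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical helper, not part of Tetrad; each row is a plain double[].
final class RowSubsample {
    /** Returns sampleSize distinct rows of data, chosen uniformly at random. */
    static double[][] withoutReplacement(double[][] data, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > data.length) {
            throw new IllegalArgumentException(
                    "sampleSize must be between 0 and " + data.length);
        }
        // Row indices 0..n-1, partially shuffled below.
        int[] idx = new int[data.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        double[][] out = new double[sampleSize][];
        // Partial Fisher-Yates shuffle: position i receives a random index
        // drawn from the positions not yet fixed, so no index repeats.
        for (int i = 0; i < sampleSize; i++) {
            int j = i + rng.nextInt(idx.length - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
            out[i] = data[idx[i]].clone(); // copy so the input stays untouched
        }
        return out;
    }
}
```

Passing a seeded generator, for example new Random(7), makes the draw reproducible, which is presumably the reason the overload above accepts a generator argument.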
-
getBootstrapSample
getBootstrapSample.
- Parameters:
data - a DataSet object
sampleSize - an int
- Returns:
- a sample with replacement with the given sample size from the given dataset.
-
getBootstrapSample
public static DataSet getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled with replacement.
- Parameters:
data - original dataset
sampleSize - number of rows to sample
randomGenerator - random number generator
- Returns:
- the dataset sampled with replacement
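A bootstrap sample draws each row independently, so the same original row may appear several times and sampleSize may exceed the number of rows. The following is a minimal sketch on plain arrays, not Tetrad's code: RowBootstrap, withReplacement, the double[][] layout, and java.util.Random standing in for RandomGenerator are hypothetical choices.

```java
import java.util.Random;

// Hypothetical helper, not part of Tetrad; each row is a plain double[].
final class RowBootstrap {
    /** Returns sampleSize rows drawn uniformly at random, with replacement. */
    static double[][] withReplacement(double[][] data, int sampleSize, Random rng) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data has no rows to sample from");
        }
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be non-negative");
        }
        double[][] out = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            // Every draw considers all rows again, so repeats are possible.
            out[i] = data[rng.nextInt(data.length)].clone();
        }
        return out;
    }
}
```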
-
split
split.
-
center
Subtracts the mean of each column from each datum in that column.
-
shuffleColumns
shuffleColumns.
-
shuffleColumns2
shuffleColumns2.
-
covarianceNonparanormalDrton
covarianceNonparanormalDrton.
- Parameters:
dataSet - a DataSet object
- Returns:
- an ICovarianceMatrix object
-
getNonparanormalTransformed
getNonparanormalTransformed.
-
removeConstantColumns
removeConstantColumns.
-
getConstantColumns
getConstantColumns.
-
removeRandomColumns
removeRandomColumns.
-
standardizeData
standardizeData.
-
standardizeData
public static double[] standardizeData(double[] data)
standardizeData.
- Parameters:
data - an array of double values
- Returns:
- an array of double values
-
standardizeData
public static cern.colt.list.DoubleArrayList standardizeData(cern.colt.list.DoubleArrayList data)
standardizeData.
- Parameters:
data - a DoubleArrayList object
- Returns:
- a DoubleArrayList object
-
center
public static double[] center(double[] d)
center.
- Parameters:
d - an array of double values
- Returns:
- an array of double values
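Centering, as described for the column-wise center method on this page, subtracts the mean from every datum. The sketch below applies that idea to a single array and assumes this overload behaves the same way. It is an illustration only: the class name Centering, the choice to return a new array rather than modify the input, and the handling of an empty array are assumptions, not documented behaviour.

```java
// Hypothetical helper, not part of Tetrad.
final class Centering {
    /** Returns a copy of d with its arithmetic mean subtracted from every element. */
    static double[] center(double[] d) {
        double sum = 0.0;
        for (double v : d) {
            sum += v;
        }
        // An empty array has no mean; using 0.0 simply yields an empty result.
        double mean = d.length == 0 ? 0.0 : sum / d.length;
        double[] out = new double[d.length];
        for (int i = 0; i < d.length; i++) {
            out[i] = d[i] - mean;
        }
        return out;
    }
}
```

For the input {1, 2, 3} the mean is 2, so the sketch returns {-1, 0, 1}.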
-
centerData
centerData.
-
concatenate
concatenate.
-
getBootstrapSample
getBootstrapSample.
- Parameters:
data - a Matrix object
sampleSize - an int
- Returns:
- a sample with replacement with the given sample size from the given dataset.
-
copyColumn
copyColumn.
-
addMissingData
Adds missing data values to cases in accordance with probabilities specified in a double array which has as many elements as there are columns in the input dataset. Hence, if the first element of the array of probabilities is alpha, then the first column will contain a -99 (or other missing value code) in a given case with probability alpha. This method is useful for generating datasets that can be used to test algorithms that handle missing data and/or latent variables. Author: Frank Wimberly
- Parameters:
inData - The data to which random missing data is to be added.
probs - The probability of adding missing data to each column.
- Returns:
- The new data set with missing data added.
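The per-column rule described above can be sketched on plain arrays: each cell in column c is replaced by a missing marker with probability probs[c], independently of the other cells. This is an illustration under stated assumptions, not Tetrad's code. The class MissingDataSketch, the explicit java.util.Random argument, and the use of Double.NaN as the missing marker (rather than -99 or another code) are hypothetical.

```java
import java.util.Random;

// Hypothetical helper, not part of Tetrad; rows are plain double[] arrays.
final class MissingDataSketch {
    /**
     * Returns a copy of data in which each cell of column c has been replaced
     * by Double.NaN with probability probs[c].
     */
    static double[][] addMissing(double[][] data, double[] probs, Random rng) {
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            if (data[r].length != probs.length) {
                throw new IllegalArgumentException(
                        "probs must have one entry per column");
            }
            out[r] = data[r].clone(); // leave the input untouched
            for (int c = 0; c < probs.length; c++) {
                // nextDouble() is uniform on [0, 1), so this test succeeds
                // with probability probs[c].
                if (rng.nextDouble() < probs[c]) {
                    out[r][c] = Double.NaN; // assumed missing-value marker
                }
            }
        }
        return out;
    }
}
```

With probs = {0.0, 1.0}, the first column is never altered and the second column becomes entirely missing, which is a convenient sanity check for the rule.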
-
replaceMissingWithRandom
replaceMissingWithRandom.
-