Package edu.cmu.tetrad.data
Class DataTransforms
java.lang.Object
edu.cmu.tetrad.data.DataTransforms
DataTransforms class.
- Author: josephramsey
-
Method Summary
Modifier and Type | Method | Description
static DataSet | addMissingData(DataSet inData, double[] probs) | Adds missing data values to cases in accordance with probabilities specified in a double array which has as many elements as there are columns in the input dataset.
static double[] | center(double[] d) | center.
static DataSet | center | Subtracts the mean of each column from each datum in that column.
static Matrix | centerData(Matrix data) | centerData.
static DataSet | concatenate(DataSet... dataSets) | concatenate.
static DataSet | concatenate(DataSet dataSet1, DataSet dataSet2) | concatenate.
static Matrix | concatenate(Matrix... dataSets) | concatenate.
static DataSet | concatenate(List<DataSet> dataSets) | concatenate.
static DataSet | convertNumericalDiscreteToContinuous(DataSet dataSet) | convertNumericalDiscreteToContinuous.
static void | copyColumn(Node node, DataSet source, DataSet dest) | copyColumn.
static ICovarianceMatrix | covarianceNonparanormalDrton(DataSet dataSet) | covarianceNonparanormalDrton.
static DataSet | discretize(DataSet dataSet, int numCategories, boolean variablesCopied) | discretize.
static DataSet | getBootstrapSample(DataSet data, int sampleSize) | getBootstrapSample.
static DataSet | getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator) | Get dataset sampled with replacement.
static Matrix | getBootstrapSample(Matrix data, int sampleSize) | getBootstrapSample.
 | getConstantColumns(DataSet dataSet) | getConstantColumns.
static DataSet | getNonparanormalTransformed(DataSet dataSet) | getNonparanormalTransformed.
static DataSet | getResamplingDataset(DataSet data, int sampleSize) | getResamplingDataset.
static DataSet | getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator) | Get dataset sampled without replacement.
static DataSet | logData | Log or unlog data.
static DataSet | removeConstantColumns(DataSet dataSet) | removeConstantColumns.
static DataSet | removeRandomColumns(DataSet dataSet, double aDouble) | removeRandomColumns.
static DataSet | replaceMissingWithRandom(DataSet inData) | replaceMissingWithRandom.
static DataSet | restrictToMeasured(DataSet fullDataSet) | restrictToMeasured.
static DataSet | shuffleColumns(DataSet dataModel) | shuffleColumns.
 | shuffleColumns2(List<DataSet> dataSets) | shuffleColumns2.
 | split | split.
static double[] | standardizeData(double[] data) | standardizeData.
static cern.colt.list.DoubleArrayList | standardizeData(cern.colt.list.DoubleArrayList data) | standardizeData.
static DataSet | standardizeData(DataSet dataSet) | standardizeData.
static Matrix | standardizeData(Matrix data) | standardizeData.
 | standardizeData(List<DataSet> dataSets) | standardizeData.
-
Method Details
-
logData
Log or unlog data.
-
standardizeData
standardizeData.
-
standardizeData
standardizeData.
-
center
center.
-
discretize
discretize.
-
convertNumericalDiscreteToContinuous
public static DataSet convertNumericalDiscreteToContinuous(DataSet dataSet) throws NumberFormatException
convertNumericalDiscreteToContinuous.
- Parameters:
dataSet - a DataSet object
- Returns:
a DataSet object
- Throws:
NumberFormatException - if any.
-
concatenate
concatenate.
-
concatenate
concatenate.
-
concatenate
concatenate.
-
restrictToMeasured
restrictToMeasured.
-
getResamplingDataset
getResamplingDataset.
- Parameters:
data - a DataSet object
sampleSize - an int
- Returns:
a sample without replacement with the given sample size from the given dataset.
-
getResamplingDataset
public static DataSet getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled without replacement.
- Parameters:
data - original dataset
sampleSize - number of rows to sample
randomGenerator - random number generator
- Returns:
dataset
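Sampling without replacement means that no row of the original dataset appears more than once in the result. The sketch below illustrates the idea on a plain `double[][]` using a partial Fisher-Yates shuffle. It is not Tetrad's implementation: the class `ResampleSketch`, the method `sampleWithoutReplacement`, and the use of `java.util.Random` in place of `RandomGenerator` are assumptions made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper, not part of Tetrad: draws rows without replacement. */
final class ResampleSketch {

    /** Returns sampleSize distinct rows of data, chosen uniformly at random. */
    static double[][] sampleWithoutReplacement(double[][] data, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > data.length) {
            throw new IllegalArgumentException("sampleSize must be between 0 and the number of rows");
        }
        // Row indices 0..n-1; a partial Fisher-Yates shuffle picks the first sampleSize of them.
        int[] index = new int[data.length];
        for (int i = 0; i < index.length; i++) {
            index[i] = i;
        }
        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            int j = i + rng.nextInt(index.length - i); // pick from the not-yet-used indices
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
            sample[i] = data[index[i]].clone(); // copy so the caller's rows are not shared
        }
        return sample;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 10}, {2, 20}, {3, 30}, {4, 40}};
        double[][] sample = sampleWithoutReplacement(data, 3, new Random(42));
        System.out.println(Arrays.deepToString(sample));
    }
}
```

Because each index is used at most once, `sampleSize` cannot exceed the number of rows; the sketch rejects larger values rather than silently truncating.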
-
getBootstrapSample
getBootstrapSample.
- Parameters:
data - a DataSet object
sampleSize - an int
- Returns:
a sample with replacement with the given sample size from the given dataset.
-
getBootstrapSample
public static DataSet getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled with replacement.
- Parameters:
data - original dataset
sampleSize - number of rows to sample
randomGenerator - random number generator
- Returns:
dataset
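A bootstrap sample draws each row independently and uniformly from the original rows, so the same row may appear several times and the sample may be larger than the original. The following is a minimal sketch of that idea on a plain `double[][]`, not Tetrad's code: `BootstrapSketch`, `bootstrap`, and the use of `java.util.Random` instead of `RandomGenerator` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper, not part of Tetrad: draws rows with replacement. */
final class BootstrapSketch {

    /** Returns sampleSize rows, each drawn uniformly at random from data. */
    static double[][] bootstrap(double[][] data, int sampleSize, Random rng) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must contain at least one row");
        }
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be non-negative");
        }
        double[][] sample = new double[sampleSize][];
        for (int i = 0; i < sampleSize; i++) {
            // Every draw considers all rows, so repeats are possible.
            sample[i] = data[rng.nextInt(data.length)].clone();
        }
        return sample;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 10}, {2, 20}, {3, 30}};
        double[][] sample = bootstrap(data, 5, new Random(42));
        System.out.println(Arrays.deepToString(sample));
    }
}
```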
-
split
split.
-
center
Subtracts the mean of each column from each datum in that column.
-
shuffleColumns
shuffleColumns.
-
shuffleColumns2
shuffleColumns2.
-
covarianceNonparanormalDrton
covarianceNonparanormalDrton.
- Parameters:
dataSet - a DataSet object
- Returns:
an ICovarianceMatrix object
-
getNonparanormalTransformed
getNonparanormalTransformed.
-
removeConstantColumns
removeConstantColumns.
-
getConstantColumns
getConstantColumns.
-
removeRandomColumns
removeRandomColumns.
-
standardizeData
standardizeData.
-
standardizeData
public static double[] standardizeData(double[] data)
standardizeData.
- Parameters:
data - an array of double values
- Returns:
an array of double values
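The page does not say how the values are standardized. As a rough illustration of what standardizing an array usually means (subtract the mean, then divide by the standard deviation), here is a sketch under stated assumptions: it uses the sample standard deviation (divisor n - 1), which may differ from what Tetrad does, and `StandardizeSketch` and `standardize` are hypothetical names.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Tetrad: z-scores an array of values. */
final class StandardizeSketch {

    /** Returns (x - mean) / sd for each x, using the sample standard deviation (assumption). */
    static double[] standardize(double[] data) {
        if (data.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double sum = 0.0;
        for (double x : data) {
            sum += x;
        }
        double mean = sum / data.length;

        double squares = 0.0;
        for (double x : data) {
            squares += (x - mean) * (x - mean);
        }
        // Divisor n - 1 is an assumption; a constant array gives sd == 0 and NaN results.
        double sd = Math.sqrt(squares / (data.length - 1));

        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (data[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(standardize(new double[] {1, 2, 3})));
    }
}
```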
-
standardizeData
public static cern.colt.list.DoubleArrayList standardizeData(cern.colt.list.DoubleArrayList data)
standardizeData.
- Parameters:
data - a DoubleArrayList object
- Returns:
a DoubleArrayList object
-
center
public static double[] center(double[] d)
center.
- Parameters:
d - an array of double values
- Returns:
an array of double values
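Centering, as described for the column-wise overload on this page, subtracts the mean from every value so that the result has mean zero. A minimal sketch for a single array follows; `CenterSketch` and `centerValues` are hypothetical names, and the code is an illustration rather than Tetrad's implementation.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Tetrad: subtracts the mean from each value. */
final class CenterSketch {

    /** Returns a new array holding d[i] - mean(d); the input is left unchanged. */
    static double[] centerValues(double[] d) {
        if (d.length == 0) {
            return new double[0];
        }
        double sum = 0.0;
        for (double x : d) {
            sum += x;
        }
        double mean = sum / d.length;

        double[] out = new double[d.length];
        for (int i = 0; i < d.length; i++) {
            out[i] = d[i] - mean;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(centerValues(new double[] {1, 2, 3})));
    }
}
```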
-
centerData
centerData.
-
concatenate
concatenate.
-
getBootstrapSample
getBootstrapSample.
- Parameters:
data - a Matrix object
sampleSize - an int
- Returns:
a sample with replacement with the given sample size from the given dataset.
-
copyColumn
copyColumn.
-
addMissingData
Adds missing data values to cases in accordance with probabilities specified in a double array which has as many elements as there are columns in the input dataset. Hence, if the first element of the array of probabilities is alpha, then the first column will contain a -99 (or other missing value code) in a given case with probability alpha. This method is useful for generating datasets that can be used to test algorithms that handle missing data and/or latent variables.
Author: Frank Wimberly
- Parameters:
inData - The data to which random missing data is to be added.
probs - The probability of adding missing data to each column.
- Returns:
The new data set with missing data added.
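The per-column rule described above can be sketched on a plain `double[][]`: each cell in column j is independently replaced by the missing value code with probability `probs[j]`. This is an illustration only, not Tetrad's implementation; `MissingDataSketch`, `addMissing`, the explicit `missingCode` parameter, and the use of `java.util.Random` are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper, not part of Tetrad: injects missing values column by column. */
final class MissingDataSketch {

    /** Replaces each cell in column j with missingCode with probability probs[j]. */
    static double[][] addMissing(double[][] data, double[] probs, double missingCode, Random rng) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            if (data[i].length != probs.length) {
                throw new IllegalArgumentException("probs must have one entry per column");
            }
            out[i] = new double[data[i].length];
            for (int j = 0; j < data[i].length; j++) {
                // nextDouble() is uniform on [0, 1), so this is true with probability probs[j].
                out[i][j] = rng.nextDouble() < probs[j] ? missingCode : data[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 10}, {2, 20}, {3, 30}, {4, 40}};
        double[] probs = {0.0, 0.5}; // first column untouched, second about half missing
        double[][] withMissing = addMissing(data, probs, -99.0, new Random(42));
        System.out.println(Arrays.deepToString(withMissing));
    }
}
```

A probability of 0 leaves a column untouched and a probability of 1 replaces every value in it, which gives a quick sanity check for any implementation of this rule.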
-
replaceMissingWithRandom
replaceMissingWithRandom.
-