Package edu.cmu.tetrad.data
Class DataUtils
java.lang.Object
edu.cmu.tetrad.data.DataUtils
Some static utility methods for dealing with data sets.
- Author:
- Various folks.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
static DataSet | addMissingData(DataSet inData, double[] probs) | Adds missing data values to cases in accordance with probabilities specified in a double array which has as many elements as there are columns in the input dataset.
static double[] | center(double[] d) |
static DataSet | center | Subtracts the mean of each column from each datum in that column.
static Matrix | centerData(Matrix data) |
static DataSet | choleskySimulation |
static DataSet | concatenate(DataSet... dataSets) |
static DataSet | concatenate(DataSet dataSet1, DataSet dataSet2) |
static Matrix | concatenate(Matrix... dataSets) |
static DataSet | concatenate(List<DataSet> dataSets) |
static boolean | containsMissingValue(DataSet data) |
static boolean | containsMissingValue(Matrix data) |
static DataSet | convertNumericalDiscreteToContinuous(DataSet dataSet) |
static void | copyColumn(Node node, DataSet source, DataSet dest) |
static Matrix | cov |
static ICovarianceMatrix | covarianceNonparanormalDrton(DataSet dataSet) |
 | createContinuousVariables(String[] varNames) |
static String | defaultCategory(int index) |
static DataSet | discreteSerializableInstance | A discrete data set used to construct some other serializable instances.
static DataSet | discretize(DataSet dataSet, int numCategories, boolean variablesCopied) |
static DataSet | getBootstrapSample(DataSet data, int sampleSize) |
static DataSet | getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator) | Get dataset sampled with replacement.
static Matrix | getBootstrapSample(Matrix data, int sampleSize) |
static double | getEss(ICovarianceMatrix covariances) | Returns the equivalent sample size, assuming all units are equally correlated and all unit variances are equal.
static DataSet | getNonparanormalTransformed(DataSet dataSet) |
static DataSet | getResamplingDataset(DataSet data, int sampleSize) |
static DataSet | getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator) | Get dataset sampled without replacement.
static boolean | isBinary | States whether the given column of the given data set is binary.
static DataSet | logData | Log or unlog data
static Vector | mean |
static Vector | means(double[][] data) | Column major data.
static Vector | means |
static DataSet | removeConstantColumns(DataSet dataSet) |
static DataSet | replaceMissingWithRandom(DataSet inData) |
static DataSet | restrictToMeasured(DataSet fullDataSet) |
static DataSet | shuffleColumns(DataSet dataModel) |
 | shuffleColumns2(List<DataSet> dataSets) |
static double[] | standardizeData(double[] data) |
static cern.colt.list.DoubleArrayList | standardizeData(cern.colt.list.DoubleArrayList data) |
static DataSet | standardizeData(DataSet dataSet) |
static Matrix | standardizeData(Matrix data) |
 | standardizeData(List<DataSet> dataSets) |
static Matrix | subMatrix |
static Matrix | subMatrix |
static Matrix | subMatrix |
static Matrix | subMatrix |
-
Constructor Details
-
DataUtils
public DataUtils()
-
-
Method Details
-
copyColumn
-
isBinary
States whether the given column of the given data set is binary.
- Parameters:
data - The given data set.
column - The given column.
- Returns:
- true iff the column is binary.
-
defaultCategory
- Parameters:
index - One plus the given index.
- Returns:
- the default category for index i. (The default category should ALWAYS be obtained by calling this method.)
-
addMissingData
Adds missing data values to cases in accordance with probabilities specified in a double array that has as many elements as there are columns in the input dataset. Hence, if the first element of the array of probabilities is alpha, then the first column will contain a -99 (or other missing value code) in a given case with probability alpha. This method is useful for generating datasets that can be used to test algorithms that handle missing data and/or latent variables. Author: Frank Wimberly
- Parameters:
inData - The data to which random missing data is to be added.
probs - The probability of adding missing data to each column.
- Returns:
- The new data set with missing data added.
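The per-column behavior described above can be illustrated with a minimal sketch on plain arrays. It does not use the Tetrad `DataSet` type; the class name `MissingDataSketch`, the method `addMissing`, the use of `java.util.Random`, and the choice of -99 as the missing value code are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical sketch, not the Tetrad implementation.
class MissingDataSketch {
    // Missing value code taken from the description above; other codes are possible.
    static final double MISSING = -99.0;

    // Returns a copy of data (rows = cases, columns = variables) in which each
    // cell of column j is replaced by MISSING with probability probs[j].
    static double[][] addMissing(double[][] data, double[] probs, Random rng) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; i++) {
            if (data[i].length != probs.length) {
                throw new IllegalArgumentException("probs must have one entry per column");
            }
            out[i] = data[i].clone();
            for (int j = 0; j < probs.length; j++) {
                if (rng.nextDouble() < probs[j]) {
                    out[i][j] = MISSING;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        // Column 0 is never blanked (probability 0), column 1 always is (probability 1).
        double[][] result = addMissing(data, new double[] {0.0, 1.0}, new Random(42));
        for (double[] row : result) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

The sketch copies each row before modifying it, so the input array is left unchanged, in line with the description's "new data set".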
-
replaceMissingWithRandom
-
discreteSerializableInstance
A discrete data set used to construct some other serializable instances. -
containsMissingValue
- Returns:
- true iff the data set contains a missing value.
-
containsMissingValue
-
logData
Log or unlog data -
standardizeData
-
standardizeData
public static double[] standardizeData(double[] data) -
standardizeData
public static cern.colt.list.DoubleArrayList standardizeData(cern.colt.list.DoubleArrayList data) -
standardizeData
-
standardizeData
-
center
public static double[] center(double[] d) -
centerData
-
center
-
discretize
-
createContinuousVariables
-
subMatrix
- Returns:
- the submatrix of m with variables in the order of the x variables.
-
subMatrix
- Returns:
- the submatrix of m with variables in the order of the x variables.
-
subMatrix
- Returns:
- the submatrix of m with variables in the order of the x variables.
-
subMatrix
public static Matrix subMatrix(ICovarianceMatrix m, Map<Node, Integer> indexMap, Node x, Node y, List<Node> z)
- Returns:
- the submatrix of m with variables in the order of the x variables.
-
convertNumericalDiscreteToContinuous
public static DataSet convertNumericalDiscreteToContinuous(DataSet dataSet) throws NumberFormatException
- Throws:
NumberFormatException
-
concatenate
-
concatenate
-
concatenate
-
concatenate
-
restrictToMeasured
-
means
-
means
Column major data. -
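As an illustration of what "column major" means for a `double[][]`, the sketch below assumes that `columns[j]` holds all values of variable j, so each mean is taken over one inner array. The class and method names are hypothetical, and the sketch returns a `double[]` rather than the Tetrad `Vector` type.

```java
// Hypothetical sketch, not the Tetrad implementation.
class ColumnMeansSketch {
    // columns[j][i] is the value of variable j in case i (column-major layout).
    static double[] columnMeans(double[][] columns) {
        double[] means = new double[columns.length];
        for (int j = 0; j < columns.length; j++) {
            double sum = 0.0;
            for (double v : columns[j]) {
                sum += v;
            }
            means[j] = sum / columns[j].length;
        }
        return means;
    }

    public static void main(String[] args) {
        double[][] columns = {{1.0, 2.0, 3.0}, {10.0, 20.0, 30.0}};
        for (double m : columnMeans(columns)) {
            System.out.println(m);
        }
    }
}
```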
cov
-
mean
-
choleskySimulation
- Parameters:
cov - The variables and covariance matrix over the variables.
- Returns:
- The simulated data.
-
getBootstrapSample
- Returns:
- a sample with replacement with the given sample size from the given dataset.
-
getResamplingDataset
- Returns:
- a sample without replacement with the given sample size from the given dataset.
-
getResamplingDataset
public static DataSet getResamplingDataset(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled without replacement.
- Parameters:
data - original dataset
sampleSize - number of rows to sample
randomGenerator - random number generator
- Returns:
- dataset
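Sampling rows without replacement can be sketched on a plain array as follows. This is one common way to do it (shuffle the row indices and keep the first `sampleSize`); whether Tetrad does it this way is not stated here. The names `ResampleSketch` and `sampleWithoutReplacement` are hypothetical, and `java.util.Random` stands in for the Apache Commons Math `RandomGenerator` to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not the Tetrad implementation.
class ResampleSketch {
    // Returns sampleSize distinct rows of data, chosen uniformly at random.
    static double[][] sampleWithoutReplacement(double[][] data, int sampleSize, Random rng) {
        if (sampleSize < 0 || sampleSize > data.length) {
            throw new IllegalArgumentException("sampleSize must be between 0 and the number of rows");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, rng);
        double[][] out = new double[sampleSize][];
        for (int k = 0; k < sampleSize; k++) {
            out[k] = data[indices.get(k)].clone();
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0}, {1.0}, {2.0}, {3.0}, {4.0}};
        for (double[] row : sampleWithoutReplacement(data, 3, new Random(7))) {
            System.out.println(row[0]);
        }
    }
}
```

Because no row index can be drawn twice, the sample size cannot exceed the number of rows in this sketch.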
-
getBootstrapSample
- Returns:
- a sample with replacement with the given sample size from the given dataset.
-
getBootstrapSample
public static DataSet getBootstrapSample(DataSet data, int sampleSize, org.apache.commons.math3.random.RandomGenerator randomGenerator)
Get dataset sampled with replacement.
- Parameters:
data - original dataset
sampleSize - number of rows to sample
randomGenerator - random number generator
- Returns:
- dataset
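A bootstrap sample draws each row independently, so the same row may appear more than once. The sketch below shows the idea on a plain array; the names `BootstrapSketch` and `sampleWithReplacement` are hypothetical, and `java.util.Random` is used in place of the Apache Commons Math `RandomGenerator` so the example has no external dependencies.

```java
import java.util.Random;

// Hypothetical sketch, not the Tetrad implementation.
class BootstrapSketch {
    // Returns sampleSize rows of data, each drawn uniformly at random with replacement.
    static double[][] sampleWithReplacement(double[][] data, int sampleSize, Random rng) {
        if (data.length == 0 || sampleSize < 0) {
            throw new IllegalArgumentException("need at least one row and a non-negative sampleSize");
        }
        double[][] out = new double[sampleSize][];
        for (int k = 0; k < sampleSize; k++) {
            int i = rng.nextInt(data.length); // any row may be picked again
            out[k] = data[i].clone();
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{0.0}, {1.0}, {2.0}};
        // The sample may be larger than the original data set.
        for (double[] row : sampleWithReplacement(data, 5, new Random(3))) {
            System.out.println(row[0]);
        }
    }
}
```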
-
split
-
center
Subtracts the mean of each column from each datum in that column. -
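Column centering can be sketched on a row-major `double[][]` as follows. The class name `CenterSketch`, the method `centerColumns`, and the array layout (rows are cases, columns are variables) are assumptions for illustration; the Tetrad method operates on its own data types.

```java
// Hypothetical sketch, not the Tetrad implementation.
class CenterSketch {
    // Returns a copy of data in which every column has mean zero.
    // Assumes a rectangular array with at least one row.
    static double[][] centerColumns(double[][] data) {
        int rows = data.length;
        int cols = data[0].length;
        double[][] out = new double[rows][cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0.0;
            for (int i = 0; i < rows; i++) {
                sum += data[i][j];
            }
            double mean = sum / rows;
            for (int i = 0; i < rows; i++) {
                out[i][j] = data[i][j] - mean;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] centered = centerColumns(new double[][] {{1.0, 10.0}, {3.0, 30.0}});
        for (double[] row : centered) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```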
shuffleColumns
-
shuffleColumns2
-
covarianceNonparanormalDrton
-
getNonparanormalTransformed
-
removeConstantColumns
-
getEss
Returns the equivalent sample size, assuming all units are equally correlated and all unit variances are equal.
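For orientation, a widely used formula for the equivalent sample size of n equally correlated units with common correlation rho is n / (1 + (n - 1) * rho). The sketch below implements only that textbook formula; whether getEss computes exactly this, and how it would estimate rho from the covariance matrix, is not stated on this page and is an assumption. The names `EssSketch` and `equivalentSampleSize` are hypothetical.

```java
// Hypothetical sketch of the textbook formula, not the Tetrad implementation.
class EssSketch {
    // n: number of units; rho: common pairwise correlation between units.
    static double equivalentSampleSize(int n, double rho) {
        return n / (1.0 + (n - 1) * rho);
    }

    public static void main(String[] args) {
        // Uncorrelated units keep their full sample size.
        System.out.println(equivalentSampleSize(100, 0.0));
        // Perfectly correlated units carry the information of a single unit.
        System.out.println(equivalentSampleSize(100, 1.0));
    }
}
```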
-