Class SimpleDataLoader

java.lang.Object
edu.cmu.tetrad.data.SimpleDataLoader

public class SimpleDataLoader extends Object
  • Constructor Details

    • SimpleDataLoader

      public SimpleDataLoader()
  • Method Details

    • loadContinuousData

      @NotNull public static @NotNull DataSet loadContinuousData(File file, String commentMarker, char quoteCharacter, String missingValueMarker, boolean hasHeader, edu.pitt.dbmi.data.reader.Delimiter delimiter, boolean excludeFirstColumn) throws IOException
      Loads a continuous dataset from a file.
      Parameters:
      file - The text file to load the data from.
      commentMarker - The comment marker as a string--e.g., "//".
      quoteCharacter - The quote character, e.g., '\"'.
      missingValueMarker - The missing value marker as a string--e.g., "NA".
      hasHeader - True if the first row of the data contains variable names.
      delimiter - One of the options in the Delimiter enum--e.g., Delimiter.TAB.
      excludeFirstColumn - True if the first column should be excluded from the data.
      Returns:
      The loaded DataSet.
      Throws:
      IOException - If an error occurred in reading the file.
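      The following stand-alone sketch illustrates how the parameters above might shape what is read. It is not Tetrad's reader: the class DelimitedReadSketch, its Table type, and the parse method are hypothetical names, the delimiter is passed as a regular expression rather than a Delimiter value, and quote handling is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Tetrad.
final class DelimitedReadSketch {

    /** Variable names plus rows of values; NaN stands for a missing value. */
    static final class Table {
        final List<String> names = new ArrayList<>();
        final List<double[]> rows = new ArrayList<>();
    }

    static Table parse(String text, String commentMarker, String missingValueMarker,
                       boolean hasHeader, String delimiterRegex, boolean excludeFirstColumn) {
        Table table = new Table();
        boolean headerPending = hasHeader;
        int start = excludeFirstColumn ? 1 : 0;
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            // Skip blank lines and lines that begin with the comment marker.
            if (trimmed.isEmpty() || trimmed.startsWith(commentMarker)) {
                continue;
            }
            String[] tokens = line.split(delimiterRegex);
            if (headerPending) {
                // The first non-comment row supplies the variable names.
                table.names.addAll(Arrays.asList(tokens).subList(start, tokens.length));
                headerPending = false;
                continue;
            }
            double[] row = new double[tokens.length - start];
            for (int i = start; i < tokens.length; i++) {
                String token = tokens[i].trim();
                // The missing value marker is mapped to NaN in this sketch.
                row[i - start] = token.equals(missingValueMarker)
                        ? Double.NaN
                        : Double.parseDouble(token);
            }
            table.rows.add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        String text = "// sample file\nid\tX1\tX2\n1\t0.5\tNA\n2\t1.5\t2.0\n";
        Table t = parse(text, "//", "NA", true, "\t", true);
        System.out.println(t.names + " rows=" + t.rows.size());
    }
}
```

      In this sketch, excluding the first column drops the id field from both the header and every data row, and the "NA" entry becomes NaN.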
    • loadDiscreteData

      @NotNull public static @NotNull DataSet loadDiscreteData(File file, String commentMarker, char quoteCharacter, String missingValueMarker, boolean hasHeader, edu.pitt.dbmi.data.reader.Delimiter delimiter, boolean excludeFirstColumn) throws IOException
      Loads a discrete dataset from a file.
      Parameters:
      file - The text file to load the data from.
      commentMarker - The comment marker as a string--e.g., "//".
      quoteCharacter - The quote character, e.g., '\"'.
      missingValueMarker - The missing value marker as a string--e.g., "NA".
      hasHeader - True if the first row of the data contains variable names.
      delimiter - One of the options in the Delimiter enum--e.g., Delimiter.TAB.
      excludeFirstColumn - True if the first column should be excluded from the data.
      Returns:
      The loaded DataSet.
      Throws:
      IOException - If an error occurred in reading the file.
    • loadMixedData

      @NotNull public static @NotNull DataSet loadMixedData(File file, String commentMarker, char quoteCharacter, String missingValueMarker, boolean hasHeader, int maxNumCategories, edu.pitt.dbmi.data.reader.Delimiter delimiter, boolean excludeFirstColumn) throws IOException
      Loads a mixed dataset from a file.
      Parameters:
      file - The text file to load the data from.
      commentMarker - The comment marker as a string--e.g., "//".
      quoteCharacter - The quote character, e.g., '\"'.
      missingValueMarker - The missing value marker as a string--e.g., "NA".
      hasHeader - True if the first row of the data contains variable names.
      maxNumCategories - The maximum number of distinct entries a column may contain and still be parsed as discrete.
      delimiter - One of the options in the Delimiter enum--e.g., Delimiter.TAB.
      excludeFirstColumn - True if the first column should be excluded from the data set.
      Returns:
      The loaded DataSet.
      Throws:
      IOException - If an error occurred in reading the file.
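      The role of maxNumCategories can be pictured with a small stand-alone sketch. It assumes the rule is an inclusive count of distinct entries and that missing values are not counted; both are assumptions made for illustration, and MixedColumnSketch and isDiscrete are hypothetical names rather than Tetrad code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Tetrad.
final class MixedColumnSketch {

    /** True if the column has at most maxNumCategories distinct non-missing entries. */
    static boolean isDiscrete(List<String> column, String missingValueMarker,
                              int maxNumCategories) {
        Set<String> distinct = new HashSet<>();
        for (String entry : column) {
            // Assumption: missing entries do not count as a category.
            if (!entry.equals(missingValueMarker)) {
                distinct.add(entry);
            }
        }
        return distinct.size() <= maxNumCategories;
    }

    public static void main(String[] args) {
        List<String> sex = Arrays.asList("0", "1", "1", "NA", "0");
        List<String> height = Arrays.asList("1.62", "1.80", "1.75", "1.68", "1.91");
        System.out.println(isDiscrete(sex, "NA", 3));    // prints true (2 distinct entries)
        System.out.println(isDiscrete(height, "NA", 3)); // prints false (5 distinct entries)
    }
}
```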
    • loadCovarianceMatrix

      public static ICovarianceMatrix loadCovarianceMatrix(char[] chars, String commentMarker, DelimiterType delimiterType, char quoteChar, String missingValueMarker)
      Parses a covariance matrix from a char array. The matrix is given in lower-triangular form, as follows.
       /covariance
       100
       X1   X2   X3   X4
       1.4
       3.2  2.3
       2.5  3.2  5.3
       3.2  2.5  3.2  4.2
       
       // text holds the content shown above
       ICovarianceMatrix cov = SimpleDataLoader.loadCovarianceMatrix(
               text.toCharArray(), "//", DelimiterType.WHITESPACE, '\"', "NA");
       
      The initial "/covariance" is optional.
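      To make the layout concrete, the sketch below reads the format shown above into a full symmetric matrix using only the standard library. It assumes that the integer line is the sample size and that entries are separated by whitespace; LowerTriangleSketch, Parsed, and parse are hypothetical names, and this is not how Tetrad itself implements the parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Tetrad.
final class LowerTriangleSketch {

    static final class Parsed {
        int sampleSize;
        String[] names;
        double[][] matrix;
    }

    static Parsed parse(char[] chars, String commentMarker) {
        // Collect the non-blank, non-comment lines.
        List<String> lines = new ArrayList<>();
        for (String line : new String(chars).split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith(commentMarker)) {
                continue;
            }
            lines.add(trimmed);
        }
        int next = 0;
        // The leading "/covariance" tag is optional.
        if (lines.get(next).equals("/covariance")) {
            next++;
        }
        Parsed parsed = new Parsed();
        parsed.sampleSize = Integer.parseInt(lines.get(next++)); // assumed: sample size
        parsed.names = lines.get(next++).split("\\s+");
        int n = parsed.names.length;
        parsed.matrix = new double[n][n];
        for (int i = 0; i < n; i++) {
            String[] tokens = lines.get(next++).split("\\s+");
            if (tokens.length != i + 1) {
                throw new IllegalArgumentException(
                        "Row " + (i + 1) + " should have " + (i + 1) + " entries.");
            }
            for (int j = 0; j <= i; j++) {
                double value = Double.parseDouble(tokens[j]);
                // Mirror each entry across the diagonal.
                parsed.matrix[i][j] = value;
                parsed.matrix[j][i] = value;
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        String text = "/covariance\n100\nX1 X2 X3 X4\n1.4\n3.2 2.3\n"
                + "2.5 3.2 5.3\n3.2 2.5 3.2 4.2\n";
        Parsed parsed = parse(text.toCharArray(), "//");
        System.out.println(parsed.sampleSize + " " + parsed.matrix[0][3]);
    }
}
```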
    • loadCovarianceMatrix

      public static ICovarianceMatrix loadCovarianceMatrix(File file, String commentMarker, DelimiterType delimiter, char quoteCharacter, String missingValueMarker) throws IOException
      Parses the given file for a covariance matrix, returning an ICovarianceMatrix if successful.
      Parameters:
      file - The text file to load the data from.
      commentMarker - The comment marker as a string--e.g., "//".
      delimiter - One of the DelimiterType options--e.g., DelimiterType.TAB.
      quoteCharacter - The quote character, e.g., '\"'.
      missingValueMarker - The missing value marker as a string--e.g., "NA".
      Throws:
      IOException - if the file cannot be read.
    • getDiscreteDataSet

      public static DataSet getDiscreteDataSet(DataModel dataSet)
      Returns the data model cast to DataSet if it is discrete.
    • getContinuousDataSet

      public static DataSet getContinuousDataSet(DataModel dataSet)
      Returns the data model cast to DataSet if it is continuous.
    • getMixedDataSet

      public static DataSet getMixedDataSet(DataModel dataSet)
      Returns the data model cast to DataSet if it is mixed.
    • getCovarianceMatrix

      public static ICovarianceMatrix getCovarianceMatrix(DataModel dataModel, boolean precomputeCovariances)
      Returns the model cast to ICovarianceMatrix if it is already a covariance matrix, or else returns the covariance matrix for a dataset.
    • getCovarianceMatrix

      @NotNull public static @NotNull ICovarianceMatrix getCovarianceMatrix(DataSet dataSet, boolean precomputeCovariances)
    • getCorrelationMatrix

      @NotNull public static @NotNull ICovarianceMatrix getCorrelationMatrix(DataSet dataSet)
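      As background for the covariance and correlation methods above: a correlation matrix is a covariance matrix rescaled so that every diagonal entry is 1. The sketch below shows that standard rescaling in plain Java. CorrelationSketch and toCorrelation are hypothetical names, and nothing here is taken from Tetrad's implementation.

```java
// Hypothetical illustration only; not part of Tetrad.
final class CorrelationSketch {

    /** Rescales a covariance matrix: corr[i][j] = cov[i][j] / sqrt(cov[i][i] * cov[j][j]). */
    static double[][] toCorrelation(double[][] cov) {
        int n = cov.length;
        double[][] corr = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                corr[i][j] = cov[i][j] / Math.sqrt(cov[i][i] * cov[j][j]);
            }
        }
        return corr;
    }

    public static void main(String[] args) {
        // Variances 4 and 9 (standard deviations 2 and 3), covariance 2.
        double[][] cov = {{4.0, 2.0}, {2.0, 9.0}};
        double[][] corr = toCorrelation(cov);
        System.out.println(corr[0][1]); // 2 / (2 * 3), roughly 0.333
    }
}
```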
    • loadKnowledge

      public static Knowledge loadKnowledge(File file, DelimiterType delimiter, String commentMarker) throws IOException
      Loads knowledge from a file. Assumes knowledge is the only thing in the file. No jokes please. :)
      Parameters:
      file - The text file to load the knowledge from.
      delimiter - One of the DelimiterType options--e.g., DelimiterType.TAB.
      commentMarker - The comment marker as a string--e.g., "//".
      Throws:
      IOException - If the file cannot be read.