Class Embedding

java.lang.Object
edu.cmu.tetrad.search.utils.Embedding

public class Embedding extends Object
The Embedding class provides utilities for transforming a dataset into an embedded representation, using basis expansions for continuous variables and one-hot encoding for discrete variables. Such embeddings are commonly used as a preprocessing step for machine learning or statistical analysis.
Author:
josephramsey, bandrews
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final record 
    Embedding.EmbeddedData
    Represents the embedded data result, holding the original dataset, the transformed embedded dataset, and a mapping between the indices of original variables and their corresponding transformed variables.
  • Method Summary

    Modifier and Type
    Method
    Description
    static @NotNull Embedding.EmbeddedData
    getEmbeddedData(DataSet dataSet, int truncationLimit, int basisType, double basisScale)
    Computes the embedded data representation based on the provided dataset and parameters.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • getEmbeddedData

      public static @NotNull Embedding.EmbeddedData getEmbeddedData(DataSet dataSet, int truncationLimit, int basisType, double basisScale)
      Computes the embedded data representation based on the provided dataset and parameters.
      Parameters:
      dataSet - The original dataset to be embedded; must not be null.
      truncationLimit - The maximum number of basis expansions for continuous variables; must be a positive integer.
      basisType - The type of basis function to use for continuous variable expansions. The function types are as follows:
      • 0 = g(x) = x^index [Polynomial basis]
      • 1 = g(x) = hermite1(index, x) [Probabilist's Hermite polynomial]
      • 2 = g(x) = legendre(index, x) [Legendre polynomial]
      • 3 = g(x) = chebyshev(index, x) [Chebyshev polynomial]
      basisScale - The scaling factor for data transformation. Use 0 to standardize the data, a positive value to scale it, or -1 to skip scaling.
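      The four basisType options can be illustrated with a small, self-contained sketch that evaluates each family through its standard three-term recurrence. This is not Tetrad's implementation: the class name BasisSketch and the method basis are hypothetical, and reading "Chebyshev polynomial" as the first kind is an assumption, since the documentation does not say which kind is meant.

```java
/** Illustrative sketch only; BasisSketch is a hypothetical name, not part of Tetrad. */
class BasisSketch {

    /**
     * Evaluates the basis function of the given type and index at x.
     * Types follow the basisType list: 0 polynomial, 1 probabilist's Hermite,
     * 2 Legendre, 3 Chebyshev (first kind is an assumption).
     */
    static double basis(int type, int index, double x) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be >= 0");
        }
        if (type == 0) {
            return Math.pow(x, index); // plain monomial x^index
        }
        if (type < 0 || type > 3) {
            throw new IllegalArgumentException("unknown basis type: " + type);
        }
        double prev = 1.0; // degree 0 is 1 for all three families
        if (index == 0) {
            return prev;
        }
        double curr = x;   // degree 1 is x for all three families
        for (int n = 1; n < index; n++) {
            double next;
            switch (type) {
                case 1:  // He_{n+1}(x) = x He_n(x) - n He_{n-1}(x)
                    next = x * curr - n * prev;
                    break;
                case 2:  // (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)
                    next = ((2 * n + 1) * x * curr - n * prev) / (n + 1);
                    break;
                default: // T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)
                    next = 2 * x * curr - prev;
                    break;
            }
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        double x = 0.5;
        for (int type = 0; type <= 3; type++) {
            for (int index = 1; index <= 3; index++) {
                System.out.printf("type=%d index=%d g(x)=%.4f%n", type, index, basis(type, index, x));
            }
        }
    }
}
```

      Under this reading, a continuous column would contribute one embedded column per index value, each obtained by applying g to the (optionally scaled) data.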
      Returns:
      An instance of EmbeddedData, containing the original dataset, the embedded dataset, and a mapping from original variable indices to their respective transformed indices in the embedded dataset.
      Throws:
      IllegalArgumentException - If the dataset is null, the truncation limit is less than 1, or the basis scale parameter is invalid.
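      To make the returned index mapping and the argument checks concrete, the sketch below builds a mapping from original variable indices to embedded column indices. It is a hypothetical illustration, not Tetrad's code: the class EmbeddingIndexSketch, the method buildMapping, and the convention that 0 categories marks a continuous variable are all invented here. It also assumes that a continuous variable expands to truncationLimit columns and that a discrete variable gets one indicator column per category; the actual implementation may differ, for example by dropping a reference category.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not Tetrad's API. */
class EmbeddingIndexSketch {

    /**
     * Builds a map from each original variable index to its embedded column indices.
     * numCategories[i] == 0 marks variable i as continuous (assumed convention);
     * any positive value is the category count of a discrete variable.
     */
    static Map<Integer, List<Integer>> buildMapping(int[] numCategories, int truncationLimit) {
        if (numCategories == null) {
            throw new IllegalArgumentException("variable description must not be null");
        }
        if (truncationLimit < 1) {
            throw new IllegalArgumentException("truncationLimit must be >= 1");
        }
        Map<Integer, List<Integer>> mapping = new LinkedHashMap<>();
        int nextColumn = 0;
        for (int i = 0; i < numCategories.length; i++) {
            // Assumption: continuous -> truncationLimit columns, discrete -> one per category.
            int width = numCategories[i] == 0 ? truncationLimit : numCategories[i];
            List<Integer> columns = new ArrayList<>();
            for (int j = 0; j < width; j++) {
                columns.add(nextColumn++);
            }
            mapping.put(i, columns);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Variables: continuous, discrete with 3 categories, continuous.
        System.out.println(buildMapping(new int[]{0, 3, 0}, 2));
        // prints {0=[0, 1], 1=[2, 3, 4], 2=[5, 6]}
    }
}
```

      A mapping of this shape lets a caller that works on the embedded dataset recover which embedded columns belong to a given original variable.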