Package edu.cmu.tetrad.search.utils
Class Embedding
java.lang.Object
edu.cmu.tetrad.search.utils.Embedding
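The Embedding class provides utilities for transforming datasets into embedded representations through basis expansions and one-hot encoding. Each continuous variable is expanded into several basis-function columns, and discrete variables are one-hot encoded into indicator columns. This kind of preprocessing is common in machine learning and statistical analysis because it lets methods that operate on numeric columns represent nonlinear effects of continuous variables and the categories of discrete ones.
- Author: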
- josephramsey, bandrews
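To make the one-hot half of this concrete, the sketch below encodes a single discrete column. `OneHotSketch` is a hypothetical helper, not part of Tetrad. It assumes categories are coded 0 to k - 1 and emits one indicator column per category; the real `Embedding` class may handle discrete variables differently (for example, by dropping a reference category).

```java
import java.util.Arrays;

// Hypothetical sketch, not Tetrad's implementation: one-hot encodes a discrete
// column whose categories are assumed to be coded 0 .. numCategories - 1.
final class OneHotSketch {

    // Returns an n x numCategories matrix with 1.0 in each row's category column.
    static double[][] encode(int[] categories, int numCategories) {
        double[][] out = new double[categories.length][numCategories];
        for (int i = 0; i < categories.length; i++) {
            int c = categories[i];
            if (c < 0 || c >= numCategories) {
                throw new IllegalArgumentException("Category out of range: " + c);
            }
            out[i][c] = 1.0;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] colour = {0, 2, 1}; // e.g. red = 0, green = 1, blue = 2
        for (double[] row : encode(colour, 3)) {
            System.out.println(Arrays.toString(row)); // one indicator column per category
        }
    }
}
```

Running `main` prints `[1.0, 0.0, 0.0]`, `[0.0, 0.0, 1.0]`, and `[0.0, 1.0, 0.0]`, one row per observation.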
Nested Class Summary
Nested Classes
Modifier and Type: static final record
Class: Embedding.EmbeddedData
Description: Represents the embedded data result, holding the original dataset, the transformed embedded dataset, and a mapping from the index of each original variable to the indices of its corresponding transformed variables.
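The index mapping held by `EmbeddedData` can be pictured as a map from each original variable index to the list of embedded-column indices it produced. The record below is a hypothetical stand-in (its name, its component, and the `fromWidths` factory are assumptions, not Tetrad's API) that assigns consecutive column indices given how many columns each variable expands into.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the index mapping described for Embedding.EmbeddedData.
// The record name, component name, and factory method are assumptions, not Tetrad's API.
record EmbeddingIndexMap(Map<Integer, List<Integer>> embedding) {

    // Assigns consecutive embedded-column indices to each original variable,
    // given how many embedded columns each variable expands into.
    static EmbeddingIndexMap fromWidths(int[] widths) {
        Map<Integer, List<Integer>> map = new LinkedHashMap<>();
        int next = 0;
        for (int i = 0; i < widths.length; i++) {
            List<Integer> columns = new ArrayList<>();
            for (int k = 0; k < widths[i]; k++) {
                columns.add(next++);
            }
            map.put(i, Collections.unmodifiableList(columns));
        }
        return new EmbeddingIndexMap(Collections.unmodifiableMap(map));
    }

    public static void main(String[] args) {
        // e.g. variable 0: continuous with truncationLimit = 3,
        //      variable 1: discrete with 3 indicator columns,
        //      variable 2: expanded into a single column.
        EmbeddingIndexMap m = fromWidths(new int[] {3, 3, 1});
        System.out.println(m.embedding()); // {0=[0, 1, 2], 1=[3, 4, 5], 2=[6]}
    }
}
```

Keeping each variable's columns contiguous is a design assumption of this sketch; code that consumes the real mapping should look indices up rather than rely on that layout.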
Method Summary
Modifier and Type: static @NotNull Embedding.EmbeddedData
Method: getEmbeddedData(DataSet dataSet, int truncationLimit, int basisType, double basisScale)
Description: Computes the embedded data representation based on the provided dataset and parameters.
Method Details
getEmbeddedData
@NotNull
public static @NotNull Embedding.EmbeddedData getEmbeddedData(DataSet dataSet, int truncationLimit, int basisType, double basisScale)
Computes the embedded data representation based on the provided dataset and parameters.
- Parameters:
dataSet - The original dataset to be embedded; must not be null.
truncationLimit - The maximum number of basis expansions for continuous variables; must be a positive integer.
basisType - The type of basis function to use when expanding continuous variables:
  - 0 = `g(x) = x^index` [polynomial basis]
  - 1 = `g(x) = hermite1(index, x)` [probabilists' Hermite polynomial]
  - 2 = `g(x) = legendre(index, x)` [Legendre polynomial]
  - 3 = `g(x) = chebyshev(index, x)` [Chebyshev polynomial]
basisScale - The scaling applied to the data: 0 standardizes the data, a positive value scales it according to that value, and -1 skips scaling. Any other value is invalid.
- Returns:
- An instance of EmbeddedData containing the original dataset, the embedded dataset, and a mapping from original variable indices to their respective transformed indices in the embedded dataset.
- Throws:
IllegalArgumentException - If the dataset is null, the truncation limit is less than 1, or the basis scale parameter is invalid.
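The four `basisType` families can be evaluated with the standard three-term recurrences. This is a minimal sketch, not Tetrad's implementation: `BasisSketch` and its methods are hypothetical, it assumes `chebyshev` refers to the Chebyshev polynomial of the first kind, and it assumes a continuous value is expanded into g_1(x) through g_L(x) with L = `truncationLimit`.

```java
import java.util.Arrays;

// Hypothetical sketch of the four basisType families using standard three-term
// recurrences. Assumes "chebyshev" means the first kind and that a continuous
// value expands into g_1(x) .. g_truncationLimit(x). Not Tetrad's code.
final class BasisSketch {

    static double hermite(int n, double x) {        // probabilists' He_n(x)
        if (n == 0) return 1.0;
        double prev = 1.0, curr = x;                // He_0, He_1
        for (int k = 1; k < n; k++) {
            double next = x * curr - k * prev;      // He_{k+1} = x He_k - k He_{k-1}
            prev = curr;
            curr = next;
        }
        return curr;
    }

    static double legendre(int n, double x) {       // P_n(x)
        if (n == 0) return 1.0;
        double prev = 1.0, curr = x;                // P_0, P_1
        for (int k = 1; k < n; k++) {
            // (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
            double next = ((2 * k + 1) * x * curr - k * prev) / (k + 1);
            prev = curr;
            curr = next;
        }
        return curr;
    }

    static double chebyshev(int n, double x) {      // T_n(x), first kind
        if (n == 0) return 1.0;
        double prev = 1.0, curr = x;                // T_0, T_1
        for (int k = 1; k < n; k++) {
            double next = 2 * x * curr - prev;      // T_{k+1} = 2x T_k - T_{k-1}
            prev = curr;
            curr = next;
        }
        return curr;
    }

    static double basis(int basisType, int index, double x) {
        switch (basisType) {
            case 0: return Math.pow(x, index);      // polynomial basis
            case 1: return hermite(index, x);
            case 2: return legendre(index, x);
            case 3: return chebyshev(index, x);
            default: throw new IllegalArgumentException("Unknown basisType: " + basisType);
        }
    }

    // Expands one value into truncationLimit columns: g_1(x), ..., g_L(x).
    static double[] expand(int basisType, int truncationLimit, double x) {
        if (truncationLimit < 1) {
            throw new IllegalArgumentException("truncationLimit must be >= 1");
        }
        double[] out = new double[truncationLimit];
        for (int index = 1; index <= truncationLimit; index++) {
            out[index - 1] = basis(basisType, index, x);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(expand(1, 3, 0.5))); // [0.5, -0.75, -1.375]
    }
}
```

Evaluating orthogonal polynomials by recurrence, rather than through their expanded power-series coefficients, is generally the more numerically stable choice at higher degrees.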
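The `basisScale` rules can likewise be sketched as a small dispatch. `ScaleSketch` is hypothetical. Standardization here uses the sample standard deviation, and the treatment of a positive value (a linear map of each column onto [-basisScale, basisScale]) is an assumption, since the documentation only says that a positive value means scaling.

```java
import java.util.Arrays;

// Hypothetical sketch of the basisScale convention: 0 standardizes, a positive
// value rescales, -1 skips scaling, anything else is rejected. How a positive
// value is used (here: linear mapping onto [-basisScale, basisScale]) is an assumption.
final class ScaleSketch {

    static double[] apply(double[] column, double basisScale) {
        if (basisScale == -1) return column.clone();            // skip scaling
        if (basisScale == 0) return standardize(column);
        if (basisScale > 0) return rescale(column, basisScale);
        throw new IllegalArgumentException("Invalid basisScale: " + basisScale);
    }

    // (x - mean) / sample standard deviation
    static double[] standardize(double[] c) {
        if (c.length < 2) throw new IllegalArgumentException("Need at least two values");
        double mean = 0;
        for (double v : c) mean += v;
        mean /= c.length;
        double ss = 0;
        for (double v : c) ss += (v - mean) * (v - mean);
        double sd = Math.sqrt(ss / (c.length - 1));
        double[] out = new double[c.length];
        for (int i = 0; i < c.length; i++) out[i] = (c[i] - mean) / sd;
        return out;
    }

    // Linearly maps [min, max] onto [-s, s]; a constant column maps to 0.
    static double[] rescale(double[] c, double s) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : c) { min = Math.min(min, v); max = Math.max(max, v); }
        double[] out = new double[c.length];
        for (int i = 0; i < c.length; i++) {
            out[i] = (max == min) ? 0.0 : -s + 2 * s * (c[i] - min) / (max - min);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        System.out.println(Arrays.toString(apply(x, 0)));   // [-1.0, 0.0, 1.0]
        System.out.println(Arrays.toString(apply(x, 1.0))); // [-1.0, 0.0, 1.0]
        System.out.println(Arrays.toString(apply(x, -1)));  // [1.0, 2.0, 3.0]
    }
}
```

Values such as -2 or NaN fall through every branch to the exception, mirroring the `IllegalArgumentException` documented for an invalid basis scale.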