Class BlocksUtil

java.lang.Object
edu.cmu.tetrad.search.blocks.BlocksUtil

public final class BlocksUtil extends Object
Utility class for handling operations related to blocks, such as creating block variables, canonicalizing blocks, ensuring valid indices, and applying various cluster policies. This class includes methods to manipulate and process blocks and their corresponding data representations within a dataset.
  • Method Details

    • makeBlockVariables

      public static List<Node> makeBlockVariables(List<List<Integer>> blocks, DataSet dataSet)
      Creates a list of block variables based on the provided list of blocks and the dataset. If a block contains a single index, the corresponding variable from the dataset is added to the result. For larger blocks, a new latent variable is created and added to the result.
      Parameters:
      blocks - a list of lists, where each inner list represents a block of indices
      dataSet - the dataset associated with the specified blocks, providing the variables
      Returns:
      a list of Node objects representing the block variables, either existing or newly created
    • canonicalizeBlocks

      public static List<List<Integer>> canonicalizeBlocks(List<List<Integer>> blocks)
      Canonicalizes a list of blocks by removing null or empty blocks, sorting the contents of each block, and ensuring the resulting blocks are unique. The returned list maintains the order of the first occurrence of each unique block.
      Parameters:
      blocks - a list of lists, where each inner list represents a block of indices to canonicalize
      Returns:
      a list of canonicalized blocks that are non-empty, internally sorted, and unique, in order of first occurrence
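      The canonicalization rules described above can be illustrated with a minimal standalone sketch. It is not the Tetrad source: the class name CanonicalizeSketch and the method canonicalize are hypothetical stand-ins, and the sketch assumes that blocks contain no null indices and that duplicate indices inside a block are left as-is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for BlocksUtil.canonicalizeBlocks; not the library code.
class CanonicalizeSketch {

    static List<List<Integer>> canonicalize(List<List<Integer>> blocks) {
        // LinkedHashSet keeps the first occurrence of each distinct block, in order.
        Set<List<Integer>> seen = new LinkedHashSet<>();
        for (List<Integer> block : blocks) {
            if (block == null || block.isEmpty()) {
                continue; // drop null or empty blocks
            }
            List<Integer> sorted = new ArrayList<>(block);
            Collections.sort(sorted); // sort the contents of each block
            seen.add(sorted);         // lists compare by content, so duplicates collapse
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<Integer>> blocks = new ArrayList<>();
        blocks.add(Arrays.asList(3, 1, 2));
        blocks.add(null);
        blocks.add(Collections.<Integer>emptyList());
        blocks.add(Arrays.asList(2, 3, 1)); // same block as the first after sorting
        blocks.add(Arrays.asList(5));
        System.out.println(canonicalize(blocks)); // prints [[1, 2, 3], [5]]
    }
}
```

      Here [3, 1, 2] and [2, 3, 1] collapse into the single block [1, 2, 3], which keeps the position of its first occurrence.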
    • validateBlocks

      public static void validateBlocks(List<List<Integer>> blocks, DataSet data)
      Validates the provided list of blocks to ensure that all indices within each block are non-negative, within the range of columns in the given dataset, and not null. Throws an IllegalArgumentException if any of these conditions are violated.
      Parameters:
      blocks - a list of lists, where each inner list represents a block of indices to validate
      data - the dataset providing the number of columns for range validation
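      The checks listed above can be sketched without a DataSet by passing the column count directly. This is an illustration under stated assumptions, not the library implementation: ValidateSketch, validate, and the numColumns parameter are hypothetical, and treating a null block as an error is an assumption the documentation does not confirm.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for BlocksUtil.validateBlocks; numColumns replaces the DataSet.
class ValidateSketch {

    static void validate(List<List<Integer>> blocks, int numColumns) {
        for (int b = 0; b < blocks.size(); b++) {
            List<Integer> block = blocks.get(b);
            if (block == null) {
                // Assumption: the documentation does not say how null blocks are handled.
                throw new IllegalArgumentException("Block " + b + " is null");
            }
            for (Integer idx : block) {
                if (idx == null) {
                    throw new IllegalArgumentException("Block " + b + " contains a null index");
                }
                if (idx < 0 || idx >= numColumns) {
                    throw new IllegalArgumentException(
                            "Index " + idx + " in block " + b + " is outside [0, " + numColumns + ")");
                }
            }
        }
    }

    public static void main(String[] args) {
        // Passes silently: every index is in [0, 3).
        validate(Arrays.asList(Arrays.asList(0, 1), Arrays.asList(2)), 3);

        try {
            validate(Arrays.asList(Arrays.asList(0, 3)), 3); // 3 is out of range
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```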
    • toSpec

      public static BlockSpec toSpec(List<List<Integer>> blocks, DataSet dataSet)
      Converts a list of block indices and a dataset into a BlockSpec object, ensuring the blocks are canonicalized and generating the appropriate block variables.
      Parameters:
      blocks - a list of lists, where each inner list represents a block of indices
      dataSet - the dataset associated with the blocks
      Returns:
      a BlockSpec object containing the dataset, canonicalized blocks, and block variables
    • toSpec

      public static BlockSpec toSpec(List<List<Integer>> blocks, List<Integer> ranks, DataSet dataSet)
      Converts a list of blocks, ranks, and a dataset into a BlockSpec object. The blocks are canonicalized to ensure uniformity, and block variables are generated based on the canonicalized blocks and dataset.
      Parameters:
      blocks - a list of lists, where each inner list represents a block of indices
      ranks - a list of integers representing the ranks associated with the blocks
      dataSet - the dataset associated with the blocks, providing the variables for block creation
      Returns:
      a BlockSpec object containing the dataset, canonicalized blocks, block variables, and ranks
    • expandLatents

      public static List<Node> expandLatents(BlockSpec spec)
      Expands each block's rank into individual latent variables: a block latent Lk with rank r is expanded into r variables named Lk-1 through Lk-r.
      Parameters:
      spec - the BlockSpec object containing the block variables to expand
      Returns:
      the expanded list of Node objects
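      The naming pattern can be illustrated with plain strings in place of Node objects. This sketch only demonstrates the Lk-1..Lk-r scheme; the class ExpandSketch and method expandNames are hypothetical, and it assumes every latent (including rank 1) receives a numeric suffix, which the documentation does not confirm. How observed (non-latent) block variables are treated is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the Lk-1..Lk-r naming scheme; not the library code.
class ExpandSketch {

    static List<String> expandNames(List<String> latentNames, List<Integer> ranks) {
        if (latentNames.size() != ranks.size()) {
            throw new IllegalArgumentException("Need exactly one rank per latent name");
        }
        List<String> expanded = new ArrayList<>();
        for (int k = 0; k < latentNames.size(); k++) {
            int r = ranks.get(k);
            // One expanded variable per unit of rank: name-1, name-2, ..., name-r.
            for (int j = 1; j <= r; j++) {
                expanded.add(latentNames.get(k) + "-" + j);
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("L1", "L2");
        List<Integer> ranks = Arrays.asList(2, 3);
        System.out.println(expandNames(names, ranks));
        // prints [L1-1, L1-2, L2-1, L2-2, L2-3]
    }
}
```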
    • makeDisjointBySize

      public static List<List<Integer>> makeDisjointBySize(List<List<Integer>> blocks)
      Creates a list of disjoint blocks from the provided list of blocks, prioritizing larger blocks first. Each block is processed to ensure no overlapping indices, and elements within processed blocks are sorted. The resulting list is unmodifiable and contains unique, disjoint, and sorted blocks.
      Parameters:
      blocks - a list of lists, where each inner list represents a block of indices to be made disjoint
      Returns:
      a list of disjoint blocks, where each block is a sorted and unmodifiable list of indices
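      A largest-first disjointing pass might look like the sketch below. It is one plausible reading of the description, not the library implementation: DisjointSketch and makeDisjoint are hypothetical names, and the sketch assumes that a smaller block overlapping an earlier one is trimmed to its unused indices (and dropped only if nothing remains). The actual method may instead discard overlapping blocks entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for BlocksUtil.makeDisjointBySize; not the library code.
class DisjointSketch {

    static List<List<Integer>> makeDisjoint(List<List<Integer>> blocks) {
        List<List<Integer>> ordered = new ArrayList<>();
        for (List<Integer> b : blocks) {
            if (b != null && !b.isEmpty()) {
                ordered.add(b);
            }
        }
        // Larger blocks first; List.sort is stable, so ties keep their input order.
        ordered.sort((a, b) -> Integer.compare(b.size(), a.size()));

        Set<Integer> used = new HashSet<>();
        List<List<Integer>> result = new ArrayList<>();
        for (List<Integer> b : ordered) {
            // TreeSet both deduplicates and sorts the surviving indices.
            TreeSet<Integer> kept = new TreeSet<>();
            for (Integer idx : b) {
                if (!used.contains(idx)) {
                    kept.add(idx);
                }
            }
            if (kept.isEmpty()) {
                continue; // assumption: a fully covered block is dropped
            }
            used.addAll(kept);
            result.add(Collections.unmodifiableList(new ArrayList<>(kept)));
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        List<List<Integer>> blocks = Arrays.asList(
                Arrays.asList(0, 1),
                Arrays.asList(4, 2, 1, 3),
                Arrays.asList(5));
        System.out.println(makeDisjoint(blocks)); // prints [[1, 2, 3, 4], [0], [5]]
    }
}
```

      In this run the four-element block claims index 1 first, so the block [0, 1] is reduced to [0].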
    • makeDisjointSpec

      public static BlockSpec makeDisjointSpec(DataSet ds, List<List<Integer>> blocks)
      Constructs a BlockSpec object using the provided DataSet and block definitions, ensuring that the blocks are made disjoint by prioritizing larger blocks first. The resulting BlockSpec includes the dataset, the disjoint blocks, and associated block variables.
      Parameters:
      ds - the dataset associated with the blocks
      blocks - a list of lists, where each inner list represents a block of indices
      Returns:
      a BlockSpec object containing the dataset, disjoint blocks, and block variables
    • applySingleClusterPolicy

      public static BlockSpec applySingleClusterPolicy(BlockSpec blockSpec, SingleClusterPolicy policy, double alpha)
      Applies a single-cluster policy to the provided BlockSpec. Depending on the specified policy, the method modifies the blocks, ranks, and variables in the BlockSpec and returns a new BlockSpec object.
      Parameters:
      blockSpec - the BlockSpec object containing the current block configuration, ranks, and dataset
      policy - the SingleClusterPolicy to apply, which determines how unused columns or variables are handled (e.g., INCLUDE, EXCLUDE, NOISE_VAR)
      alpha - a parameter used when computing ranks
      Returns:
      a new BlockSpec object that reflects the changes made according to the specified policy
    • giveGoodLatentNames

      public static BlockSpec giveGoodLatentNames(BlockSpec spec, Map<String,List<String>> trueClusters, BlocksUtil.NamingMode mode)
      Assigns meaningful names to latent variables in the provided BlockSpec object based on the given true clusters and the specified naming mode. This helps in creating more interpretable and user-friendly block specifications.
      Parameters:
      spec - the BlockSpec object containing the initial latent variable definitions
      trueClusters - a map where keys represent cluster names and values are lists of variable names associated with each cluster
      mode - the NamingMode specifying how the latent variables should be named
      Returns:
      a BlockSpec object with updated latent variable names based on the true clusters and naming mode