edu.cmu.tetrad.search.FastIca

public class FastIca extends Object

A Java implementation of FastIca following the R package fastICA. The only difference (I believe) is that the R package can handle complex numbers, whereas this implementation cannot.

Performance. The R version scales up much better than this one does, the main reason for which is that the calculation of the initial covariance matrix (1/n) X'X is so much faster.

The documention of the R version is as follows, all of which is true of this translation (so far as I know) except for its being in R and its allowing complex values:

Description:

This is an R and C code implementation of the FastICA algorithm of Aapo Hyvarinen et al. (URL: http://www.cis.hut.fi/aapo/) to perform Independent Component Analysis (ICA) and Projection Pursuit.

Usage:

fastICA(X, n.comp, alg.typ = c("parallel","deflation"), fun = c("logcosh","exp"), alpha = 1.0, method = c("R","C"), row.norm = FALSE, maxit = 200, tol = 1e-04, verbose = FALSE, w.init = NULL)

Arguments:

X: a data matrix with n rows representing observations and p columns representing variables.

n.comp: number of components to be extracted

alg.typ: if 'alg.typ == "parallel"' the components are extracted simultaneously (the default). if 'alg.typ == "deflation"' the components are extracted one at a time.

fun: the functional form of the G function used in the approximation to neg-entropy (see details)

alpha: constant in range [1, 2] used in approximation to neg-entropy when 'fun == "logcosh"'

method: if 'method == "R"' then computations are done exclusively in R (default). The code allows the interested R user to see exactly what the algorithm does. if 'method == "C"' then C code is used to perform most of the computations, which makes the algorithm run faster. During compilation the C code is linked to an optimized BLAS library if present, otherwise stand-alone BLAS routines are compiled.

row.norm: a logical value indicating whether rows of the data matrix 'X' should be standardized beforehand.

maxit: maximum number of iterations to perform

tol: a positive scalar giving the tolerance at which the un-mixing The data matrix X is considered to be a linear combination of non-Gaussian (independent) components i.e. X = SA where columns of S contain the independent components and A is a linear mixing matrix. In short ICA attempts to `un-mix' the data by estimating an un-mixing matrix W where XW = S.

Under this generative model the measured `signals' in X will tend to be `more Gaussian' than the source components (in S) due to the Central Limit Theorem. Thus, in order to extract the independent components/sources we search for an un-mixing matrix W that maximizes the non-gaussianity of the sources.

In FastICA, non-gaussianity is measured using approximations to neg-entropy (J) which are more robust than kurtosis based measures and fast to compute.

The approximation takes the form

J(y)=[E{G(y)}-E{G(v)}]^2 where v is a N(0,1) r.v.

The following choices of G are included as options G(u)=frac{1}{alpha} log cosh (alpha u) and G(u)=-exp(frac{-u^2}{2})

Algorithm*

First, the data is centered by subtracting the mean of each column of the data matrix X.

The data matrix is then `whitened' by projecting the data onto it's principle component directions i.e. X -> XK where K is a pre-whitening matrix. The number of components can be specified by the user.

The ICA algorithm then estimates a matrix W s.t XKW = S . W is chosen to maximize the neg-entropy approximation under the constraints that W is an orthonormal matrix. This constraint ensures that the estimated components are uncorrelated. The algorithm is based on a fixed-point iteration scheme for maximizing the neg-entropy.

Projection Pursuit*

In the absence of a generative model for the data the algorithm can be used to find the projection pursuit directions. Projection pursuit is a technique for finding `interesting' directions in multi-dimensional datasets. These projections and are useful for visualizing the dataset and in density estimation and regression. Interesting directions are those which show the least Gaussian distribution, which is what the FastICA algorithm does.

Author(s):

J L Marchini and C Heaton

References:

A. Hyvarinen and E. Oja (2000) Independent Component Analysis: Algorithms and Applications, _Neural Networks_, *13(4-5)*:411-430

Note: This code is currently broken; please do not use it until it's fixed. 11/24/2015> 0

Author:: Joseph Ramsey (of the translation, that is)

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

FastIca.IcaResult

A list containing the following components
Field Summary

Fields

Modifier and Type

Field

Description

static int

DEFLATION

The algorithm type where the components are extracted one at a time.

static int

EXP

The other function type that can be used to approximate negative entropy.

static int

LOGCOSH

One of the function types that can be used to approximate negative entropy.

static int

PARALLEL

The algorithm type where all components are extracted simultaneously.
Constructor Summary

Constructors

Constructor

Description

FastIca(Matrix X, int numComponents)

Constructs an instance of the Fast ICA algorithm, taking as arguments the two arguments that cannot be defaulted: the data matrix itself and the number of components to be extracted.
Method Summary

Modifier and Type

Method

Description

FastIca.IcaResult

findComponents()

Runs the Fast ICA algorithm (following the R version) and returns the list of result items that the R version returns.

double

getAlpha()

Constant in range [1, 2] used in approximation to neg-entropy when 'fun == "logcosh"'

int

getFunction()

The function type to be used, either LOGCOSH or EXP.

int

getMaxIterations()

Maximum number of iterations to perform.

double

getTolerance()

A positive scalar giving the tolerance at which the un-mixing matrix is considered to have converged.

Matrix

getWInit()

Initial un-mixing matrix of dimension (n.comp,n.comp).

boolean

isVerbose()

A logical value indicating the level of output as the algorithm runs.

void

setAlgorithmType(int algorithmType)

If algorithmType == PARALLEL the components are extracted simultaneously (the default).

void

setAlpha(double alpha)

Constant in range [1, 2] used in approximation to neg-entropy when 'fun == "logcosh"'

void

setFunction(int function)

The function type to be used, either LOGCOSH or EXP.

void

setMaxIterations(int maxIterations)

Maximum number of iterations to perform.

void

setRowNorm(boolean colNorm)

A logical value indicating whether rows of the data matrix 'X' should be standardized beforehand.

void

setTolerance(double tolerance)

A positive scalar giving the tolerance at which the un-mixing matrix is considered to have converged.

void

setVerbose(boolean verbose)

A logical value indicating the level of output as the algorithm runs.

void

setWInit(Matrix wInit)

Initial un-mixing matrix of dimension (n.comp,n.comp).

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- PARALLEL
  
  public static int PARALLEL
  
  The algorithm type where all components are extracted simultaneously.
- DEFLATION
  
  public static int DEFLATION
  
  The algorithm type where the components are extracted one at a time.
- LOGCOSH
  
  public static int LOGCOSH
  
  One of the function types that can be used to approximate negative entropy.
- EXP
  
  public static int EXP
  
  The other function type that can be used to approximate negative entropy.
Constructor Details
- FastIca
  
  public FastIca(Matrix X, int numComponents)
  
  Constructs an instance of the Fast ICA algorithm, taking as arguments the two arguments that cannot be defaulted: the data matrix itself and the number of components to be extracted.
  
  Parameters:
  
  X - A 2D matrix, rows being cases, columns being variables. It is assumed that there are no missing values.
Method Details
- setAlgorithmType
  
  public void setAlgorithmType(int algorithmType)
  
  If algorithmType == PARALLEL the components are extracted simultaneously (the default). if algorithmType == DEFLATION the components are extracted one at a time.
- getFunction
  
  public int getFunction()
  
  The function type to be used, either LOGCOSH or EXP.
- setFunction
  
  public void setFunction(int function)
  
  The function type to be used, either LOGCOSH or EXP.
- getAlpha
  
  public double getAlpha()
  
  Constant in range [1, 2] used in approximation to neg-entropy when 'fun == "logcosh"'
- setAlpha
  
  public void setAlpha(double alpha)
  
  Constant in range [1, 2] used in approximation to neg-entropy when 'fun == "logcosh"'
- setRowNorm
  
  public void setRowNorm(boolean colNorm)
  
  A logical value indicating whether rows of the data matrix 'X' should be standardized beforehand.
- getMaxIterations
  
  public int getMaxIterations()
  
  Maximum number of iterations to perform.
- setMaxIterations
  
  public void setMaxIterations(int maxIterations)
  
  Maximum number of iterations to perform.
- getTolerance
  
  public double getTolerance()
  
  A positive scalar giving the tolerance at which the un-mixing matrix is considered to have converged.
- setTolerance
  
  public void setTolerance(double tolerance)
  
  A positive scalar giving the tolerance at which the un-mixing matrix is considered to have converged.
- isVerbose
  
  public boolean isVerbose()
  
  A logical value indicating the level of output as the algorithm runs.
- setVerbose
  
  public void setVerbose(boolean verbose)
  
  A logical value indicating the level of output as the algorithm runs.
- getWInit
  
  public Matrix getWInit()
  
  Initial un-mixing matrix of dimension (n.comp,n.comp). If NULL (default) then a matrix of normal r.v.'s is used.
- setWInit
  
  public void setWInit(Matrix wInit)
  
  Initial un-mixing matrix of dimension (n.comp,n.comp). If NULL (default) then a matrix of normal r.v.'s is used.
- findComponents
  
  public FastIca.IcaResult findComponents()
  
  Runs the Fast ICA algorithm (following the R version) and returns the list of result items that the R version returns.
  
  Returns:
  
  this list, as an FastIca.IcaResult object.

Class FastIca

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

PARALLEL

DEFLATION

LOGCOSH

EXP

Constructor Details

FastIca

Method Details

setAlgorithmType

getFunction

setFunction

getAlpha

setAlpha

setRowNorm

getMaxIterations

setMaxIterations

getTolerance

setTolerance

isVerbose

setVerbose

getWInit

setWInit

findComponents