Class StatUtils

java.lang.Object
edu.cmu.tetrad.util.StatUtils

public final class StatUtils extends Object
Contains a number of basic statistical functions. Most methods are overloaded for either long or double arrays. NOTE: Some methods in this class have been adapted from class DStat written by Michael Fanelli, and the routines have been included here by permission. The methods which were adapted are:
  • gamma
  • internalGamma
  • beta
  • igamma
  • erf
  • poisson
  • chidist
  • contTable1
These methods are protected under copyright by the author. Here is the text of his copyright notice for DSTAT.java: "Copyright 1997 by Michael Fanelli. All Rights Reserved. Unlimited use of this beta code granted for non-commercial use only subject to the the expiration date. Commercial (for profit) use requires written permission."
Version:
$Id: $Id
Author:
josephramsey
  • Method Summary

    Modifier and Type
    Method
    Description
    static double
    averageDeviation(double[] array)
    averageDeviation.
    static double
    averageDeviation(double[] array, int N)
    averageDeviation.
    static double
    averageDeviation(long[] array)
    averageDeviation.
    static double
    averageDeviation(long[] array, int N)
    averageDeviation.
    static double
    basisFunctionValue(int type, int index, double x)
    Performs a calculation that involves repeatedly multiplying an initial value of `1.0` by the product of `0.95` and a given parameter `x`, iterating `index` times.
    static double
    beta(double x1, double x2)
    Calculates the value of beta for doubles
    static double
    calculateCentralMoment(double[] data, int i)
    Calculates the central moment of a dataset for a given order.
    static double
    calculateCumulant(double[] data, int i)
    Calculates the specified cumulant of a dataset based on the given order.
    static double
    calculateMoment(double[] data, int i)
    Calculates the i-th moment of the given data array.
    static double
    chebyshev(int index, double x)
    Computes the value of the Chebyshev polynomial of a given degree at a specified point x.
    static double
    chidist(double x, int degreesOfFreedom)
    Calculates the one-tail probability of the Chi-squared distribution for doubles
    static org.ejml.simple.SimpleMatrix
    chol(org.ejml.simple.SimpleMatrix A)
    Computes the Cholesky decomposition of the given matrix and returns its lower triangular matrix.
    static org.ejml.simple.SimpleMatrix
    chooseMatrix(org.ejml.simple.SimpleMatrix m, double lambda)
    Regularizes the diagonal of the given matrix by adding a scaled identity matrix.
    static short
    compressedCorrelation(Vector data1, Vector data2)
    compressedCorrelation.
    static double
    correlation(double[] array1, double[] array2)
    correlation.
    static double
    correlation(double[] array1, double[] array2, int N)
    correlation.
    static double
    correlation(long[] array1, long[] array2)
    correlation.
    static double
    correlation(long[] array1, long[] array2, int N)
    correlation.
    static double
    correlation(Vector data1, Vector data2)
    correlation.
    static double[]
    cov(double[] x, double[] y, double[] condition, double threshold, double direction)
    cov.
    static double
    covariance(double[] array1, double[] array2)
    covariance.
    static double
    covariance(double[] array1, double[] array2, int N)
    covariance.
    static double
    covariance(long[] array1, long[] array2)
    covariance.
    static double
    covariance(long[] array1, long[] array2, int N)
    covariance.
    static double[][]
    covMatrix(double[] x, double[] y, double[][] z, double[] condition, double threshold, double direction)
    covMatrix.
    static int
    dieToss(int n)
    dieToss.
    static double[]
    E(double[] x, double[] y, double[] condition, double threshold, double direction)
    E.
    static double
    entropy(int numBins, double[] _f)
    Computes the entropy of a distribution based on the provided data values and number of bins.
    static double
    erf(double x)
    Calculates the error function for a double
    static org.ejml.simple.SimpleMatrix
    extractSubMatrix(org.ejml.simple.SimpleMatrix matrix, int[] rows, int[] cols)
    Extracts a submatrix from the specified matrix by selecting the rows and columns indicated by the provided indices.
    static org.ejml.simple.SimpleMatrix
    extractSubMatrix(org.ejml.simple.SimpleMatrix matrix, int rowStart, int rowEnd, int colStart, int colEnd)
    Extracts a submatrix from the given matrix within the specified row and column bounds.
    static long
    factorial(int c)
    factorial.
    static int
    fdr(double alpha, List<Double> pValues)
    fdr.
    static int
    fdr(double alpha, List<Double> pValues, boolean negativelyCorrelated, boolean pSorted)
    fdr.
    static double
    fdrCutoff(double alpha, List<Double> pValues, boolean negativelyCorrelated)
    fdrCutoff.
    static double
    fdrCutoff(double alpha, List<Double> pValues, boolean negativelyCorrelated, boolean pSorted)
    Calculates the cutoff value for p-values using the FDR method.
    static double
    fdrCutoff(double alpha, List<Double> pValues, int[] _k, boolean negativelyCorrelated, boolean pSorted)
    fdrCutoff.
    static double
    fdrQ(List<Double> pValues, int k)
    fdrQ.
    static double
    gamma(double z)
    GAMMA FUNCTION (From DStat, used by permission).
    static double
    getChiSquareP(double dof, double chisq)
    Calculates the p-value for the given chi-square statistic and degrees of freedom.
    static double[]
    getRanks(double[] arr)
    getRanks.
    static List<Integer>
    getRows(double[] x, double[] condition, double threshold, double direction)
    getRows.
    static List<Integer>
    getRows(double[] x, double threshold, double direction)
    getRows.
    static double
    getZForAlpha(double alpha)
    getZForAlpha.
    static double
    hermite1(int index, double x)
    Computes the probabilist's Hermite polynomial of the given index at the specified point.
    static double
    hermite2(int index, double x)
    Computes the (physicist's) Hermite polynomial of a given index and value.
    static double
    igamma(double a, double x)
    Calculates the incomplete gamma function for two doubles
    static double
    kendallsTau(double[] x, double[] y)
    kendallsTau.
    static double
    kurtosis(double[] array)
    kurtosis.
    static double
    kurtosis(double[] array, int N)
    kurtosis.
    static double
    kurtosis(long[] array)
    kurtosis.
    static double
    kurtosis(long[] array, int N)
    kurtosis.
    static double
    legendre(int index, double x)
    Computes the value of the Legendre polynomial of a given degree at a specified point x.
    static double
    Computes the logarithm of the hyperbolic cosine of an exponential value.
    static double
    logCoshScore(double[] _f)
    Computes the log-cosh score for a given array of data.
    static double
    logsum.
    static double
    max(double[] array)
    max.
    static double
    max(double[] array, int N)
    max.
    static double
    max(long[] array)
    max.
    static double
    max(long[] array, int N)
    max.
    static double
    maxEntApprox(double[] x)
    Calculates the maximum entropy approximation of the given data array.
    static double
    mean(double[] array)
    mean.
    static double
    mean(double[] array, int N)
    mean.
    static double
    mean(long[] array)
    mean.
    static double
    mean(long[] array, int N)
    mean.
    static double
    mean(Vector data, int N)
    mean.
    static double
    meanAbsolute(double[] _f)
    Calculates the squared difference between the mean of the absolute values of a standardized dataset and the theoretical mean of the standard normal distribution.
    static double
    median(double[] elements)
    Computes the median value of a given array of doubles.
    static double
    median(org.ejml.simple.SimpleMatrix matrix)
    Calculates the median of all the elements in the given SimpleMatrix.
    static double
    min(double[] array)
    min.
    static double
    min(double[] array, int N)
    min.
    static double
    min(long[] array)
    min.
    static double
    min(long[] array, int N)
    min.
    static double
    mu(double[] array)
    mu.
    static double
    mu(double[] array, int N)
    mu.
    static double
    mu(long[] array)
    mu.
    static double
    mu(long[] array, int N)
    mu.
    static double
    muHat(double[] array)
    muHat.
    static double
    muHat(double[] array, int N)
    muHat.
    static double
    muHat(long[] array)
    muHat.
    static double
    muHat(long[] array, int N)
    muHat.
    static int
    N(double[] array)
    N.
    static int
    N(long[] array)
    N.
    static double
    partialCorrelation(Matrix submatrix, double lambda)
    Assumes that the given covariance matrix was extracted in such a way that the order of the variables (in either direction) is X, Y, Z1, ..., Zn, where the partial correlation one wants is correlation(X, Y | Z1,...,Zn).
    static double
    partialCorrelation(Matrix covariance, double lambda, int x, int y, int... z)
    partialCorrelation.
    static double
    partialCorrelationPrecisionMatrix(Matrix submatrix, double lambda)
    partialCorrelationPrecisionMatrix.
    static double
    Assumes that the given covariance matrix was extracted in such a way that the order of the variables (in either direction) is X, Y, Z1, ..., Zn, where the partial covariance one wants is covariance(X, Y | Z1,...,Zn).
    static double
    partialCovarianceWhittaker(Matrix covariance, int x, int y, int... z)
    partialCovarianceWhittaker.
    static double
    partialStandardDeviation(Matrix covariance, int x, int... z)
    partialStandardDeviation.
    static double
    partialVariance(Matrix covariance, int x, int... z)
    partialVariance.
    static double
    poisson(double k, double x, boolean cum)
    Calculates the Poisson Distribution for mean x and k events for doubles.
    static double
    pow()
    Calculates the average of 1000 random absolute values generated from a normal distribution with a mean of 0 and standard deviation of 1.
    static double
    quartile(double[] array, int quartileNumber)
    quartile.
    static double
    quartile(double[] array, int N, int quartileNumber)
    quartile.
    static double
    quartile(long[] array, int quartileNumber)
    quartile.
    static double
    quartile(long[] array, int N, int quartileNumber)
    quartile.
    static double
    range(double[] array)
    range.
    static double
    range(double[] array, int N)
    range.
    static double
    range(long[] array)
    range.
    static double
    range(long[] array, int N)
    range.
    static double
    rankCorrelation(double[] arr1, double[] arr2)
    rankCorrelation.
    static double[]
    removeNaN(double[] x1)
    removeNaN.
    static double
    sd(double[] array)
    sd.
    static double
    sd(double[] array, int N)
    sd.
    static double
    sd(long[] array)
    sd.
    static double
    sd(long[] array, int N)
    sd.
    static double
    skewness(double[] array)
    skewness.
    static double
    skewness(double[] array, int N)
    skewness.
    static double
    skewness(long[] array)
    skewness.
    static double
    skewness(long[] array, int N)
    skewness.
    static double
    sSquare(double[] array)
    sSquare.
    static double
    sSquare(double[] array, int N)
    sSquare.
    static double
    sSquare(long[] array)
    sSquare.
    static double
    sSquare(long[] array, int N)
    sSquare.
    static double
    ssx(double[] array)
    ssx.
    static double
    ssx(double[] array, int N)
    ssx.
    static double
    ssx(long[] array)
    ssx.
    static double
    ssx(long[] array, int N)
    ssx.
    static double[]
    standardizeData(double[] data)
    Standardizes the provided data array by subtracting the mean and scaling by the standard deviation.
    static Vector
    Standardizes the provided vector data by removing the mean and scaling to unit variance.
    static double
    standardizedFifthMoment(double[] array)
    standardizedFifthMoment.
    static double
    standardizedFifthMoment(double[] array, int N)
    standardizedFifthMoment.
    static double
    standardizedSixthMoment(double[] array)
    standardizedSixthMoment.
    static double
    standardizedSixthMoment(double[] array, int N)
    standardizedSixthMoment.
    static double
    sum(double[] x)
    sum.
    static double
    sxy(double[] array1, double[] array2)
    sxy.
    static double
    sxy(double[] array1, double[] array2, int N)
    sxy.
    static double
    sxy(long[] array1, long[] array2)
    sxy.
    static double
    sxy(long[] array1, long[] array2, int N)
    sxy.
    static double
    sxy(Vector data1, Vector data2, int N)
    sxy.
    static double
    varHat(double[] array)
    varHat.
    static double
    varHat(double[] array, int N)
    varHat.
    static double
    varHat(long[] array)
    varHat.
    static double
    varHat(long[] array, int N)
    varHat.
    static double
    variance(double[] array)
    variance.
    static double
    variance(double[] array, int N)
    variance.
    static double
    variance(long[] array)
    variance.
    static double
    variance(long[] array, int N)
    variance.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • mean

      public static double mean(long[] array)

      mean.

      Parameters:
      array - a long array.
      Returns:
      the mean of the values in this array.
    • mean

      public static double mean(double[] array)

      mean.

      Parameters:
      array - a double array.
      Returns:
      the mean of the values in this array.
    • mean

      public static double mean(long[] array, int N)

      mean.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the mean of the first N values in this array.
    • mean

      public static double mean(double[] array, int N)

      mean.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the mean of the first N values in this array.
    • mean

      public static double mean(Vector data, int N)

      mean.

      Parameters:
      data - a column vector.
      N - the number of values of data which should be considered.
      Returns:
      the mean of the first N values in data.
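      The overloads that take N operate on the first N entries rather than on the full array. The following is a minimal sketch of that convention, assuming a plain arithmetic mean. The class name PrefixMeanSketch and its bounds check are illustrative assumptions and are not part of StatUtils.

```java
final class PrefixMeanSketch {

    // Hypothetical helper (not the Tetrad implementation):
    // arithmetic mean of array[0] .. array[n - 1].
    static double mean(double[] array, int n) {
        if (n <= 0 || n > array.length) {
            throw new IllegalArgumentException("n must be in 1..array.length");
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += array[i];
        }
        return sum / n;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0, 10.0};
        // Only the first three entries contribute; the trailing 10.0 is ignored.
        System.out.println(mean(values, 3));
    }
}
```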
    • median

      public static double median(org.ejml.simple.SimpleMatrix matrix)
      Calculates the median of all the elements in the given SimpleMatrix.
      Parameters:
      matrix - the input SimpleMatrix containing the elements for which the median is to be calculated
      Returns:
      the median value of the elements in the input matrix
    • median

      public static double median(double[] elements)
      Computes the median value of a given array of doubles. The method sorts the array and then calculates the median based on whether the number of elements in the array is odd or even.
      Parameters:
      elements - an array of double values for which the median is to be calculated. The input array will be sorted in-place.
      Returns:
      the median value of the array. If the array is empty, the behavior is undefined.
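      The sort-then-pick procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: unlike the documented method, the hypothetical MedianSketch.median sorts a copy so the caller's array is left untouched, and it rejects an empty array instead of leaving the result undefined.

```java
import java.util.Arrays;

final class MedianSketch {

    // Hypothetical helper (not the Tetrad implementation).
    static double median(double[] elements) {
        if (elements.length == 0) {
            throw new IllegalArgumentException("median of an empty array is undefined");
        }
        // Sort a copy so the caller's array keeps its original order.
        double[] sorted = elements.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            // Odd length: the single middle element.
            return sorted[mid];
        }
        // Even length: average of the two middle elements.
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new double[]{3.0, 1.0, 2.0}));
        System.out.println(median(new double[]{4.0, 1.0, 3.0, 2.0}));
    }
}
```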
    • quartile

      public static double quartile(long[] array, int quartileNumber)

      quartile.

      Parameters:
      array - a long array.
      quartileNumber - 1, 2, or 3.
      Returns:
      the requested quartile of the values in this array.
    • quartile

      public static double quartile(double[] array, int quartileNumber)

      quartile.

      Parameters:
      array - a double array.
      quartileNumber - 1, 2, or 3.
      Returns:
      the requested quartile of the values in this array.
    • quartile

      public static double quartile(long[] array, int N, int quartileNumber)

      quartile.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      quartileNumber - 1, 2, or 3.
      Returns:
      the requested quartile of the first N values in this array.
    • quartile

      public static double quartile(double[] array, int N, int quartileNumber)

      quartile.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      quartileNumber - 1, 2, or 3.
      Returns:
      the requested quartile of the first N values in this array.
    • min

      public static double min(long[] array)

      min.

      Parameters:
      array - a long array.
      Returns:
      the minimum of the values in this array.
    • min

      public static double min(double[] array)

      min.

      Parameters:
      array - a double array.
      Returns:
      the minimum of the values in this array.
    • min

      public static double min(long[] array, int N)

      min.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the minimum of the first N values in this array.
    • min

      public static double min(double[] array, int N)

      min.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the minimum of the first N values in this array.
    • max

      public static double max(long[] array)

      max.

      Parameters:
      array - a long array.
      Returns:
      the maximum of the values in this array.
    • max

      public static double max(double[] array)

      max.

      Parameters:
      array - a double array.
      Returns:
      the maximum of the values in this array.
    • max

      public static double max(long[] array, int N)

      max.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the maximum of the first N values in this array.
    • max

      public static double max(double[] array, int N)

      max.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the maximum of the first N values in this array.
    • range

      public static double range(long[] array)

      range.

      Parameters:
      array - a long array.
      Returns:
      the range of the values in this array.
    • range

      public static double range(double[] array)

      range.

      Parameters:
      array - a double array.
      Returns:
      the range of the values in this array.
    • range

      public static double range(long[] array, int N)

      range.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the range of the first N values in this array.
    • range

      public static double range(double[] array, int N)

      range.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the range of the first N values in this array.
    • N

      public static int N(long[] array)

      N.

      Parameters:
      array - a long array.
      Returns:
      the length of this array.
    • N

      public static int N(double[] array)

      N.

      Parameters:
      array - a double array.
      Returns:
      the length of this array.
    • ssx

      public static double ssx(long[] array)

      ssx.

      Parameters:
      array - a long array.
      Returns:
      the sum of the squared differences from the mean in array.
    • ssx

      public static double ssx(double[] array)

      ssx.

      Parameters:
      array - a double array.
      Returns:
      the sum of the squared differences from the mean in array.
    • ssx

      public static double ssx(long[] array, int N)

      ssx.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the sum of the squared differences from the mean of the first N values in array.
    • ssx

      public static double ssx(double[] array, int N)

      ssx.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the sum of the squared differences from the mean of the first N values in array.
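      A minimal sketch of the quantity described above, the sum of squared deviations from the mean over the first N values. The class name SsxSketch and the two-pass approach are assumptions made for illustration; the library may compute the same value differently.

```java
final class SsxSketch {

    // Hypothetical helper (not the Tetrad implementation):
    // sum over i < n of (array[i] - mean)^2, where mean is taken over the same n values.
    static double ssx(double[] array, int n) {
        // First pass: mean of the first n values.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += array[i];
        }
        double mean = sum / n;

        // Second pass: accumulate squared deviations from that mean.
        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double d = array[i] - mean;
            ss += d * d;
        }
        return ss;
    }

    public static void main(String[] args) {
        System.out.println(ssx(new double[]{1.0, 2.0, 3.0, 4.0}, 4));
    }
}
```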
    • sxy

      public static double sxy(long[] array1, long[] array2)

      sxy.

      Parameters:
      array1 - a long array.
      array2 - a long array, same length as array1.
      Returns:
      the sum of the products of the deviations of array1 and array2 from their respective sample means.
    • sxy

      public static double sxy(double[] array1, double[] array2)

      sxy.

      Parameters:
      array1 - a double array.
      array2 - a double array, same length as array1.
      Returns:
      the sum of the products of the deviations of array1 and array2 from their respective sample means.
    • sxy

      public static double sxy(long[] array1, long[] array2, int N)

      sxy.

      Parameters:
      array1 - a long array.
      array2 - a long array.
      N - the number of values of array1 and array2 which should be considered.
      Returns:
      the sum, over the first N values, of the products of the deviations of array1 and array2 from their respective sample means.
    • sxy

      public static double sxy(double[] array1, double[] array2, int N)

      sxy.

      Parameters:
      array1 - a double array.
      array2 - a double array.
      N - the number of values of array1 and array2 which should be considered.
      Returns:
      the sum, over the first N values, of the products of the deviations of array1 and array2 from their respective sample means.
    • sxy

      public static double sxy(Vector data1, Vector data2, int N)

      sxy.

      Parameters:
      data1 - a column vector of doubles.
      data2 - a column vector of doubles.
      N - the number of values of data1 and data2 which should be considered.
      Returns:
      the sum, over the first N values, of the products of the deviations of data1 and data2 from their respective sample means.
    • variance

      public static double variance(long[] array)

      variance.

      Parameters:
      array - a long array.
      Returns:
      the variance of the values in array.
    • variance

      public static double variance(double[] array)

      variance.

      Parameters:
      array - a double array.
      Returns:
      the variance of the values in array.
    • variance

      public static double variance(long[] array, int N)

      variance.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the variance of the first N values in array.
    • variance

      public static double variance(double[] array, int N)

      variance.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the variance of the first N values in array.
    • sd

      public static double sd(long[] array)

      sd.

      Parameters:
      array - a long array.
      Returns:
      the standard deviation of the values in array.
    • sd

      public static double sd(double[] array)

      sd.

      Parameters:
      array - a double array.
      Returns:
      the standard deviation of the values in array.
    • sd

      public static double sd(long[] array, int N)

      sd.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the standard deviation of the first N values in array.
    • sd

      public static double sd(double[] array, int N)

      sd.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the standard deviation of the first N values in array.
    • covariance

      public static double covariance(long[] array1, long[] array2)

      covariance.

      Parameters:
      array1 - a long array.
      array2 - a second long array (same length as array1).
      Returns:
      the covariance of the values in array1 and array2.
    • covariance

      public static double covariance(double[] array1, double[] array2)

      covariance.

      Parameters:
      array1 - a double array.
      array2 - a second double array (same length as array1).
      Returns:
      the covariance of the values in array1 and array2.
    • covariance

      public static double covariance(long[] array1, long[] array2, int N)

      covariance.

      Parameters:
      array1 - a long array.
      array2 - a second long array.
      N - the number of values to be considered in array1 and array2.
      Returns:
      the covariance of the first N values in array1 and array2.
    • covariance

      public static double covariance(double[] array1, double[] array2, int N)

      covariance.

      Parameters:
      array1 - a double array.
      array2 - a second double array (same length as array1).
      N - the number of values to be considered in array1 and array2.
      Returns:
      the covariance of the first N values in array1 and array2.
    • correlation

      public static double correlation(long[] array1, long[] array2)

      correlation.

      Parameters:
      array1 - a long array.
      array2 - a second long array (same length as array1).
      Returns:
      the Pearson correlation of the values in array1 and array2.
    • correlation

      public static double correlation(double[] array1, double[] array2)

      correlation.

      Parameters:
      array1 - a double array.
      array2 - a second double array (same length as array1).
      Returns:
      the Pearson correlation of the values in array1 and array2.
    • correlation

      public static double correlation(Vector data1, Vector data2)

      correlation.

      Parameters:
      data1 - a Vector object
      data2 - a Vector object
      Returns:
      a double
    • compressedCorrelation

      public static short compressedCorrelation(Vector data1, Vector data2)

      compressedCorrelation.

      Parameters:
      data1 - a Vector object
      data2 - a Vector object
      Returns:
      a short
    • correlation

      public static double correlation(long[] array1, long[] array2, int N)

      correlation.

      Parameters:
      array1 - a long array.
      array2 - a second long array.
      N - the number of values to be considered in array1 and array2.
      Returns:
      the Pearson correlation of the first N values in array1 and array2.
    • correlation

      public static double correlation(double[] array1, double[] array2, int N)

      correlation.

      Parameters:
      array1 - a double array.
      array2 - a second double array.
      N - the number of values to be considered in array1 and array2.
      Returns:
      the Pearson correlation of the first N values in array1 and array2.
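      For reference, the Pearson correlation of the first N paired values can be sketched with the textbook formula r = Sxy / sqrt(Sxx * Syy). PearsonSketch is a hypothetical class written for this illustration. It is not the StatUtils source, and it returns NaN when either input is constant over the first N values.

```java
final class PearsonSketch {

    // Hypothetical helper (not the Tetrad implementation):
    // Pearson correlation of (x[i], y[i]) for i < n.
    static double correlation(double[] x, double[] y, int n) {
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of products of deviations
        double sxx = 0.0; // sum of squared deviations of x
        double syy = 0.0; // sum of squared deviations of y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // Division by zero yields NaN when either series has no variation.
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {2.0, 4.0, 6.0};
        System.out.println(correlation(x, y, 3));
    }
}
```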
    • rankCorrelation

      public static double rankCorrelation(double[] arr1, double[] arr2)

      rankCorrelation.

      Parameters:
      arr1 - a double array.
      arr2 - a double array.
      Returns:
      a double
    • kendallsTau

      public static double kendallsTau(double[] x, double[] y)

      kendallsTau.

      Parameters:
      x - a double array.
      y - a double array.
      Returns:
      a double
    • getRanks

      public static double[] getRanks(double[] arr)

      getRanks.

      Parameters:
      arr - a double array.
      Returns:
      a double array.
    • sSquare

      public static double sSquare(long[] array)

      sSquare.

      Parameters:
      array - a long array.
      Returns:
      the unbiased estimate of the variance of the distribution of the values in array assuming the mean is unknown.
    • sSquare

      public static double sSquare(double[] array)

      sSquare.

      Parameters:
      array - a double array.
      Returns:
      the unbiased estimate of the variance of the distribution of the values in array assuming the mean is unknown.
    • sSquare

      public static double sSquare(long[] array, int N)

      sSquare.

      Parameters:
      array - a long array.
      N - the number of values to be considered in array.
      Returns:
      the unbiased estimate of the variance of the distribution of the first N values in array assuming the mean is unknown.
    • sSquare

      public static double sSquare(double[] array, int N)

      sSquare.

      Parameters:
      array - a double array.
      N - the number of values to be considered in array.
      Returns:
      the unbiased estimate of the variance of the distribution of the first N values in array assuming the mean is unknown.
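      When the mean is unknown and must be estimated from the same data, the standard unbiased variance estimator divides the sum of squared deviations by N - 1 rather than N. The sketch below illustrates that textbook estimator; SampleVarianceSketch is a hypothetical name, and the check for N >= 2 is an assumption of this example rather than documented behavior of sSquare.

```java
final class SampleVarianceSketch {

    // Hypothetical helper (not the Tetrad implementation):
    // sum of squared deviations from the sample mean, divided by (n - 1).
    static double sSquare(double[] array, int n) {
        if (n < 2 || n > array.length) {
            throw new IllegalArgumentException("n must be in 2..array.length");
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += array[i];
        }
        double mean = sum / n;

        double ss = 0.0;
        for (int i = 0; i < n; i++) {
            double d = array[i] - mean;
            ss += d * d;
        }
        // Dividing by n - 1 compensates for estimating the mean from the data.
        return ss / (n - 1);
    }

    public static void main(String[] args) {
        double[] values = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        System.out.println(sSquare(values, values.length));
    }
}
```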
    • varHat

      public static double varHat(long[] array)

      varHat.

      Parameters:
      array - a long array.
      Returns:
      the unbiased estimate of the variance of the distribution of the values in array assuming the mean is known.
    • varHat

      public static double varHat(double[] array)

      varHat.

      Parameters:
      array - a double array.
      Returns:
      the unbiased estimate of the variance of the distribution of the values in array assuming the mean is known.
    • varHat

      public static double varHat(long[] array, int N)

      varHat.

      Parameters:
      array - a long array.
      N - the number of values to be considered in array.
      Returns:
      the unbiased estimate of the variance of the distribution of the first N values in array assuming the mean is known.
    • varHat

      public static double varHat(double[] array, int N)

      varHat.

      Parameters:
      array - a double array.
      N - the number of values to be considered in array.
      Returns:
      the unbiased estimate of the variance of the distribution of the first N values in array assuming the mean is known.
    • mu

      public static double mu(long[] array)

      mu.

      Parameters:
      array - a long array.
      Returns:
      the unbiased estimate of the mean of the distribution of the values in array.
    • mu

      public static double mu(double[] array)

      mu.

      Parameters:
      array - a double array.
      Returns:
      the unbiased estimate of the mean of the distribution of the values in array.
    • mu

      public static double mu(long[] array, int N)

      mu.

      Parameters:
      array - a long array.
      N - the number of values to be considered in array.
      Returns:
      the unbiased estimate of the mean of the distribution of the first N values in array.
    • mu

      public static double mu(double[] array, int N)

      mu.

      Parameters:
      array - a double array.
      N - the number of values to be considered in array.
      Returns:
      the unbiased estimate of the mean of the distribution of the first N values in array.
    • muHat

      public static double muHat(long[] array)

      muHat.

      Parameters:
      array - a long array.
      Returns:
      the maximum likelihood estimate of the mean of the distribution of the values in array.
    • muHat

      public static double muHat(double[] array)

      muHat.

      Parameters:
      array - a double array.
      Returns:
      the maximum likelihood estimate of the mean of the distribution of the values in array.
    • muHat

      public static double muHat(long[] array, int N)

      muHat.

      Parameters:
      array - a long array.
      N - the number of values to be considered in array.
      Returns:
      the maximum likelihood estimate of the mean of the distribution of the first N values in array.
    • muHat

      public static double muHat(double[] array, int N)

      muHat.

      Parameters:
      array - a double array.
      N - the number of values to be considered in array.
      Returns:
      the maximum likelihood estimate of the mean of the distribution of the first N values in array.
    • averageDeviation

      public static double averageDeviation(long[] array)

      averageDeviation.

      Parameters:
      array - a long array.
      Returns:
      the average deviation of the values in array.
    • averageDeviation

      public static double averageDeviation(double[] array)

      averageDeviation.

      Parameters:
      array - a double array.
      Returns:
      the average deviation of the values in array.
    • averageDeviation

      public static double averageDeviation(long[] array, int N)

      averageDeviation.

      Parameters:
      array - a long array.
      N - the number of values to be considered in array.
      Returns:
      the average deviation of the first N values in array.
    • averageDeviation

      public static double averageDeviation(double[] array, int N)

      averageDeviation.

      Parameters:
      array - a double array.
      N - the number of values to be considered in array.
      Returns:
      the average deviation of the first N values in array.
    • calculateMoment

      public static double calculateMoment(double[] data, int i)
      Calculates the i-th moment of the given data array. The i-th moment is computed as the mean of each data value raised to the power of i.
      Parameters:
      data - the array of data values for which the moment is to be calculated
      i - the power to which each data value is raised
      Returns:
      the calculated i-th moment of the data
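
      As a rough illustration of the definition above, the following standalone sketch computes the i-th raw moment as the mean of the values raised to the power i. The names RawMomentSketch and rawMoment are hypothetical and not part of StatUtils, and the empty-array check is an assumption; the actual implementation may behave differently.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class RawMomentSketch {
    // Returns the mean of value^order over all elements of data.
    static double rawMoment(double[] data, int order) {
        if (data.length == 0) {
            // Assumption: reject empty input rather than return NaN.
            throw new IllegalArgumentException("data must not be empty");
        }
        double sum = 0.0;
        for (double value : data) {
            sum += Math.pow(value, order);
        }
        return sum / data.length;
    }
}
```

      For the data {1, 2, 3}, the first raw moment is the mean, 2, and the second is (1 + 4 + 9) / 3 = 14/3.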
    • calculateCentralMoment

      public static double calculateCentralMoment(double[] data, int i)
      Calculates the central moment of a dataset for a given order. The central moment is derived by raising each deviation of the data points from the mean to the specified power and averaging the results.
      Parameters:
      data - the array of data points for which the central moment is to be calculated
      i - the order of the central moment to be calculated
      Returns:
      the calculated central moment of the dataset
    • calculateCumulant

      public static double calculateCumulant(double[] data, int i)
      Calculates the specified cumulant of a dataset based on the given order. Cumulants provide statistical descriptions of datasets and are related to moments.
      Parameters:
      data - the array of data points, representing the dataset for which the cumulant is to be calculated
      i - the order of the cumulant to calculate (e.g., 1 for mean, 2 for variance, 3 for skewness-related cumulant, etc.)
      Returns:
      the calculated cumulant of the specified order
      Throws:
      IllegalArgumentException - if the order of the cumulant (i) is greater than 5, as calculations for higher-order cumulants are not implemented
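
      The relationship between cumulants and moments mentioned above can be sketched with the standard identities that express the first five cumulants through the mean and the central moments. This is an illustrative sketch only: the class CumulantSketch and its methods are hypothetical, the divide-by-n (population) convention is an assumption, and so is the rejection of orders below 1.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class CumulantSketch {
    static double mean(double[] data) {
        double sum = 0.0;
        for (double v : data) sum += v;
        return sum / data.length;
    }

    // Mean of (x - mean)^order, dividing by n (an assumed convention).
    static double centralMoment(double[] data, int order) {
        double mu = mean(data);
        double sum = 0.0;
        for (double v : data) sum += Math.pow(v - mu, order);
        return sum / data.length;
    }

    // Cumulants of order 1..5 written in terms of central moments.
    static double cumulant(double[] data, int order) {
        switch (order) {
            case 1: return mean(data);
            case 2: return centralMoment(data, 2);
            case 3: return centralMoment(data, 3);
            case 4: {
                double m2 = centralMoment(data, 2);
                return centralMoment(data, 4) - 3.0 * m2 * m2;
            }
            case 5:
                return centralMoment(data, 5)
                        - 10.0 * centralMoment(data, 3) * centralMoment(data, 2);
            default:
                throw new IllegalArgumentException("order must be between 1 and 5");
        }
    }
}
```

      For the data {1, 2, 3, 4}, the second central moment is 1.25 and the fourth is 2.5625, so the fourth cumulant in this sketch is 2.5625 - 3 * 1.25^2 = -2.125.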
    • skewness

      public static double skewness(long[] array)

      skewness.

      Parameters:
      array - a long array.
      Returns:
      the skew of the values in array.
    • skewness

      public static double skewness(double[] array)

      skewness.

      Parameters:
      array - a double array.
      Returns:
      the skew of the values in array.
    • skewness

      public static double skewness(long[] array, int N)

      skewness.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the skew of the first N values in array.
    • skewness

      public static double skewness(double[] array, int N)

      skewness.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the skew of the first N values in array.
    • removeNaN

      public static double[] removeNaN(double[] x1)

      removeNaN.

      Parameters:
      x1 - a double array.
      Returns:
      a double array containing the values of x1 with NaN entries removed.
    • kurtosis

      public static double kurtosis(long[] array)

      kurtosis.

      Parameters:
      array - a long array.
      Returns:
      the kurtosis of the values in array.
    • kurtosis

      public static double kurtosis(double[] array)

      kurtosis.

      Parameters:
      array - a double array.
      Returns:
      the kurtosis of the values in array.
    • kurtosis

      public static double kurtosis(long[] array, int N)

      kurtosis.

      Parameters:
      array - a long array.
      N - the number of values of array which should be considered.
      Returns:
      the kurtosis of the first N values in array.
    • standardizedFifthMoment

      public static double standardizedFifthMoment(double[] array)

      standardizedFifthMoment.

      Parameters:
      array - a double array.
      Returns:
      the standardized fifth moment of the values in array.
    • standardizedFifthMoment

      public static double standardizedFifthMoment(double[] array, int N)

      standardizedFifthMoment.

      Parameters:
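      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the standardized fifth moment of the first N values in array.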
    • standardizedSixthMoment

      public static double standardizedSixthMoment(double[] array)

      standardizedSixthMoment.

      Parameters:
      array - a double array.
      Returns:
      the standardized sixth moment of the values in array.
    • standardizedSixthMoment

      public static double standardizedSixthMoment(double[] array, int N)

      standardizedSixthMoment.

      Parameters:
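      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the standardized sixth moment of the first N values in array.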
    • kurtosis

      public static double kurtosis(double[] array, int N)

      kurtosis.

      Parameters:
      array - a double array.
      N - the number of values of array which should be considered.
      Returns:
      the kurtosis of the first N values in array.
    • gamma

      public static double gamma(double z)
      GAMMA FUNCTION (From DStat, used by permission).

      Calculates gamma(z) following the Handbook of Mathematical Functions (AMS 55) by Abramowitz and Stegun, page 256.

      Parameters:
      z - nonnegative double value.
      Returns:
      the gamma value of z.
    • beta

      public static double beta(double x1, double x2)
      Calculates the value of beta for doubles
      Parameters:
      x1 - the first double
      x2 - the second double.
      Returns:
      beta(x1, x2).
    • igamma

      public static double igamma(double a, double x)
      Calculates the incomplete gamma function for two doubles
      Parameters:
      a - first double.
      x - second double.
      Returns:
      incomplete gamma of (a, x).
    • erf

      public static double erf(double x)
      Calculates the error function for a double
      Parameters:
      x - argument.
      Returns:
      error function of this argument.
    • poisson

      public static double poisson(double k, double x, boolean cum)
      Calculates the Poisson distribution for k events with mean x. If cum is true, the cumulative Poisson function is returned instead.
      Parameters:
      k - the number of events.
      x - the mean.
      cum - true if the cumulative Poisson is desired.
      Returns:
      the value of the Poisson (or cumulative Poisson) function for k events with mean x.
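
      To make the two modes concrete, the sketch below evaluates the Poisson point probability exp(-mean) * mean^k / k! in log space and obtains the cumulative value by summing the point probabilities from 0 to k. It is an illustration under assumptions: PoissonSketch and its methods are hypothetical names, k is restricted to non-negative integers (the documented method takes a double), and the documented method may compute these quantities another way.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class PoissonSketch {
    // P(K = k) for a Poisson variable with the given mean.
    static double pmf(int k, double mean) {
        if (k < 0 || mean <= 0.0) {
            throw new IllegalArgumentException("need k >= 0 and mean > 0");
        }
        double logFactorial = 0.0;
        for (int i = 2; i <= k; i++) {
            logFactorial += Math.log(i); // log(k!) accumulated term by term
        }
        return Math.exp(-mean + k * Math.log(mean) - logFactorial);
    }

    // P(K <= k): the sum of the point probabilities for 0..k.
    static double cdf(int k, double mean) {
        double total = 0.0;
        for (int i = 0; i <= k; i++) {
            total += pmf(i, mean);
        }
        return total;
    }

    // Mirrors the boolean switch described above.
    static double poisson(int k, double mean, boolean cumulative) {
        return cumulative ? cdf(k, mean) : pmf(k, mean);
    }
}
```

      With mean 2, the point probability of 0 events is exp(-2), and the cumulative probability of at most 1 event is exp(-2) + 2 exp(-2) = 3 exp(-2).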
    • chidist

      public static double chidist(double x, int degreesOfFreedom)
      Calculates the one-tail probability of the chi-squared distribution.
      Parameters:
      x - the value at which to evaluate the distribution.
      degreesOfFreedom - the degrees of freedom.
      Returns:
      the one-tail chi-squared probability at x with the stated degrees of freedom.
    • dieToss

      public static int dieToss(int n)

      dieToss.

      Parameters:
      n - an int
      Returns:
      an int
    • fdrCutoff

      public static double fdrCutoff(double alpha, List<Double> pValues, boolean negativelyCorrelated, boolean pSorted)
      Calculates the cutoff value for p-values using the FDR method. Hypotheses with p-values less than or equal to this cutoff should be rejected according to the test.
      Parameters:
      alpha - The desired effective significance level.
      pValues - A list containing p-values to be tested in positions 0, 1, ..., n. (The rest of the list is ignored.) Note: This list will not be changed by this class. Its values are copied into a separate array before sorting.
      negativelyCorrelated - Whether the p-values in pValues are negatively correlated (true if yes, false if no). If they are uncorrelated or positively correlated, a level of alpha is used; if they are negatively correlated, a level of alpha / SUM_i=1_n(1 / i) is used.
      pSorted - true if pValues is already sorted, false otherwise.
      Returns:
      the FDR alpha, which is the first p-value, sorted high to low, to fall below a line from (1.0, level) to (0.0, 0.0). Hypotheses with p-values less than or equal to this value should be rejected.
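
      The cutoff rule described above can be sketched as a step-up scan: sort a copy of the p-values, then walk from the largest to the smallest and return the first one that lies on or below the line level * k / n, where k is its 1-based rank. This is a sketch under assumptions, not the library code: FdrCutoffSketch is a hypothetical name, it takes a double[] instead of a List, it treats "fall below" as "less than or equal", and it returns 0.0 when no p-value qualifies.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class FdrCutoffSketch {
    static double cutoff(double alpha, double[] pValues, boolean negativelyCorrelated) {
        double[] sorted = pValues.clone();   // leave the caller's data untouched
        java.util.Arrays.sort(sorted);       // ascending order
        int n = sorted.length;

        double level = alpha;
        if (negativelyCorrelated) {
            double harmonic = 0.0;
            for (int i = 1; i <= n; i++) {
                harmonic += 1.0 / i;
            }
            level = alpha / harmonic;        // alpha / SUM_i=1_n(1 / i)
        }

        // Scan high to low; the line runs from (1.0, level) to (0.0, 0.0).
        for (int k = n; k >= 1; k--) {
            if (sorted[k - 1] <= level * k / n) {
                return sorted[k - 1];
            }
        }
        return 0.0; // Assumption: nothing is rejected when no p-value qualifies.
    }
}
```

      For the p-values {0.01, 0.04, 0.03, 0.5} and alpha = 0.05, the thresholds for ranks 1 to 4 are 0.0125, 0.025, 0.0375 and 0.05. Only the smallest p-value, 0.01, lies under its threshold, so the sketch returns 0.01. With the negative-correlation adjustment the level shrinks to 0.024, no p-value qualifies, and the sketch returns 0.0.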
    • fdrCutoff

      public static double fdrCutoff(double alpha, List<Double> pValues, boolean negativelyCorrelated)

      fdrCutoff.

      Parameters:
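      alpha - the desired effective significance level.
      pValues - a list containing the p-values to be tested.
      negativelyCorrelated - whether the p-values are negatively correlated.
      Returns:
      the FDR cutoff for the given p-values.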
    • fdrCutoff

      public static double fdrCutoff(double alpha, List<Double> pValues, int[] _k, boolean negativelyCorrelated, boolean pSorted)

      fdrCutoff.

      Parameters:
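      alpha - the desired effective significance level.
      pValues - a list containing the p-values to be tested.
      _k - an int array.
      negativelyCorrelated - whether the p-values are negatively correlated.
      pSorted - true if pValues is already sorted, false otherwise.
      Returns:
      the FDR cutoff for the given p-values.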
    • fdr

      public static int fdr(double alpha, List<Double> pValues)

      fdr.

      Parameters:
      alpha - a double
      pValues - a List object
      Returns:
      the index, >=, in the sorted list of p values of which all p values are rejected. If the index is -1, all p values are rejected.
    • fdr

      public static int fdr(double alpha, List<Double> pValues, boolean negativelyCorrelated, boolean pSorted)

      fdr.

      Parameters:
      alpha - a double
      pValues - a List object
      negativelyCorrelated - a boolean
      pSorted - a boolean
      Returns:
      an int
    • fdrQ

      public static double fdrQ(List<Double> pValues, int k)

      fdrQ.

      Parameters:
      pValues - a List object
      k - an int
      Returns:
      a double
    • partialCovarianceWhittaker

      public static double partialCovarianceWhittaker(Matrix submatrix)
      Assumes that the given covariance matrix was extracted in such a way that the order of the variables (in either direction) is X, Y, Z1, ..., Zn, where the partial covariance one wants is covariance(X, Y | Z1,...,Zn). This may be extracted using DataUtils.submatrix().
      Parameters:
      submatrix - the covariance submatrix, with variables ordered X, Y, Z1, ..., Zn.
      Returns:
      the given partial covariance.
    • partialCovarianceWhittaker

      public static double partialCovarianceWhittaker(Matrix covariance, int x, int y, int... z)

      partialCovarianceWhittaker.

      Parameters:
      covariance - the covariance matrix.
      x - the index of the first variable.
      y - the index of the second variable.
      z - the indices of the conditioning variables.
      Returns:
      the partial covariance(x, y | z) where these represent the column/row indices of the desired variables in covariance
    • partialVariance

      public static double partialVariance(Matrix covariance, int x, int... z)

      partialVariance.

      Parameters:
      covariance - the covariance matrix.
      x - the index of the variable.
      z - the indices of the conditioning variables.
      Returns:
      the partial variance of x given z.
    • partialStandardDeviation

      public static double partialStandardDeviation(Matrix covariance, int x, int... z)

      partialStandardDeviation.

      Parameters:
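      covariance - the covariance matrix.
      x - the index of the variable.
      z - the indices of the conditioning variables.
      Returns:
      the partial standard deviation of x given z.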
    • partialCorrelation

      public static double partialCorrelation(Matrix submatrix, double lambda) throws org.apache.commons.math3.linear.SingularMatrixException
      Assumes that the given covariance matrix was extracted in such a way that the order of the variables (in either direction) is X, Y, Z1, ..., Zn, where the partial correlation one wants is correlation(X, Y | Z1,...,Zn). This may be extracted using DataUtils.submatrix().
      Parameters:
      submatrix - the covariance submatrix, with variables ordered X, Y, Z1, ..., Zn.
      lambda - Singularity lambda.
      Returns:
      the given partial correlation.
      Throws:
      org.apache.commons.math3.linear.SingularMatrixException - if any.
    • partialCorrelationPrecisionMatrix

      public static double partialCorrelationPrecisionMatrix(Matrix submatrix, double lambda) throws org.apache.commons.math3.linear.SingularMatrixException

      partialCorrelationPrecisionMatrix.

      Parameters:
      submatrix - a Matrix object
      lambda - Regularization lambda; 0 for no regularization.
      Returns:
      a double
      Throws:
      org.apache.commons.math3.linear.SingularMatrixException - if any.
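
      One standard way to obtain a partial correlation from a precision matrix is the identity rho(X, Y | Z) = -P[0][1] / sqrt(P[0][0] * P[1][1]), where P is the inverse of the covariance submatrix ordered X, Y, Z1, ..., Zn. The sketch below illustrates that identity with plain arrays. Whether the documented method uses exactly this route is an assumption; PartialCorrelationSketch is a hypothetical name, the sketch corresponds to lambda = 0 (no regularization), and it throws ArithmeticException where the documented method declares SingularMatrixException.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class PartialCorrelationSketch {
    // Inverts a square matrix by Gauss-Jordan elimination with partial pivoting.
    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] aug = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, aug[i], 0, n);
            aug[i][n + i] = 1.0; // right half starts as the identity
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) pivot = r;
            }
            if (Math.abs(aug[pivot][col]) < 1e-12) {
                throw new ArithmeticException("matrix is singular to working precision");
            }
            double[] tmp = aug[col];
            aug[col] = aug[pivot];
            aug[pivot] = tmp;
            double d = aug[col][col];
            for (int j = 0; j < 2 * n; j++) aug[col][j] /= d;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = aug[r][col];
                for (int j = 0; j < 2 * n; j++) aug[r][j] -= f * aug[col][j];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(aug[i], n, inv[i], 0, n);
        return inv;
    }

    // cov must be ordered X, Y, Z1, ..., Zn.
    static double partialCorrelation(double[][] cov) {
        double[][] p = invert(cov);
        return -p[0][1] / Math.sqrt(p[0][0] * p[1][1]);
    }
}
```

      For three unit-variance variables with all pairwise covariances equal to 0.5, the precision matrix has 1.5 on the diagonal and -0.5 elsewhere, giving a partial correlation of 0.5 / 1.5 = 1/3. The same value follows from the single-conditioning-variable formula (0.5 - 0.25) / 0.75.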
    • chooseMatrix

      public static org.ejml.simple.SimpleMatrix chooseMatrix(org.ejml.simple.SimpleMatrix m, double lambda)
      Regularizes the diagonal of the given matrix by adding a scaled identity matrix. The regularization is achieved by creating an identity matrix of the same dimensions as the original matrix and scaling it by the given lambda value, then adding it to the original matrix.
      Parameters:
      m - the matrix to be regularized
      lambda - the scaling factor for the identity matrix to be added to the diagonal
      Returns:
      the regularized matrix with the scaled identity matrix added to its diagonal
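
      The regularization described above amounts to computing m + lambda * I. The sketch below shows the same operation on a plain square double[][] so that it runs without EJML; DiagonalRegularizationSketch is a hypothetical name, and returning a fresh copy rather than modifying the input is an assumption.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class DiagonalRegularizationSketch {
    // Returns m + lambda * I for a square matrix m, leaving m unchanged.
    static double[][] regularize(double[][] m, double lambda) {
        int n = m.length;
        double[][] out = new double[n][];
        for (int i = 0; i < n; i++) {
            out[i] = m[i].clone();
            out[i][i] += lambda; // only the diagonal entries change
        }
        return out;
    }
}
```

      A small positive lambda of this kind is a common way to make a nearly singular covariance matrix invertible, at the cost of slightly inflating every variance.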
    • partialCorrelation

      public static double partialCorrelation(Matrix covariance, double lambda, int x, int y, int... z)

      partialCorrelation.

      Parameters:
      covariance - the covariance matrix.
      lambda - Singularity lambda
      x - the index of the first variable.
      y - the index of the second variable.
      z - the indices of the conditioning variables.
      Returns:
      the partial correlation(x, y | z) where these represent the column/row indices of the desired variables in covariance
    • logCoshScore

      public static double logCoshScore(double[] _f)
      Computes the log-cosh score for a given array of data. This method standardizes the input data, applies the log(cosh) transformation to each value in the dataset, computes the mean of the transformed values, and calculates the squared difference between the mean and a predefined log-cosh constant.
      Parameters:
      _f - an array of double values representing the input data to be analyzed
      Returns:
      the computed log-cosh score as a squared difference between the mean of transformed data and a constant
    • meanAbsolute

      public static double meanAbsolute(double[] _f)
      Calculates the squared difference between the mean of the absolute values of a standardized dataset and the theoretical mean of the standard normal distribution.
      Parameters:
      _f - an array of doubles representing the dataset to be processed and analyzed
      Returns:
      the squared difference between the computed mean of the absolute values and the theoretical mean of the standard normal distribution
    • pow

      public static double pow()
      Calculates the average of 1000 random absolute values generated from a normal distribution with a mean of 0 and standard deviation of 1.
      Returns:
      the average of 1000 absolute values derived from a normal distribution.
    • logCoshExp

      public static double logCoshExp()
      Computes the logarithm of the hyperbolic cosine of an exponential value.
      Returns:
      the pre-defined constant value 0.3746764078432371.
    • entropy

      public static double entropy(int numBins, double[] _f)
      Computes the entropy of a distribution based on the provided data values and number of bins. Entropy is a measure of the uncertainty or randomness in a dataset.
      Parameters:
      numBins - the number of bins to discretize the range of data into
      _f - the array of data values to compute the entropy for
      Returns:
      the calculated entropy value
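
      A binned entropy of this kind can be sketched by discretizing the data range into equal-width bins and summing -p * log(p) over the non-empty bins. Several details here are assumptions rather than documented behavior: the bins are equal-width between the minimum and maximum, the natural logarithm is used, and the maximum value is placed in the last bin. HistogramEntropySketch is a hypothetical name.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class HistogramEntropySketch {
    static double entropy(int numBins, double[] data) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;

        int[] counts = new int[numBins];
        for (double v : data) {
            // Constant data has zero range; put everything in bin 0.
            int bin = range == 0.0 ? 0 : (int) ((v - min) / range * numBins);
            if (bin == numBins) bin = numBins - 1; // the maximum goes in the last bin
            counts[bin]++;
        }

        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / data.length;
                h -= p * Math.log(p); // natural log: an assumed convention
            }
        }
        return h;
    }
}
```

      Data split evenly across two bins, such as {0, 0, 1, 1} with numBins = 2, gives log(2) in this sketch, while constant data gives 0.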
    • maxEntApprox

      public static double maxEntApprox(double[] x)
      Calculates the maximum entropy approximation of the given data array. This method estimates the negentropy of the input data after standardizing it, and derives an approximation based on the Gaussian entropy.
      Parameters:
      x - the input array of doubles representing the dataset to be analyzed. The array will be standardized before calculating the approximation.
      Returns:
      a double value representing the maximum entropy approximation of the input dataset.
    • standardizeData

      public static double[] standardizeData(double[] data)
      Standardizes the provided data array by subtracting the mean and scaling by the standard deviation. This method transforms the data to have a mean of zero and a standard deviation of one.
      Parameters:
      data - the input array of data to be standardized
      Returns:
      a new array containing the standardized values
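
      The transformation described above can be sketched as follows. Whether the standard deviation divides by n or by n - 1 is not stated here, so the divide-by-n choice below is an assumption, as is the hypothetical class name StandardizeSketch.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class StandardizeSketch {
    // Returns a new array with mean 0 and (population) standard deviation 1.
    static double[] standardize(double[] data) {
        int n = data.length;
        double mean = 0.0;
        for (double v : data) mean += v;
        mean /= n;

        double sumSq = 0.0;
        for (double v : data) sumSq += (v - mean) * (v - mean);
        double sd = Math.sqrt(sumSq / n); // divide-by-n convention: an assumption

        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = (data[i] - mean) / sd; // NaN for constant data, since sd is 0
        }
        return out;
    }
}
```

      Note that constant input has zero standard deviation, so this sketch yields NaN values in that case; a production implementation would need to decide how to handle it.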
    • standardizeData

      public static Vector standardizeData(Vector _data)
      Standardizes the provided vector data by removing the mean and scaling to unit variance.
      Parameters:
      _data - the vector containing the data to be standardized
      Returns:
      a new vector containing the standardized data
    • factorial

      public static long factorial(int c)

      factorial.

      Parameters:
      c - an int
      Returns:
      the factorial of c, as a long
    • getZForAlpha

      public static double getZForAlpha(double alpha)

      getZForAlpha.

      Parameters:
      alpha - a double
      Returns:
      a double
    • logsum

      public static double logsum(List<Double> logs)

      logsum.

      Parameters:
      logs - a List object
      Returns:
      a double
    • sum

      public static double sum(double[] x)

      sum.

      Parameters:
      x - a double array
      Returns:
      the sum of the values in x
    • cov

      public static double[] cov(double[] x, double[] y, double[] condition, double threshold, double direction)

      cov.

      Parameters:
      x - a double array
      y - a double array
      condition - a double array
      threshold - a double
      direction - a double
      Returns:
      a double array
    • covMatrix

      public static double[][] covMatrix(double[] x, double[] y, double[][] z, double[] condition, double threshold, double direction)

      covMatrix.

      Parameters:
      x - a double array
      y - a double array
      z - a two-dimensional double array
      condition - a double array
      threshold - a double
      direction - a double
      Returns:
      a two-dimensional double array
    • getRows

      public static List<Integer> getRows(double[] x, double threshold, double direction)

      getRows.

      Parameters:
      x - a double array
      threshold - a double
      direction - a double
      Returns:
      a List object
    • getRows

      public static List<Integer> getRows(double[] x, double[] condition, double threshold, double direction)

      getRows.

      Parameters:
      x - a double array
      condition - a double array
      threshold - a double
      direction - a double
      Returns:
      a List object
    • E

      public static double[] E(double[] x, double[] y, double[] condition, double threshold, double direction)

      E.

      Parameters:
      x - a double array
      y - a double array
      condition - a double array
      threshold - a double
      direction - a double
      Returns:
      a double array
    • hermite1

      public static double hermite1(int index, double x)
      Computes the probabilist's Hermite polynomial of the given index at the specified point.

      The Hermite polynomials are defined recursively: He_0(x) = 1, He_1(x) = x, He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x).

      Parameters:
      index - the non-negative integer index of the Hermite polynomial. Must be 0 or greater.
      x - the point at which to evaluate the Hermite polynomial.
      Returns:
      the value of the Hermite polynomial of the specified index at the given point.
      Throws:
      IllegalArgumentException - if the index is negative.
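
      The recursion stated above translates directly into an iterative loop that carries the two most recent values. The following is a minimal sketch of that recursion, not the library code; ProbabilistHermiteSketch is a hypothetical name.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class ProbabilistHermiteSketch {
    // Evaluates He_index(x) using He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x).
    static double hermite(int index, double x) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        if (index == 0) {
            return 1.0;            // He_0(x) = 1
        }
        double prev = 1.0;         // He_{n-1}, starting at He_0
        double curr = x;           // He_n, starting at He_1
        for (int n = 1; n < index; n++) {
            double next = x * curr - n * prev;
            prev = curr;
            curr = next;
        }
        return curr;
    }
}
```

      As a check, He_2(x) = x^2 - 1 and He_3(x) = x^3 - 3x, so at x = 2 the sketch gives 3 and 2 respectively.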
    • hermite2

      public static double hermite2(int index, double x)
      Computes the (physicist's) Hermite polynomial of a given index and value. The Hermite polynomials are a sequence of orthogonal polynomials defined by the Rodrigues formula. They are orthogonal with respect to the weight function exp(-x^2). The Hermite polynomial of index n is denoted H_n(x).
      Parameters:
      index - The index of the Hermite polynomial to be computed. This must be a non-negative integer.
      x - The value at which the Hermite polynomial is to be evaluated.
      Returns:
      The computed value of the Hermite polynomial.
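
      The physicist's Hermite polynomials satisfy the three-term recurrence H_0(x) = 1, H_1(x) = 2x, H_{n+1}(x) = 2x * H_n(x) - 2n * H_{n-1}(x), which is usually easier to evaluate than the Rodrigues formula. The sketch below uses that recurrence; it is an assumption that the documented method evaluates the polynomial this way, and PhysicistHermiteSketch is a hypothetical name.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class PhysicistHermiteSketch {
    // Evaluates H_index(x) using H_{n+1}(x) = 2x * H_n(x) - 2n * H_{n-1}(x).
    static double hermite(int index, double x) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        if (index == 0) {
            return 1.0;            // H_0(x) = 1
        }
        double prev = 1.0;         // H_{n-1}, starting at H_0
        double curr = 2.0 * x;     // H_n, starting at H_1
        for (int n = 1; n < index; n++) {
            double next = 2.0 * x * curr - 2.0 * n * prev;
            prev = curr;
            curr = next;
        }
        return curr;
    }
}
```

      As a check, H_2(x) = 4x^2 - 2 and H_3(x) = 8x^3 - 12x, so at x = 1 the sketch gives 2 and -4 respectively.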
    • legendre

      public static double legendre(int index, double x)
      Computes the value of the Legendre polynomial of a given degree at a specified point x.

      The Legendre polynomial is a solution to Legendre's differential equation and is used in physics and engineering, particularly in problems involving spherical coordinates.

      Parameters:
      index - the degree of the Legendre polynomial. Must be a non-negative integer.
      x - the point at which the Legendre polynomial is evaluated.
      Returns:
      the value of the Legendre polynomial of the given degree at the specified point x.
      Throws:
      IllegalArgumentException - if the index is negative.
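
      Legendre polynomials are commonly evaluated with Bonnet's recurrence, (n + 1) * P_{n+1}(x) = (2n + 1) * x * P_n(x) - n * P_{n-1}(x), starting from P_0(x) = 1 and P_1(x) = x. The sketch below applies it; whether the documented method uses this recurrence is an assumption, and LegendreSketch is a hypothetical name.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class LegendreSketch {
    // Evaluates P_index(x) with Bonnet's recurrence.
    static double legendre(int index, double x) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        if (index == 0) {
            return 1.0;            // P_0(x) = 1
        }
        double prev = 1.0;         // P_{n-1}, starting at P_0
        double curr = x;           // P_n, starting at P_1
        for (int n = 1; n < index; n++) {
            double next = ((2 * n + 1) * x * curr - n * prev) / (n + 1);
            prev = curr;
            curr = next;
        }
        return curr;
    }
}
```

      As a check, P_2(x) = (3x^2 - 1) / 2 and P_3(x) = (5x^3 - 3x) / 2, so at x = 0.5 the sketch gives -0.125 and -0.4375; every P_n equals 1 at x = 1.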
    • chebyshev

      public static double chebyshev(int index, double x)
      Computes the value of the Chebyshev polynomial of a given degree at a specified point x.
      Parameters:
      index - the degree of the Chebyshev polynomial. Must be a non-negative integer.
      x - the point at which the Chebyshev polynomial is evaluated.
      Returns:
      the value of the Chebyshev polynomial of the given degree at the specified point x.
    • basisFunctionValue

      public static double basisFunctionValue(int type, int index, double x)
      Performs a calculation that involves repeatedly multiplying an initial value of `1.0` by the product of `0.95` and a given parameter `x`, iterating `index` times. The type of function used in the calculation is determined by the `type` parameter. The function types are as follows:
      • 0 = `g(x) = x^index` [Polynomial basis]
      • 1 = `g(x) = hermite1(index, x)` [Probabilist's Hermite polynomial]
      • 2 = `g(x) = legendre(index, x)` [Legendre polynomial]
      • 3 = `g(x) = chebyshev(index, x)` [Chebyshev polynomial]
      Any other value of `type` will result in an `IllegalArgumentException`.
      Parameters:
      type - The type of function to be used in the calculation.
      index - The number of iterations to perform the multiplication.
      x - The value to be multiplied by `0.95` in each iteration.
      Returns:
      The result of the iterative multiplication.
    • getChiSquareP

      public static double getChiSquareP(double dof, double chisq)
      Calculates the p-value for the given chi-square statistic and degrees of freedom.
      Parameters:
      dof - Degrees of freedom; must be non-negative.
      chisq - Chi-square statistic; must be non-negative.
      Returns:
      The p-value corresponding to the chi-square statistic and degrees of freedom.
      Throws:
      IllegalArgumentException - if degrees of freedom or chi-square statistic is negative.
    • extractSubMatrix

      public static org.ejml.simple.SimpleMatrix extractSubMatrix(org.ejml.simple.SimpleMatrix matrix, int[] rows, int[] cols)
      Extracts a submatrix from the specified matrix by selecting the rows and columns indicated by the provided indices. The resulting submatrix is composed of values at the intersection of the specified rows and columns.
      Parameters:
      matrix - the input matrix as a SimpleMatrix object from which the submatrix will be extracted
      rows - an array of integers representing the row indices to include in the submatrix
      cols - an array of integers representing the column indices to include in the submatrix
      Returns:
      a SimpleMatrix object representing the extracted submatrix
    • extractSubMatrix

      public static org.ejml.simple.SimpleMatrix extractSubMatrix(org.ejml.simple.SimpleMatrix matrix, int rowStart, int rowEnd, int colStart, int colEnd)
      Extracts a submatrix from the given matrix within the specified row and column bounds.
      Parameters:
      matrix - The input matrix.
      rowStart - The starting row index (inclusive).
      rowEnd - The ending row index (exclusive).
      colStart - The starting column index (inclusive).
      colEnd - The ending column index (exclusive).
      Returns:
      The extracted submatrix.
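
      The inclusive-start, exclusive-end convention above means the result has rowEnd - rowStart rows and colEnd - colStart columns. The sketch below shows the same bounds on a plain double[][] so that it runs without EJML; SubMatrixSketch is a hypothetical name and the sketch performs no bounds validation of its own.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class SubMatrixSketch {
    // Copies rows [rowStart, rowEnd) and columns [colStart, colEnd) of m.
    static double[][] extract(double[][] m, int rowStart, int rowEnd,
                              int colStart, int colEnd) {
        int cols = colEnd - colStart;
        double[][] out = new double[rowEnd - rowStart][cols];
        for (int r = rowStart; r < rowEnd; r++) {
            System.arraycopy(m[r], colStart, out[r - rowStart], 0, cols);
        }
        return out;
    }
}
```

      For a 3 x 3 matrix with rows {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, calling extract(m, 0, 2, 1, 3) yields the 2 x 2 block with rows {2, 3} and {5, 6}.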
    • chol

      public static org.ejml.simple.SimpleMatrix chol(org.ejml.simple.SimpleMatrix A)
      Computes the Cholesky decomposition of the given matrix and returns its lower triangular matrix.
      Parameters:
      A - The input positive definite matrix.
      Returns:
      The lower triangular matrix from the Cholesky decomposition.
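
      For reference, the lower triangular factor L with A = L * L^T can be computed row by row with the textbook Cholesky-Banachiewicz scheme. The sketch below works on a plain double[][] and is not the EJML-based implementation; CholeskySketch is a hypothetical name, and throwing IllegalArgumentException for a matrix that is not positive definite is an assumption.

```java
// Hypothetical standalone sketch; not the StatUtils implementation.
final class CholeskySketch {
    // Returns lower triangular L such that a = L * L^T.
    static double[][] lower(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = a[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException("matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);   // diagonal entry
                } else {
                    l[i][j] = sum / l[j][j];    // entry below the diagonal
                }
            }
        }
        return l;
    }
}
```

      For A with rows {4, 2} and {2, 3}, the factor has rows {2, 0} and {1, sqrt(2)}.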