Statistics

I hope that someday Octave will include more statistics functions. If you would like to help improve Octave in this area, please contact bug-octave@bevo.che.wisc.edu.

Basic Statistical Functions

@anchor{doc-mean}

Function File: mean (x, dim, opt)

If x is a vector, compute the mean of the elements of x

mean (x) = SUM_i x(i) / N

If x is a matrix, compute the mean for each column and return them in a row vector.

With the optional argument opt, the kind of mean computed can be selected. The following options are recognized:

"a": Compute the (ordinary) arithmetic mean. This is the default.
"g": Compute the geometric mean.
"h": Compute the harmonic mean.

If the optional argument dim is supplied, work along dimension dim.

Both dim and opt are optional. If both are supplied, either may appear first.
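
As an illustration, the three kinds of mean can be written out with elementary operations. This is only a sketch of the definitions for a small made-up vector, not the code of the mean function itself.

```octave
% Sketch: arithmetic, geometric and harmonic mean of a vector.
x = [1, 2, 4];                  % made-up data
n = numel (x);

m_arith = sum (x) / n;          % arithmetic mean, option "a"
m_geom  = prod (x) ^ (1 / n);   % geometric mean, option "g"
m_harm  = n / sum (1 ./ x);     % harmonic mean, option "h"

printf ("%g %g %g\n", m_arith, m_geom, m_harm);
```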

@anchor{doc-median}

Function File: median (x)

If x is a vector, compute the median value of the elements of x. With the elements of x sorted in ascending order, the median is defined as

            x(ceil(N/2)),             N odd
median(x) =
            (x(N/2) + x((N/2)+1))/2,  N even

If x is a matrix, compute the median value for each column and return them in a row vector.

See also: std, mean.
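
The median formula indexes the data in sorted order. A minimal sketch for a made-up vector, using only sort and elementary indexing:

```octave
% Sketch: median of a vector via the odd/even formula.
x = [7, 1, 5, 3];             % made-up data
s = sort (x);                 % the formula applies to the sorted data
N = numel (s);
if (rem (N, 2) == 1)
  med = s(ceil (N / 2));                 % N odd: the middle element
else
  med = (s(N / 2) + s(N / 2 + 1)) / 2;   % N even: average of the two middle elements
endif
disp (med);
```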

@anchor{doc-std}

Function File: std (x)

If x is a vector, compute the standard deviation of the elements of x.

std (x) = sqrt (sumsq (x - mean (x)) / (n - 1))

If x is a matrix, compute the standard deviation for each column and return them in a row vector.

See also: mean, median.
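
The formula for std can be checked directly on a small made-up vector. The sketch below evaluates the expression given above and compares it with the result of std.

```octave
% Sketch: standard deviation with the (n - 1) normalization.
x = [2, 4, 4, 4, 5, 5, 7, 9];   % made-up data
n = numel (x);
s = sqrt (sumsq (x - mean (x)) / (n - 1));
printf ("by hand: %g, std: %g\n", s, std (x));
```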

@anchor{doc-cov}

Function File: cov (x, y): If each row of x and y is an observation and each column is a variable, the (i,j)-th entry of cov (x, y) is the covariance between the i-th variable in x and the j-th variable in y. If called with one argument, compute cov (x, x).

@anchor{doc-corrcoef}

Function File: corrcoef (x, y): If each row of x and y is an observation and each column is a variable, the (i,j)-th entry of corrcoef (x, y) is the correlation between the i-th variable in x and the j-th variable in y. If called with one argument, compute corrcoef (x, x).

@anchor{doc-kurtosis}

Function File: kurtosis (x)

If x is a vector of length N, return the kurtosis

kurtosis (x) = N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3

of x. If x is a matrix, return the row vector containing the kurtosis of each column.
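
A minimal sketch of the kurtosis formula as printed in this section, evaluated by hand on made-up data. It deliberately avoids calling kurtosis itself, whose normalization may differ between versions.

```octave
% Sketch: kurtosis (x) = N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3
x = [1, 2, 3, 4, 10];     % made-up data
N = numel (x);
d = x - mean (x);         % deviations from the mean
s = std (x);              % standard deviation (divides by N - 1)
k = sum (d .^ 4) / (N * s ^ 4) - 3;
disp (k);
```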

@anchor{doc-mahalanobis}

Function File: mahalanobis (x, y): Return the Mahalanobis' D-square distance between the multivariate samples x and y, which must have the same number of components (columns), but may have a different number of observations (rows).

@anchor{doc-skewness}

Function File: skewness (x)

If x is a vector of length N, return the skewness

skewness (x) = N^(-1) std(x)^(-3) sum ((x - mean(x)).^3)

of x. If x is a matrix, return the row vector containing the skewness of each column.
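
The skewness formula as printed in this section can be evaluated the same way. The data are made up, and the sketch does not call skewness itself.

```octave
% Sketch: skewness (x) = N^(-1) std(x)^(-3) sum ((x - mean(x)).^3)
x = [1, 2, 3, 4, 10];     % made-up data with a long right tail
N = numel (x);
d = x - mean (x);
sk = sum (d .^ 3) / (N * std (x) ^ 3);
disp (sk);                % positive for right-skewed data
```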

@anchor{doc-values}

Function File: values (x): Return the different values in a column vector, arranged in ascending order.

@anchor{doc-var}

Function File: var (x): For vector arguments, return the (real) variance of the values. For matrix arguments, return a row vector containing the variance for each column.

@anchor{doc-table}

Function File: [t, l_x] = table (x)

Function File: [t, l_x, l_y] = table (x, y)

Create a contingency table t from data vectors. The l vectors are the corresponding levels.

Currently, only 1- and 2-dimensional tables are supported.

@anchor{doc-studentize}

Function File: studentize (x)

If x is a vector, subtract its mean and divide by its standard deviation.

If x is a matrix, do the above for each column.
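
What studentizing does can be sketched with mean and std directly. The example below uses made-up data and relies on automatic broadcasting for the matrix case, which is an assumption about the Octave version in use.

```octave
% Sketch: studentize a vector and the columns of a matrix.
x = [2; 4; 6; 8];                   % made-up vector
z = (x - mean (x)) / std (x);       % zero mean, unit standard deviation

A = [1, 10; 2, 20; 3, 30];          % made-up matrix, one variable per column
Z = (A - mean (A)) ./ std (A);      % column-wise, via broadcasting
disp (Z);
```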

@anchor{doc-statistics}

Function File: statistics (x)

If x is a matrix, return a matrix with the minimum, first quartile, median, third quartile, maximum, mean, standard deviation, skewness and kurtosis of the columns of x as its rows.

If x is a vector, treat it as a column vector.

@anchor{doc-spearman}

Function File: spearman (x, y)

Compute Spearman's rank correlation coefficient rho for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

spearman (x) is equivalent to spearman (x, x).

For two data vectors x and y, Spearman's rho is the correlation of the ranks of x and y.

If x and y are drawn from independent distributions, rho has zero mean and variance 1 / (n - 1), and is asymptotically normally distributed.
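
For two vectors without ties, the definition can be followed step by step: rank both vectors, then take the ordinary correlation of the ranks. The sketch below uses made-up data and does not handle ties.

```octave
% Sketch: Spearman's rho as the correlation of ranks (no ties).
x = [10, 20, 30, 40, 50];
y = [ 1,  3,  2,  5,  4];
n = numel (x);

% Rank = position of each element in the sorted order.
rx = zeros (1, n);  [~, ix] = sort (x);  rx(ix) = 1:n;
ry = zeros (1, n);  [~, iy] = sort (y);  ry(iy) = 1:n;

% Pearson correlation of the two rank vectors.
dx = rx - mean (rx);
dy = ry - mean (ry);
rho = sum (dx .* dy) / sqrt (sum (dx .^ 2) * sum (dy .^ 2));
disp (rho);
```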

@anchor{doc-run_count}

Function File: run_count (x, n): Count the upward runs in the columns of x of length 1, 2, ..., n-1 and greater than or equal to n.

@anchor{doc-ranks}

Function File: ranks (x)

If x is a vector, return the (column) vector of ranks of x adjusted for ties.

If x is a matrix, do the above for each column of x.
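
A minimal sketch of tie-adjusted ranks, assuming that "adjusted for ties" means tied values share the average of the ranks they occupy (the usual midrank convention). The data are made up.

```octave
% Sketch: midranks of a vector with ties.
x = [10, 20, 20, 30];
n = numel (x);
r = zeros (n, 1);
for i = 1:n
  % (number of strictly smaller values) + (number of equal values + 1) / 2
  r(i) = sum (x < x(i)) + (sum (x == x(i)) + 1) / 2;
endfor
disp (r');
```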

@anchor{doc-range}

Function File: range (x)

If x is a vector, return the range, i.e., the difference between the maximum and the minimum, of the input data.

If x is a matrix, do the above for each column of x.

@anchor{doc-qqplot}

Function File: [q, s] = qqplot (x, dist, params)

Perform a QQ-plot (quantile plot).

If F is the CDF of the distribution dist with parameters params and G its inverse, and x a sample vector of length n, the QQ-plot graphs ordinate s(i) = i-th smallest element of x versus abscissa q(i) = G((i - 0.5)/n).

If the sample comes from F except for a transformation of location and scale, the pairs will approximately follow a straight line.

The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a quantile plot of the uniform distribution on [2,4] and x, use

qqplot (x, "uniform", 2, 4)

If no output arguments are given, the data are plotted directly.
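
The plotted pairs can be reproduced by hand for the uniform example. The sketch below sorts a made-up sample in ascending order and evaluates the inverse CDF of the uniform distribution on [2,4], which is G(p) = 2 + 2 p, at the plotting positions.

```octave
% Sketch: the (q, s) pairs of a QQ-plot against the uniform distribution on [2, 4].
x = [3.1, 2.2, 3.9, 2.6];       % made-up sample
n = numel (x);
s = sort (x);                   % ordinates: the ordered sample
p = ((1:n) - 0.5) / n;          % plotting positions (i - 0.5) / n
q = 2 + (4 - 2) * p;            % abscissae: G(p) for the uniform distribution
disp ([q; s]);
```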

@anchor{doc-probit}

Function File: probit (p): For each component of p, return the probit (the quantile of the standard normal distribution) of p.

@anchor{doc-ppplot}

Function File: [p, y] = ppplot (x, dist, params)

Perform a PP-plot (probability plot).

If F is the CDF of the distribution dist with parameters params and x a sample vector of length n, the PP-plot graphs ordinate y(i) = F (i-th smallest element of x) versus abscissa p(i) = (i - 0.5)/n. If the sample comes from F, the pairs will approximately follow a straight line.

The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a probability plot of the uniform distribution on [2,4] and x, use

ppplot (x, "uniform", 2, 4)

If no output arguments are given, the data are plotted directly.

@anchor{doc-moment}

Function File: moment (x, p, opt)

If x is a vector, compute the p-th moment of x.

If x is a matrix, return the row vector containing the p-th moment of each column.

With the optional string opt, the kind of moment to be computed can be specified. If opt contains "c" or "a", central and/or absolute moments are returned. For example,

moment (x, 3, "ac")

computes the third central absolute moment of x.
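
The four variants can be sketched as plain averages. The normalization by the number of elements is an assumption here, since this section does not state it, and the data are made up.

```octave
% Sketch: raw, central, absolute and central absolute third moments.
x = [1, 2, 3, 4, 10];
p = 3;
m_raw = mean (x .^ p);                     % no option
m_c   = mean ((x - mean (x)) .^ p);        % "c": central
m_a   = mean (abs (x) .^ p);               % "a": absolute
m_ac  = mean (abs (x - mean (x)) .^ p);    % "ac": central absolute
printf ("%g %g %g %g\n", m_raw, m_c, m_a, m_ac);
```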

@anchor{doc-meansq}

Function File: meansq (x): For vector arguments, return the mean square of the values. For matrix arguments, return a row vector containing the mean square of each column.

@anchor{doc-logit}

Function File: logit (p): For each component of p, return the logit log (p / (1-p)) of p.
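
A short sketch of the logit transform and its inverse (the logistic function), written with elementary operations on made-up probabilities:

```octave
% Sketch: logit and its inverse.
p = [0.1, 0.5, 0.9];
l = log (p ./ (1 - p));          % logit; zero at p = 0.5
p_back = 1 ./ (1 + exp (-l));    % inverse transform recovers p
disp ([l; p_back]);
```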

@anchor{doc-kendall}

Function File: kendall (x, y)

Compute Kendall's tau for each of the variables specified by the input arguments.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

kendall (x) is equivalent to kendall (x, x).

For two data vectors x, y of common length n, Kendall's tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then

         1    
tau = -------   SUM sign (q(i) - q(j)) * sign (r(i) - r(j))
      n (n-1)   i,j

in which the q(i) and r(i) are the ranks of x and y, respectively.

If x and y are drawn from independent distributions, Kendall's tau is asymptotically normal with mean 0 and variance (2 * (2n+5)) / (9 * n * (n-1)).
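
The double sum can be evaluated directly for two vectors with distinct entries. In that case the sign of a rank difference equals the sign of the corresponding data difference, so the ranks need not be computed. The sketch uses made-up data and automatic broadcasting.

```octave
% Sketch: Kendall's tau from the sign formula (distinct entries only).
x = [10, 20, 30, 40, 50];
y = [ 1,  3,  2,  5,  4];
n = numel (x);

sx = sign (x' - x);     % n-by-n matrix with entries sign (x(i) - x(j))
sy = sign (y' - y);
tau = sum (sum (sx .* sy)) / (n * (n - 1));
disp (tau);
```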

@anchor{doc-iqr}

Function File: iqr (x)

If x is a vector, return the interquartile range, i.e., the difference between the upper and lower quartile, of the input data.

If x is a matrix, do the above for each column of x.

@anchor{doc-cut}

Function File: cut (x, breaks)

Create categorical data out of numerical or continuous data by cutting into intervals.

If breaks is a scalar, the data is cut into that many equal-width intervals. If breaks is a vector of break points, the data is cut into length (breaks) - 1 groups.

The returned value is a vector of the same size as x telling which group each point in x belongs to. Groups are labelled from 1 to the number of groups; points outside the range of breaks are labelled by NaN.

@anchor{doc-cor}

Function File: cor (x, y)

The (i,j)-th entry of cor (x, y) is the correlation between the i-th variable in x and the j-th variable in y.

For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.

cor (x) is equivalent to cor (x, x).

@anchor{doc-cloglog}

Function File: cloglog (x)

Return the complementary log-log function of x, defined as

- log (- log (x))

@anchor{doc-center}

Function File: center (x): If x is a vector, subtract its mean. If x is a matrix, do the above for each column.

Tests

@anchor{doc-anova}

Function File: [pval, f, df_b, df_w] = anova (y, g)

Perform a one-way analysis of variance (ANOVA). The goal is to test whether the population means of data taken from k different groups are all equal.

Data may be given in a single vector y with groups specified by a corresponding vector of group labels g (e.g., numbers from 1 to k). This is the general form which does not impose any restriction on the number of data in each group or the group labels.

If y is a matrix and g is omitted, each column of y is treated as a group. This form is only appropriate for balanced ANOVA in which the numbers of samples from each group are all equal.

Under the null of constant means, the statistic f follows an F distribution with df_b and df_w degrees of freedom.

The p-value (1 minus the CDF of this distribution at f) is returned in pval.

If no output argument is given, the standard one-way ANOVA table is printed.
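
The statistic f and its degrees of freedom can be sketched from the usual between-group and within-group sums of squares. The data and labels below are made up, and the p-value is omitted because it requires the F distribution CDF.

```octave
% Sketch: one-way ANOVA F statistic from data y and group labels g.
y = [1, 2, 3, 2, 3, 4, 5, 6, 7];
g = [1, 1, 1, 2, 2, 2, 3, 3, 3];
labels = unique (g);
k = numel (labels);
n = numel (y);

ss_b = 0;   % between-group sum of squares
ss_w = 0;   % within-group sum of squares
for j = 1:k
  yj = y(g == labels(j));
  ss_b += numel (yj) * (mean (yj) - mean (y)) ^ 2;
  ss_w += sumsq (yj - mean (yj));
endfor

df_b = k - 1;
df_w = n - k;
f = (ss_b / df_b) / (ss_w / df_w);
printf ("f = %g, df_b = %d, df_w = %d\n", f, df_b, df_w);
```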

@anchor{doc-bartlett_test}

Function File: [pval, chisq, df] = bartlett_test (x1, ...)

Perform a Bartlett test for the homogeneity of variances in the data vectors x1, x2, ..., xk, where k > 1.

Under the null of equal variances, the test statistic chisq approximately follows a chi-square distribution with df degrees of freedom.

The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.

If no output argument is given, the p-value is displayed.

@anchor{doc-chisquare_test_homogeneity}

Function File: [pval, chisq, df] = chisquare_test_homogeneity (x, y, c)

Given two samples x and y, perform a chisquare test of the null hypothesis that x and y come from the same distribution (homogeneity), based on the partition induced by the (strictly increasing) entries of c.

For large samples, the test statistic chisq approximately follows a chisquare distribution with df = length (c) degrees of freedom.

The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.

If no output argument is given, the p-value is displayed.

@anchor{doc-chisquare_test_independence}

Function File: [pval, chisq, df] = chisquare_test_independence (x)

Perform a chi-square test for independence based on the contingency table x. Under the null hypothesis of independence, chisq approximately has a chi-square distribution with df degrees of freedom.

The p-value (1 minus the CDF of this distribution at chisq) of the test is returned in pval.

If no output argument is given, the p-value is displayed.
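
A sketch of the statistic, assuming the usual Pearson chi-square form without continuity correction and df = (rows - 1) * (columns - 1). The table is made up.

```octave
% Sketch: Pearson chi-square statistic for a contingency table.
x = [10, 20; 30, 40];
e = sum (x, 2) * sum (x, 1) / sum (x(:));   % expected counts under independence
chisq = sum (sum ((x - e) .^ 2 ./ e));
df = (rows (x) - 1) * (columns (x) - 1);
printf ("chisq = %g, df = %d\n", chisq, df);
```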

@anchor{doc-cor_test}

Function File: cor_test (x, y, alt, method)

Test whether two samples x and y come from uncorrelated populations.

The optional argument string alt describes the alternative hypothesis, and can be "!=" or "<>" (non-zero), ">" (greater than 0), or "<" (less than 0). The default is the two-sided case.

The optional argument string method specifies on which correlation coefficient the test should be based. If method is "pearson" (default), the (usual) Pearson's product moment correlation coefficient is used. In this case, the data should come from a bivariate normal distribution. Otherwise, the other two methods offer nonparametric alternatives. If method is "kendall", then Kendall's rank correlation tau is used. If method is "spearman", then Spearman's rank correlation rho is used. Only the first character is necessary.

The output is a structure with the following elements:

pval: The p-value of the test.
stat: The value of the test statistic.
dist: The distribution of the test statistic.
params: The parameters of the null distribution of the test statistic.
alternative: The alternative hypothesis.
method: The method used for testing.

If no output argument is given, the p-value is displayed.

@anchor{doc-f_test_regression}

Function File: [pval, f, df_num, df_den] = f_test_regression (y, x, rr, r)

Perform an F test for the null hypothesis rr * b = r in a classical normal regression model y = X * b + e.

Under the null, the test statistic f follows an F distribution with df_num and df_den degrees of freedom.

The p-value (1 minus the CDF of this distribution at f) is returned in pval.

If not given explicitly, r = 0.

If no output argument is given, the p-value is displayed.

@anchor{doc-hotelling_test}

Function File: [pval, tsq] = hotelling_test (x, m)

For a sample x from a multivariate normal distribution with unknown mean and covariance matrix, test the null hypothesis that

mean (x) == m

Hotelling's T^2 is returned in tsq. Under the null, (n-p) T^2 / (p(n-1)) has an F distribution with p and n-p degrees of freedom, where n and p are the numbers of samples and variables, respectively.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-hotelling_test_2}

Function File: [pval, tsq] = hotelling_test_2 (x, y)

For two samples x and y from multivariate normal distributions with the same number of variables (columns), unknown means and unknown equal covariance matrices, test the null hypothesis

mean (x) == mean (y)

Hotelling's two-sample T^2 is returned in tsq. Under the null,

(n_x+n_y-p-1) T^2 / (p(n_x+n_y-2))

has an F distribution with p and n_x+n_y-p-1 degrees of freedom, where n_x and n_y are the sample sizes and p is the number of variables.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-kolmogorov_smirnov_test}

Function File: [pval, ks] = kolmogorov_smirnov_test (x, dist, params, alt)

Perform a Kolmogorov-Smirnov test of the null hypothesis that the sample x comes from the (continuous) distribution dist. I.e., if F and G are the CDFs corresponding to the sample and dist, respectively, then the null is that F == G.

The optional argument params contains a list of parameters of dist. For example, to test whether a sample x comes from a uniform distribution on [2,4], use

kolmogorov_smirnov_test(x, "uniform", 2, 4)

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative F != G. In this case, the test statistic ks follows a two-sided Kolmogorov-Smirnov distribution. If alt is ">", the one-sided alternative F > G is considered. Similarly for "<", the one-sided alternative F < G is considered. In this case, the test statistic ks has a one-sided Kolmogorov-Smirnov distribution. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value is displayed.

@anchor{doc-kolmogorov_smirnov_test_2}

Function File: [pval, ks, d] = kolmogorov_smirnov_test_2 (x, y, alt)

Perform a 2-sample Kolmogorov-Smirnov test of the null hypothesis that the samples x and y come from the same (continuous) distribution. I.e., if F and G are the CDFs corresponding to the x and y samples, respectively, then the null is that F == G.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative F != G. In this case, the test statistic ks follows a two-sided Kolmogorov-Smirnov distribution. If alt is ">", the one-sided alternative F > G is considered. Similarly for "<", the one-sided alternative F < G is considered. In this case, the test statistic ks has a one-sided Kolmogorov-Smirnov distribution. The default is the two-sided case.

The p-value of the test is returned in pval.

The third returned value, d, is the test statistic, the maximum vertical distance between the two cumulative distribution functions.

If no output argument is given, the p-value is displayed.

@anchor{doc-kruskal_wallis_test}

Function File: [pval, k, df] = kruskal_wallis_test (x1, ...)

Perform a Kruskal-Wallis one-factor "analysis of variance".

Suppose a variable is observed for k > 1 different groups, and let x1, ..., xk be the corresponding data vectors.

Under the null hypothesis that the ranks in the pooled sample are not affected by the group memberships, the test statistic k is approximately chi-square with df = k - 1 degrees of freedom.

The p-value (1 minus the CDF of this distribution at k) is returned in pval.

If no output argument is given, the p-value is displayed.

@anchor{doc-manova}

Function File: manova (y, g)

Perform a one-way multivariate analysis of variance (MANOVA). The goal is to test whether the p-dimensional population means of data taken from k different groups are all equal. All data are assumed drawn independently from p-dimensional normal distributions with the same covariance matrix.

The data matrix is given by y. As usual, rows are observations and columns are variables. The vector g specifies the corresponding group labels (e.g., numbers from 1 to k).

The LR test statistic (Wilks' Lambda) and approximate p-values are computed and displayed.

@anchor{doc-mcnemar_test}

Function File: [pval, chisq, df] = mcnemar_test (x)

For a square contingency table x of data cross-classified on the row and column variables, McNemar's test can be used for testing the null hypothesis of symmetry of the classification probabilities.

Under the null, chisq is approximately distributed as chisquare with df degrees of freedom.

The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-prop_test_2}

Function File: [pval, z] = prop_test_2 (x1, n1, x2, n2, alt)

If x1 and n1 are the counts of successes and trials in one sample, and x2 and n2 those in a second one, test the null hypothesis that the success probabilities p1 and p2 are the same. Under the null, the test statistic z approximately follows a standard normal distribution.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative p1 != p2. If alt is ">", the one-sided alternative p1 > p2 is used. Similarly for "<", the one-sided alternative p1 < p2 is used. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-run_test}

Function File: [pval, chisq] = run_test (x)

Perform a chi-square test with 6 degrees of freedom based on the upward runs in the columns of x. Can be used to test whether x contains independent data.

The p-value of the test is returned in pval.

If no output argument is given, the p-value is displayed.

@anchor{doc-sign_test}

Function File: [pval, b, n] = sign_test (x, y, alt)

For two matched-pair samples x and y, perform a sign test of the null hypothesis PROB (x > y) == PROB (x < y) == 1/2. Under the null, the test statistic b roughly follows a binomial distribution with parameters

n = sum (x != y)

and p = 1/2.

With the optional argument alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null hypothesis is tested against the two-sided alternative PROB (x < y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 ("x is stochastically greater than y") is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 ("x is stochastically less than y") is considered. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.
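
The two quantities behind the sign test are simple counts. The sketch below assumes b counts the pairs with x greater than y, which this section does not state explicitly. The data are made up.

```octave
% Sketch: counts used by the sign test for matched pairs.
x = [5, 7, 3, 9, 6, 8];
y = [4, 7, 5, 6, 2, 1];
b = sum (x > y);      % pairs in which x exceeds y (assumed definition of b)
n = sum (x != y);     % tied pairs are dropped
printf ("b = %d, n = %d\n", b, n);
```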

@anchor{doc-t_test}

Function File: [pval, t, df] = t_test (x, m, alt)

For a sample x from a normal distribution with unknown mean and variance, perform a t-test of the null hypothesis

mean (x) == m

Under the null, the test statistic t follows a Student distribution with

df = length (x) - 1

degrees of freedom.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != m. If alt is ">", the one-sided alternative mean (x) > m is considered. Similarly for "<", the one-sided alternative mean (x) < m is considered. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.
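
A sketch of the one-sample statistic, assuming the standard form t = sqrt (n) * (mean (x) - m) / std (x). The data are made up, and the p-value is omitted because it requires the Student distribution CDF.

```octave
% Sketch: one-sample t statistic and its degrees of freedom.
x = [5.1, 4.9, 5.6, 5.8, 6.0];
m = 5;                                   % hypothesized mean
n = numel (x);
t  = sqrt (n) * (mean (x) - m) / std (x);
df = n - 1;
printf ("t = %g, df = %d\n", t, df);
```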

@anchor{doc-t_test_2}

Function File: [pval, t, df] = t_test_2 (x, y, alt)

For two samples x and y from normal distributions with unknown means and unknown equal variances, perform a two-sample t-test of the null hypothesis of equal means. Under the null, the test statistic t follows a Student distribution with df degrees of freedom.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is used. Similarly for "<", the one-sided alternative mean (x) < mean (y) is used. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-t_test_regression}

Function File: [pval, t, df] = t_test_regression (y, x, rr, r, alt)

Perform a t test for the null hypothesis

rr * b = r

in a classical normal regression model

y = x * b + e

Under the null, the test statistic t follows a t distribution with df degrees of freedom.

If r is omitted, a value of 0 is assumed.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative rr * b != r. If alt is ">", the one-sided alternative rr * b > r is used. Similarly for "<", the one-sided alternative rr * b < r is used. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-u_test}

Function File: [pval, z] = u_test (x, y, alt)

For two samples x and y, perform a Mann-Whitney U-test of the null hypothesis PROB (x > y) == 1/2 == PROB (x < y). Under the null, the test statistic z approximately follows a standard normal distribution. Note that this test is equivalent to the Wilcoxon rank-sum test.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative PROB (x > y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 is considered. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-var_test}

Function File: [pval, f, df_num, df_den] = var_test (x, y, alt)

For two samples x and y from normal distributions with unknown means and unknown variances, perform an F-test of the null hypothesis of equal variances. Under the null, the test statistic f follows an F-distribution with df_num and df_den degrees of freedom.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative var (x) != var (y). If alt is ">", the one-sided alternative var (x) > var (y) is used. Similarly for "<", the one-sided alternative var (x) < var (y) is used. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-welch_test}

Function File: [pval, t, df] = welch_test (x, y, alt)

For two samples x and y from normal distributions with unknown means and unknown and not necessarily equal variances, perform a Welch test of the null hypothesis of equal means. Under the null, the test statistic t approximately follows a Student distribution with df degrees of freedom.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is considered. Similarly for "<", the one-sided alternative mean (x) < mean (y) is considered. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.
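
A sketch of the Welch statistic with the usual Welch-Satterthwaite approximation for the degrees of freedom. The formulas are the textbook ones and are assumed, not quoted from welch_test. The data are made up.

```octave
% Sketch: Welch t statistic and approximate degrees of freedom.
x = [5.1, 4.9, 5.6, 5.8, 6.0];
y = [4.2, 4.8, 5.0, 4.4];
n_x = numel (x);
n_y = numel (y);
v_x = var (x) / n_x;        % estimated variance of mean (x)
v_y = var (y) / n_y;        % estimated variance of mean (y)
t  = (mean (x) - mean (y)) / sqrt (v_x + v_y);
df = (v_x + v_y) ^ 2 / (v_x ^ 2 / (n_x - 1) + v_y ^ 2 / (n_y - 1));
printf ("t = %g, df = %g\n", t, df);
```

The resulting df is generally not an integer and lies between min (n_x, n_y) - 1 and n_x + n_y - 2.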

@anchor{doc-wilcoxon_test}

Function File: [pval, z] = wilcoxon_test (x, y, alt)

For two matched-pair sample vectors x and y, perform a Wilcoxon signed-rank test of the null hypothesis PROB (x > y) == 1/2. Under the null, the test statistic z approximately follows a standard normal distribution.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative PROB (x > y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 is considered. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed.

@anchor{doc-z_test}

Function File: [pval, z] = z_test (x, m, v, alt)

Perform a Z-test of the null hypothesis

mean (x) == m

for a sample x from a normal distribution with unknown mean and known variance v. Under the null, the test statistic z follows a standard normal distribution.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed along with some information.
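
Because the null distribution is standard normal, both the statistic and a two-sided p-value can be sketched with core functions only. The form of z is the standard one and is assumed here. The data and the variance are made up.

```octave
% Sketch: Z statistic and two-sided p-value for a known variance v.
x = [5.1, 4.9, 5.6, 5.8, 6.0];
m = 5;                                     % hypothesized mean
v = 0.25;                                  % known variance
z = sqrt (numel (x) / v) * (mean (x) - m);
pval = erfc (abs (z) / sqrt (2));          % equals 2 * (1 - Phi (abs (z)))
printf ("z = %g, pval = %g\n", z, pval);
```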

@anchor{doc-z_test_2}

Function File: [pval, z] = z_test_2 (x, y, v_x, v_y, alt)

For two samples x and y from normal distributions with unknown means and known variances v_x and v_y, perform a Z-test of the hypothesis of equal means. Under the null, the test statistic z follows a standard normal distribution.

With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is used. Similarly for "<", the one-sided alternative mean (x) < mean (y) is used. The default is the two-sided case.

The p-value of the test is returned in pval.

If no output argument is given, the p-value of the test is displayed along with some information.

Models

@anchor{doc-logistic_regression}

Function File: [theta, beta, dev, dl, d2l, p] = logistic_regression (y, x, print, theta, beta)

Perform ordinal logistic regression.

Suppose y takes values in k ordered categories, and let gamma_i (x) be the cumulative probability that y falls in one of the first i categories given the covariate x. Then

[theta, beta] = logistic_regression (y, x)

fits the model

logit (gamma_i (x)) = theta_i - beta' * x,   i = 1, ..., k-1

The number of ordinal categories, k, is taken to be the number of distinct values of round (y). If k equals 2, y is binary and the model is ordinary logistic regression. The matrix x is assumed to have full column rank.

Given y only, theta = logistic_regression (y) fits the model with baseline logit odds only.

The full form is

[theta, beta, dev, dl, d2l, p]
   = logistic_regression (y, x, print, theta, beta)

in which all output arguments and all input arguments except y are optional.

Setting print to 1 requests summary information about the fitted model to be displayed. Setting print to 2 requests information about convergence at each iteration. Other values request no information to be displayed. The input arguments theta and beta give initial estimates for theta and beta.

The returned value dev holds minus twice the log-likelihood.

The returned values dl and d2l are the vector of first and the matrix of second derivatives of the log-likelihood with respect to theta and beta.

p holds estimates for the conditional distribution of y given x.
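
To make the model equation concrete, the sketch below evaluates it for given parameters instead of fitting anything. The values of theta, beta and x are made up, and the variable names cum and prob are chosen for this example only.

```octave
% Sketch: category probabilities implied by the ordinal logistic model.
theta = [-1; 0.5];      % k - 1 = 2 cutpoints, hence k = 3 categories
beta  = [0.8; -0.3];    % regression coefficients
x     = [1; 2];         % one covariate vector

% Cumulative probabilities gamma_i (x) from logit (gamma_i (x)) = theta_i - beta' * x.
cum = 1 ./ (1 + exp (-(theta - beta' * x)));

% Probability of each category: differences of consecutive cumulative probabilities.
prob = diff ([0; cum; 1]);
disp (prob');
```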

Distributions

@anchor{doc-beta_cdf}

Function File: beta_cdf (x, a, b): For each element of x, returns the CDF at x of the beta distribution with parameters a and b, i.e., PROB (beta (a, b) <= x).

@anchor{doc-beta_inv}

Function File: beta_inv (x, a, b): For each component of x, compute the quantile (the inverse of the CDF) at x of the Beta distribution with parameters a and b.

@anchor{doc-beta_pdf}

Function File: beta_pdf (x, a, b): For each element of x, returns the PDF at x of the beta distribution with parameters a and b.

@anchor{doc-beta_rnd}

Function File: beta_rnd (a, b, r, c)

Return an r by c matrix of random samples from the Beta distribution with parameters a and b. Both a and b must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of a and b.

@anchor{doc-binomial_cdf}

Function File: binomial_cdf (x, n, p): For each element of x, compute the CDF at x of the binomial distribution with parameters n and p.

@anchor{doc-binomial_inv}

Function File: binomial_inv (x, n, p): For each element of x, compute the quantile at x of the binomial distribution with parameters n and p.

@anchor{doc-binomial_pdf}

Function File: binomial_pdf (x, n, p): For each element of x, compute the probability density function (PDF) at x of the binomial distribution with parameters n and p.

@anchor{doc-binomial_rnd}

Function File: binomial_rnd (n, p, r, c)

Return an r by c matrix of random samples from the binomial distribution with parameters n and p. Both n and p must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of n and p.

@anchor{doc-cauchy_cdf}

Function File: cauchy_cdf (x, lambda, sigma): For each element of x, compute the cumulative distribution function (CDF) at x of the Cauchy distribution with location parameter lambda and scale parameter sigma. Default values are lambda = 0, sigma = 1.

@anchor{doc-cauchy_inv}

Function File: cauchy_inv (x, lambda, sigma): For each element of x, compute the quantile (the inverse of the CDF) at x of the Cauchy distribution with location parameter lambda and scale parameter sigma. Default values are lambda = 0, sigma = 1.

@anchor{doc-cauchy_pdf}

Function File: cauchy_pdf (x, lambda, sigma): For each element of x, compute the probability density function (PDF) at x of the Cauchy distribution with location parameter lambda and scale parameter sigma > 0. Default values are lambda = 0, sigma = 1.
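
The Cauchy distribution has closed-form CDF, quantile function and PDF, so the three functions above can be sketched with elementary operations. The variable names are chosen for this example only.

```octave
% Sketch: CDF, quantile and PDF of the Cauchy distribution.
lambda = 0;                 % location
sigma  = 1;                 % scale, must be positive
x = [-1, 0, 1];

c = 0.5 + atan ((x - lambda) / sigma) / pi;                     % CDF
q = lambda + sigma * tan (pi * (c - 0.5));                      % quantile; recovers x
d = 1 ./ (pi * sigma * (1 + ((x - lambda) / sigma) .^ 2));      % PDF
disp ([c; q; d]);
```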

@anchor{doc-cauchy_rnd}

Function File: cauchy_rnd (lambda, sigma, r, c)

Return an r by c matrix of random samples from the Cauchy distribution with parameters lambda and sigma, which must both be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of lambda and sigma.
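As an illustration only, the three Cauchy functions above have simple closed forms. The sketch below is written in Python for a scalar x and uses the textbook formulas; it is not taken from Octave's sources, and the argument name lam (for lambda, a reserved word in Python) is an assumption of this example.

```python
import math

def cauchy_cdf(x, lam=0.0, sigma=1.0):
    """CDF at x for location lam and scale sigma > 0 (textbook formula)."""
    return 0.5 + math.atan((x - lam) / sigma) / math.pi

def cauchy_inv(p, lam=0.0, sigma=1.0):
    """Quantile for a probability 0 < p < 1."""
    return lam + sigma * math.tan(math.pi * (p - 0.5))

def cauchy_pdf(x, lam=0.0, sigma=1.0):
    """Density at x for location lam and scale sigma > 0."""
    z = (x - lam) / sigma
    return 1.0 / (math.pi * sigma * (1.0 + z * z))
```

Because the quantile function is available in closed form, a random sample can be drawn by passing uniform random numbers from the open interval (0, 1) through cauchy_inv.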

@anchor{doc-chisquare_cdf}

Function File: chisquare_cdf (x, n): For each element of x, compute the cumulative distribution function (CDF) at x of the chisquare distribution with n degrees of freedom.

@anchor{doc-chisquare_inv}

Function File: chisquare_inv (x, n): For each element of x, compute the quantile (the inverse of the CDF) at x of the chisquare distribution with n degrees of freedom.

@anchor{doc-chisquare_pdf}

Function File: chisquare_pdf (x, n): For each element of x, compute the probability density function (PDF) at x of the chisquare distribution with n degrees of freedom.

@anchor{doc-chisquare_rnd}

Function File: chisquare_rnd (n, r, c)

Return an r by c matrix of random samples from the chisquare distribution with n degrees of freedom. n must be a scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the size of n.

@anchor{doc-discrete_cdf}

Function File: discrete_cdf (x, v, p): For each element of x, compute the cumulative distribution function (CDF) at x of a univariate discrete distribution which assumes the values in v with probabilities p.

@anchor{doc-discrete_inv}

Function File: discrete_inv (x, v, p): For each component of x, compute the quantile (the inverse of the CDF) at x of the univariate distribution which assumes the values in v with probabilities p.

@anchor{doc-discrete_pdf}

Function File: discrete_pdf (x, v, p): For each element of x, compute the probability density function (PDF) at x of a univariate discrete distribution which assumes the values in v with probabilities p.

@anchor{doc-discrete_rnd}

Function File: discrete_rnd (n, v, p)

Generate a row vector containing a random sample of size n from the univariate distribution which assumes the values in v with probabilities p.

Currently, n must be a scalar.
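The relationship between the discrete functions can be sketched with cumulative sums: the CDF adds up the probabilities of all values not exceeding x, the quantile looks up the first value whose cumulative probability reaches the requested level, and sampling applies the quantile to uniform random numbers. The Python sketch below handles a scalar x only. The tie-breaking at exact boundaries and the rng argument are assumptions of this example, not documented behaviour of the Octave functions.

```python
import bisect
import itertools
import random

def discrete_cdf(x, v, p):
    """P(X <= x) where X takes the value v[i] with probability p[i]."""
    return sum(pi for vi, pi in zip(v, p) if vi <= x)

def discrete_inv(q, v, p):
    """Smallest value whose cumulative probability is at least q."""
    pairs = sorted(zip(v, p))                       # order by value
    cum = list(itertools.accumulate(pi for _, pi in pairs))
    idx = min(bisect.bisect_left(cum, q), len(pairs) - 1)
    return pairs[idx][0]

def discrete_rnd(n, v, p, rng=random):
    """Sample of size n by inverse-transform sampling."""
    return [discrete_inv(rng.random(), v, p) for _ in range(n)]
```

For example, with v = [1, 2, 3] and p = [0.25, 0.5, 0.25], discrete_cdf (2, v, p) is 0.75 and discrete_inv (0.5, v, p) is 2.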

@anchor{doc-empirical_cdf}

Function File: empirical_cdf (x, data): For each element of x, compute the cumulative distribution function (CDF) at x of the empirical distribution obtained from the univariate sample data.

@anchor{doc-empirical_inv}

Function File: empirical_inv (x, data): For each element of x, compute the quantile (the inverse of the CDF) at x of the empirical distribution obtained from the univariate sample data.

@anchor{doc-empirical_pdf}

Function File: empirical_pdf (x, data): For each element of x, compute the probability density function (PDF) at x of the empirical distribution obtained from the univariate sample data.

@anchor{doc-empirical_rnd}

Function File: empirical_rnd (n, data): Generate a bootstrap sample of size n from the empirical distribution obtained from the univariate sample data.
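To make the idea of an empirical distribution concrete, the following Python sketch treats every observation in data as equally likely: the CDF is the fraction of observations not exceeding x, and a bootstrap sample is drawn from data with replacement. It handles a scalar x only, and the rng argument is an assumption of this example.

```python
import random

def empirical_cdf(x, data):
    """Fraction of the observations in data that are <= x."""
    return sum(1 for d in data if d <= x) / len(data)

def empirical_rnd(n, data, rng=random):
    """Bootstrap sample: n draws from data with replacement."""
    return [rng.choice(data) for _ in range(n)]
```

With data = [3, 1, 2, 2], the value of empirical_cdf (2, data) is 0.75, because three of the four observations are at most 2.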

@anchor{doc-exponential_cdf}

Function File: exponential_cdf (x, lambda)

For each element of x, compute the cumulative distribution function (CDF) at x of the exponential distribution with parameter lambda.

The arguments can be of common size or scalar.

@anchor{doc-exponential_inv}

Function File: exponential_inv (x, lambda): For each element of x, compute the quantile (the inverse of the CDF) at x of the exponential distribution with parameter lambda.

@anchor{doc-exponential_pdf}

Function File: exponential_pdf (x, lambda): For each element of x, compute the probability density function (PDF) at x of the exponential distribution with parameter lambda.

@anchor{doc-exponential_rnd}

Function File: exponential_rnd (lambda, r, c)

Return an r by c matrix of random samples from the exponential distribution with parameter lambda, which must be a scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the size of lambda.

@anchor{doc-f_cdf}

Function File: f_cdf (x, m, n): For each element of x, compute the CDF at x of the F distribution with m and n degrees of freedom, i.e., PROB (F (m, n) <= x).

@anchor{doc-f_inv}

Function File: f_inv (x, m, n): For each component of x, compute the quantile (the inverse of the CDF) at x of the F distribution with parameters m and n.

@anchor{doc-f_pdf}

Function File: f_pdf (x, m, n): For each element of x, compute the probability density function (PDF) at x of the F distribution with m and n degrees of freedom.

@anchor{doc-f_rnd}

Function File: f_rnd (m, n, r, c)

Return an r by c matrix of random samples from the F distribution with m and n degrees of freedom. Both m and n must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of m and n.

@anchor{doc-gamma_cdf}

Function File: gamma_cdf (x, a, b): For each element of x, compute the cumulative distribution function (CDF) at x of the Gamma distribution with parameters a and b.

@anchor{doc-gamma_inv}

Function File: gamma_inv (x, a, b): For each component of x, compute the quantile (the inverse of the CDF) at x of the Gamma distribution with parameters a and b.

@anchor{doc-gamma_pdf}

Function File: gamma_pdf (x, a, b): For each element of x, return the probability density function (PDF) at x of the Gamma distribution with parameters a and b.

@anchor{doc-gamma_rnd}

Function File: gamma_rnd (a, b, r, c)

Return an r by c matrix of random samples from the Gamma distribution with parameters a and b. Both a and b must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of a and b.

@anchor{doc-geometric_cdf}

Function File: geometric_cdf (x, p): For each element of x, compute the CDF at x of the geometric distribution with parameter p.

@anchor{doc-geometric_inv}

Function File: geometric_inv (x, p): For each element of x, compute the quantile at x of the geometric distribution with parameter p.

@anchor{doc-geometric_pdf}

Function File: geometric_pdf (x, p): For each element of x, compute the probability density function (PDF) at x of the geometric distribution with parameter p.

@anchor{doc-geometric_rnd}

Function File: geometric_rnd (p, r, c)

Return an r by c matrix of random samples from the geometric distribution with parameter p, which must be a scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the size of p.

@anchor{doc-hypergeometric_cdf}

Function File: hypergeometric_cdf (x, m, t, n)

Compute the cumulative distribution function (CDF) at x of the hypergeometric distribution with parameters m, t, and n. This is the probability of obtaining not more than x marked items when randomly drawing a sample of size n without replacement from a population of total size t containing m marked items.

The parameters m, t, and n must be positive integers with m and n not greater than t.

@anchor{doc-hypergeometric_inv}

Function File: hypergeometric_inv (x, m, t, n)

For each element of x, compute the quantile at x of the hypergeometric distribution with parameters m, t, and n.

The parameters m, t, and n must be positive integers with m and n not greater than t.

@anchor{doc-hypergeometric_pdf}

Function File: hypergeometric_pdf (x, m, t, n)

Compute the probability density function (PDF) at x of the hypergeometric distribution with parameters m, t, and n. This is the probability of obtaining x marked items when randomly drawing a sample of size n without replacement from a population of total size t containing m marked items.

The arguments must be of common size or scalar.
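The verbal description above corresponds to the usual counting formula C(m, x) * C(t - m, n - x) / C(t, n), where C denotes the binomial coefficient. The Python sketch below evaluates it for scalar arguments. It is an illustration of the formula, not Octave's implementation, and it assumes Python 3.8 or later for math.comb.

```python
import math

def hypergeometric_pdf(x, m, t, n):
    """Probability of exactly x marked items in a sample of size n drawn
    without replacement from t items, m of which are marked."""
    if x < 0 or x > n or x > m or n - x > t - m:
        return 0.0                                  # impossible outcome
    return math.comb(m, x) * math.comb(t - m, n - x) / math.comb(t, n)

def hypergeometric_cdf(x, m, t, n):
    """Probability of not more than x marked items."""
    return sum(hypergeometric_pdf(k, m, t, n)
               for k in range(0, math.floor(x) + 1))
```

For m = 3 marked items in a population of t = 10 and a sample of size n = 4, the probability of exactly one marked item is 3 * 35 / 210 = 0.5.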

@anchor{doc-hypergeometric_rnd}

Function File: hypergeometric_rnd (n_size, m, t, n)

Generate a row vector containing a random sample of size n_size from the hypergeometric distribution with parameters m, t, and n.

The parameters m, t, and n must be positive integers with m and n not greater than t.

@anchor{doc-kolmogorov_smirnov_cdf}

Function File: kolmogorov_smirnov_cdf (x, tol)

Return the CDF at x of the Kolmogorov-Smirnov distribution,

         Inf
Q(x) =   SUM    (-1)^k exp(-2 k^2 x^2)
       k = -Inf

for x > 0.

The optional parameter tol specifies the precision up to which the series should be evaluated; the default is tol = eps.
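One way to evaluate the series is to use its symmetry in k: the k = 0 term is 1, and the terms for k and -k are equal, so Q(x) = 1 + 2 * SUM_{k >= 1} (-1)^k exp(-2 k^2 x^2). The Python sketch below adds terms until one falls below tol. Returning 0 for x <= 0 and the max_terms safeguard are assumptions of this example. For very small x the series converges slowly, so the sketch is only meant to illustrate the role of tol.

```python
import math
import sys

def kolmogorov_smirnov_cdf(x, tol=sys.float_info.epsilon, max_terms=100000):
    """Evaluate Q(x) = 1 + 2 * sum_{k>=1} (-1)^k exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0                       # assumed value outside x > 0
    total = 1.0                          # the k = 0 term
    for k in range(1, max_terms + 1):
        term = 2.0 * (-1.0) ** k * math.exp(-2.0 * k * k * x * x)
        total += term
        if abs(term) < tol:              # remaining terms are smaller still
            break
    return total
```

At x = 1 only a handful of terms are needed, and the result is approximately 0.73.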

@anchor{doc-laplace_cdf}

Function File: laplace_cdf (x): For each element of x, compute the cumulative distribution function (CDF) at x of the Laplace distribution.

@anchor{doc-laplace_inv}

Function File: laplace_inv (x): For each element of x, compute the quantile (the inverse of the CDF) at x of the Laplace distribution.

@anchor{doc-laplace_pdf}

Function File: laplace_pdf (x): For each element of x, compute the probability density function (PDF) at x of the Laplace distribution.

@anchor{doc-laplace_rnd}

Function File: laplace_rnd (r, c): Return an r by c matrix of random numbers from the Laplace distribution.
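For reference, the Laplace functions take no parameters, which suggests the standard Laplace distribution with density exp (-abs (x)) / 2. Under that assumption, the CDF and the quantile have the closed forms sketched below in Python for a scalar argument. This is an illustration, not Octave's code.

```python
import math

def laplace_pdf(x):
    """Density of the standard Laplace distribution."""
    return 0.5 * math.exp(-abs(x))

def laplace_cdf(x):
    """CDF: exp(x)/2 for x < 0 and 1 - exp(-x)/2 otherwise."""
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def laplace_inv(p):
    """Quantile for a probability 0 < p < 1."""
    return math.log(2.0 * p) if p < 0.5 else -math.log(2.0 * (1.0 - p))
```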

@anchor{doc-logistic_cdf}

Function File: logistic_cdf (x): For each component of x, compute the CDF at x of the logistic distribution.

@anchor{doc-logistic_inv}

Function File: logistic_inv (x): For each component of x, compute the quantile (the inverse of the CDF) at x of the logistic distribution.

@anchor{doc-logistic_pdf}

Function File: logistic_pdf (x): For each component of x, compute the PDF at x of the logistic distribution.

@anchor{doc-logistic_rnd}

Function File: logistic_rnd (r, c): Return an r by c matrix of random numbers from the logistic distribution.

@anchor{doc-lognormal_cdf}

Function File: lognormal_cdf (x, a, v)

For each element of x, compute the cumulative distribution function (CDF) at x of the lognormal distribution with parameters a and v. If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.

Default values are a = 1, v = 1.

@anchor{doc-lognormal_inv}

Function File: lognormal_inv (x, a, v)

For each element of x, compute the quantile (the inverse of the CDF) at x of the lognormal distribution with parameters a and v. If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.

Default values are a = 1, v = 1.

@anchor{doc-lognormal_pdf}

Function File: lognormal_pdf (x, a, v)

For each element of x, compute the probability density function (PDF) at x of the lognormal distribution with parameters a and v. If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.

Default values are a = 1, v = 1.
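The parameterization above means that the lognormal functions reduce to the normal ones applied to log (x), with mean log (a) and standard deviation sqrt (v). Note that v is a variance, not a standard deviation. The Python sketch below illustrates this for a scalar x using statistics.NormalDist from the standard library (Python 3.8 or later). Returning 0 for x <= 0 is an assumption of this example.

```python
import math
from statistics import NormalDist

def _log_dist(a, v):
    # log(X) is normal with mean log(a) and variance v
    return NormalDist(mu=math.log(a), sigma=math.sqrt(v))

def lognormal_cdf(x, a=1.0, v=1.0):
    return _log_dist(a, v).cdf(math.log(x)) if x > 0 else 0.0

def lognormal_inv(p, a=1.0, v=1.0):
    """Quantile for a probability 0 < p < 1."""
    return math.exp(_log_dist(a, v).inv_cdf(p))

def lognormal_pdf(x, a=1.0, v=1.0):
    # change of variables: density of log(X) divided by x
    return _log_dist(a, v).pdf(math.log(x)) / x if x > 0 else 0.0
```

One consequence is that a is the median of the distribution: lognormal_inv (0.5, a, v) returns a for any v > 0.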

@anchor{doc-lognormal_rnd}

Function File: lognormal_rnd (a, v, r, c)

Return an r by c matrix of random samples from the lognormal distribution with parameters a and v. Both a and v must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of a and v.

@anchor{doc-normal_cdf}

Function File: normal_cdf (x, m, v)

For each element of x, compute the cumulative distribution function (CDF) at x of the normal distribution with mean m and variance v.

Default values are m = 0, v = 1.

@anchor{doc-normal_inv}

Function File: normal_inv (x, m, v)

For each element of x, compute the quantile (the inverse of the CDF) at x of the normal distribution with mean m and variance v.

Default values are m = 0, v = 1.

@anchor{doc-normal_pdf}

Function File: normal_pdf (x, m, v)

For each element of x, compute the probability density function (PDF) at x of the normal distribution with mean m and variance v.

Default values are m = 0, v = 1.

@anchor{doc-normal_rnd}

Function File: normal_rnd (m, v, r, c)

Return an r by c matrix of random samples from the normal distribution with parameters m and v. Both m and v must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of m and v.

@anchor{doc-pascal_cdf}

Function File: pascal_cdf (x, n, p)

For each element of x, compute the CDF at x of the Pascal (negative binomial) distribution with parameters n and p.

The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.

@anchor{doc-pascal_inv}

Function File: pascal_inv (x, n, p)

For each element of x, compute the quantile at x of the Pascal (negative binomial) distribution with parameters n and p.

The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.

@anchor{doc-pascal_pdf}

Function File: pascal_pdf (x, n, p)

For each element of x, compute the probability density function (PDF) at x of the Pascal (negative binomial) distribution with parameters n and p.

The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.
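Under the interpretation given above (x failures before the n-th success), the probability of exactly x failures is C(n + x - 1, x) * p^n * (1 - p)^x, where C denotes the binomial coefficient. The Python sketch below evaluates this for scalar arguments and integer n. It illustrates the formula only and assumes Python 3.8 or later for math.comb.

```python
import math

def pascal_pdf(x, n, p):
    """Probability of exactly x failures before the n-th success."""
    if x < 0:
        return 0.0
    return math.comb(n + x - 1, x) * p ** n * (1.0 - p) ** x

def pascal_cdf(x, n, p):
    """Probability of at most x failures before the n-th success."""
    return sum(pascal_pdf(k, n, p) for k in range(0, math.floor(x) + 1))
```

With n = 1 this reduces to the geometric distribution: pascal_pdf (2, 1, 0.5) is 0.5 * 0.5^2 = 0.125.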

@anchor{doc-pascal_rnd}

Function File: pascal_rnd (n, p, r, c)

Return an r by c matrix of random samples from the Pascal (negative binomial) distribution with parameters n and p. Both n and p must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of n and p.

@anchor{doc-poisson_cdf}

Function File: poisson_cdf (x, lambda): For each element of x, compute the cumulative distribution function (CDF) at x of the Poisson distribution with parameter lambda.

@anchor{doc-poisson_inv}

Function File: poisson_inv (x, lambda): For each component of x, compute the quantile (the inverse of the CDF) at x of the Poisson distribution with parameter lambda.

@anchor{doc-poisson_pdf}

Function File: poisson_pdf (x, lambda): For each element of x, compute the probability density function (PDF) at x of the Poisson distribution with parameter lambda.

@anchor{doc-poisson_rnd}

Function File: poisson_rnd (lambda, r, c)

Return an r by c matrix of random samples from the Poisson distribution with parameter lambda, which must be a scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the size of lambda.

@anchor{doc-stdnormal_cdf}

Function File: stdnormal_cdf (x): For each component of x, compute the CDF of the standard normal distribution at x.

@anchor{doc-stdnormal_inv}

Function File: stdnormal_inv (x): For each component of x, compute the quantile (the inverse of the CDF) at x of the standard normal distribution.

@anchor{doc-stdnormal_pdf}

Function File: stdnormal_pdf (x): For each element of x, compute the probability density function (PDF) of the standard normal distribution at x.

@anchor{doc-stdnormal_rnd}

Function File: stdnormal_rnd (r, c): Return an r by c matrix of random numbers from the standard normal distribution.

@anchor{doc-t_cdf}

Function File: t_cdf (x, n): For each element of x, compute the CDF at x of the t (Student) distribution with n degrees of freedom, i.e., PROB (t(n) <= x).

@anchor{doc-t_inv}

Function File: t_inv (x, n): For each component of x, compute the quantile (the inverse of the CDF) at x of the t (Student) distribution with parameter n.

@anchor{doc-t_pdf}

Function File: t_pdf (x, n): For each element of x, compute the probability density function (PDF) at x of the t (Student) distribution with n degrees of freedom.

@anchor{doc-t_rnd}

Function File: t_rnd (n, r, c)

Return an r by c matrix of random samples from the t (Student) distribution with n degrees of freedom. n must be a scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the size of n.

@anchor{doc-uniform_cdf}

Function File: uniform_cdf (x, a, b)

Return the CDF at x of the uniform distribution on [a, b], i.e., PROB (uniform (a, b) <= x).

Default values are a = 0, b = 1.

@anchor{doc-uniform_inv}

Function File: uniform_inv (x, a, b)

For each element of x, compute the quantile (the inverse of the CDF) at x of the uniform distribution on [a, b].

Default values are a = 0, b = 1.

@anchor{doc-uniform_pdf}

Function File: uniform_pdf (x, a, b)

For each element of x, compute the PDF at x of the uniform distribution on [a, b].

Default values are a = 0, b = 1.

@anchor{doc-uniform_rnd}

Function File: uniform_rnd (a, b, r, c)

Return an r by c matrix of random samples from the uniform distribution on [a, b]. Both a and b must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of a and b.

@anchor{doc-weibull_cdf}

Function File: weibull_cdf (x, alpha, sigma)

Compute the cumulative distribution function (CDF) at x of the Weibull distribution with shape parameter alpha and scale parameter sigma, which is

1 - exp(-(x/sigma)^alpha)

for x >= 0.

@anchor{doc-weibull_inv}

Function File: weibull_inv (x, alpha, sigma): Compute the quantile (the inverse of the CDF) at x of the Weibull distribution with shape parameter alpha and scale parameter sigma.

@anchor{doc-weibull_pdf}

Function File: weibull_pdf (x, alpha, sigma)

Compute the probability density function (PDF) at x of the Weibull distribution with shape parameter alpha and scale parameter sigma, which is given by

   alpha * sigma^(-alpha) * x^(alpha-1) * exp(-(x/sigma)^alpha)

for x > 0.
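The Weibull formulas given above translate directly into code, and solving p = 1 - exp(-(x/sigma)^alpha) for x gives the quantile sigma * (-log (1 - p))^(1/alpha). The Python sketch below handles scalar arguments. Returning 0 outside the stated ranges of x, and using expm1 and log1p for accuracy near zero, are choices of this example rather than documented behaviour.

```python
import math

def weibull_cdf(x, alpha, sigma):
    """1 - exp(-(x/sigma)^alpha) for x >= 0, and 0 otherwise."""
    return -math.expm1(-((x / sigma) ** alpha)) if x >= 0 else 0.0

def weibull_pdf(x, alpha, sigma):
    """alpha * sigma^(-alpha) * x^(alpha-1) * exp(-(x/sigma)^alpha), x > 0."""
    if x <= 0:
        return 0.0
    return (alpha * sigma ** (-alpha) * x ** (alpha - 1)
            * math.exp(-((x / sigma) ** alpha)))

def weibull_inv(p, alpha, sigma):
    """Quantile for 0 <= p < 1, obtained by inverting the CDF."""
    return sigma * (-math.log1p(-p)) ** (1.0 / alpha)
```

For any alpha, weibull_cdf (sigma, alpha, sigma) equals 1 - exp (-1), which is about 0.632, so sigma is always the 63.2% quantile.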

@anchor{doc-weibull_rnd}

Function File: weibull_rnd (alpha, sigma, r, c)

Return an r by c matrix of random samples from the Weibull distribution with parameters alpha and sigma, which must be scalar or of size r by c.

If r and c are omitted, the size of the result matrix is the common size of alpha and sigma.

@anchor{doc-wiener_rnd}

Function File: wiener_rnd (t, d, n)

Return a simulated realization of the d-dimensional Wiener process on the interval [0,t]. If d is omitted, d = 1 is used. The first column of the return matrix contains time, the remaining columns contain the Wiener process.

The optional parameter n gives the number of summands used for simulating the process over an interval of length 1. If n is omitted, n = 1000 is used.
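A common way to simulate such a path is to add up independent normal increments: with n steps per unit of time, each increment has variance 1/n, so a standard normal draw is scaled by 1/sqrt (n). The Python sketch below follows that approach and returns one row per time step, with time in the first column. It is an illustration under these assumptions, not a transcription of Octave's implementation, and the rng argument is introduced for this example only.

```python
import math
import random

def wiener_rnd(t, d=1, n=1000, rng=random):
    """Rows [time, W_1, ..., W_d] at times 1/n, 2/n, ..., t."""
    steps = int(n * t)
    scale = 1.0 / math.sqrt(n)           # std. dev. of one increment
    position = [0.0] * d                 # the process starts at the origin
    rows = []
    for i in range(1, steps + 1):
        position = [w + scale * rng.gauss(0.0, 1.0) for w in position]
        rows.append([i / n] + position)
    return rows
```

For example, wiener_rnd (2, 3, 100) returns 200 rows of 4 columns each, with the last row at time 2.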
