I hope that someday Octave will include more statistics functions. If you would like to help improve Octave in this area, please contact bug-octave@bevo.che.wisc.edu.
@anchor{doc-mean}
mean (x) = SUM_i x(i) / N
If x is a matrix, compute the mean for each column and return them in a row vector.
With the optional argument opt, the kind of mean computed can be selected. The following options are recognized:
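"a"  Compute the (ordinary) arithmetic mean. This is the default.
"g"  Compute the geometric mean.
"h"  Compute the harmonic mean.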
If the optional argument dim is supplied, work along dimension dim.
Both dim and opt are optional. If both are supplied, either may appear first.
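As an illustration of the three kinds of mean, here is a minimal Python sketch for a plain list of numbers. It assumes that "a", "g", and "h" select the arithmetic, geometric, and harmonic mean; the function name and argument handling are illustrative and do not reproduce Octave's implementation (in particular, dim is not handled).

```python
import math

def mean(x, opt="a"):
    """Mean of a non-empty sequence: "a" arithmetic, "g" geometric, "h" harmonic."""
    n = len(x)
    if opt == "a":
        return sum(x) / n
    if opt == "g":
        # exp of the mean logarithm; requires strictly positive data
        return math.exp(sum(math.log(v) for v in x) / n)
    if opt == "h":
        # reciprocal of the mean reciprocal; requires nonzero data
        return n / sum(1.0 / v for v in x)
    raise ValueError("unknown option: " + opt)

print(mean([1, 2, 4]), mean([1, 2, 4], "g"), mean([1, 2, 4], "h"))
```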
@anchor{doc-median}
            x(ceil(N/2)),              N odd
median(x) =
            (x(N/2) + x((N/2)+1))/2,   N even
If x is a matrix, compute the median value for each column and return them in a row vector.
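The piecewise definition can be sketched in Python for a single vector. The sketch sorts the data first and shifts the 1-based indices of the formula to Python's 0-based indexing; it is an illustration, not Octave's code.

```python
import math

def median(x):
    """Median of a non-empty sequence, following the odd/even definition."""
    s = sorted(x)
    n = len(s)
    if n % 2 == 1:
        return s[math.ceil(n / 2) - 1]        # x(ceil(N/2)) in 1-based terms
    return (s[n // 2 - 1] + s[n // 2]) / 2    # (x(N/2) + x(N/2+1)) / 2

print(median([3, 1, 2]), median([4, 1, 3, 2]))
```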
@anchor{doc-std}
std (x) = sqrt (sumsq (x - mean (x)) / (n - 1))
If x is a matrix, compute the standard deviation for each column and return them in a row vector.
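A short Python sketch of the same formula for one vector, using the (n - 1) divisor shown above. The function name is chosen for illustration only.

```python
import math

def std(x):
    """Sample standard deviation: sqrt (sumsq (x - mean (x)) / (n - 1))."""
    n = len(x)
    m = sum(x) / n
    return math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))

print(std([1, 2, 3]))
```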
@anchor{doc-cov}
The (i, j)-th entry of cov (x, y) is the covariance between the i-th variable in x and the j-th variable in y. If called with one argument, compute cov (x, x).
@anchor{doc-corrcoef}
The (i, j)-th entry of corrcoef (x, y) is the correlation between the i-th variable in x and the j-th variable in y. If called with one argument, compute corrcoef (x, x).
@anchor{doc-kurtosis}
kurtosis (x) = N^(-1) std(x)^(-4) sum ((x - mean(x)).^4) - 3
If x is a matrix, return the row vector containing the kurtosis of each column.
@anchor{doc-mahalanobis}
@anchor{doc-skewness}
skewness (x) = N^(-1) std(x)^(-3) sum ((x - mean(x)).^3)
If x is a matrix, return the row vector containing the skewness of each column.
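The skewness and kurtosis formulas share the same building blocks, so both can be sketched together in Python. The sketch assumes that std in these formulas is the sample standard deviation with the (n - 1) divisor, as defined for std (x); the helper names are illustrative.

```python
import math

def _mean(x):
    return sum(x) / len(x)

def _std(x):
    # Sample standard deviation with the (n - 1) divisor (assumption, see lead-in).
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))

def skewness(x):
    m, s, n = _mean(x), _std(x), len(x)
    return sum((v - m) ** 3 for v in x) / (n * s ** 3)

def kurtosis(x):
    m, s, n = _mean(x), _std(x), len(x)
    return sum((v - m) ** 4 for v in x) / (n * s ** 4) - 3

print(skewness([1, 2, 3, 4, 5]), kurtosis([1, 2, 3, 4, 5]))
```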
@anchor{doc-values}
@anchor{doc-var}
@anchor{doc-table}
Currently, only 1- and 2-dimensional tables are supported.
@anchor{doc-studentize}
If x is a matrix, do the above for each column.
@anchor{doc-statistics}
If x is a vector, treat it as a column vector.
@anchor{doc-spearman}
For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.
spearman (x) is equivalent to spearman (x, x).
For two data vectors x and y, Spearman's rho is the correlation of the ranks of x and y.
If x and y are drawn from independent distributions, rho has zero mean and variance 1 / (n - 1), and is asymptotically normally distributed.
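To make "the correlation of the ranks" concrete, the following Python sketch ranks two vectors and applies the ordinary Pearson correlation to the ranks. Giving tied values the average of their ranks is an assumption of this sketch; the helper names are illustrative.

```python
import math

def ranks(x):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```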
@anchor{doc-run_count}
@anchor{doc-ranks}
If x is a matrix, do the above for each column of x.
@anchor{doc-range}
If x is a matrix, do the above for each column of x.
@anchor{doc-qqplot}
If F is the CDF of the distribution dist with parameters params and G its inverse, and x a sample vector of length n, the QQ-plot graphs ordinate s(i) = i-th largest element of x versus abscissa q(i) = G((i - 0.5)/n).
If the sample comes from F except for a transformation of location and scale, the pairs will approximately follow a straight line.
The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a quantile plot of the uniform distribution on [2,4] and x, use
qqplot (x, "uniform", 2, 4)
If no output arguments are given, the data are plotted directly.
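The pairs behind such a plot can be computed by hand. The Python sketch below does this for the uniform distribution on [a, b], whose inverse CDF is G(p) = a + (b - a) p. It assumes the sample is taken in ascending order so that it pairs with the increasing quantiles; the function name is illustrative.

```python
def qq_pairs_uniform(x, a, b):
    """Return (q, s): quantiles G((i - 0.5)/n) and the sorted sample."""
    n = len(x)
    s = sorted(x)
    q = [a + (b - a) * (i - 0.5) / n for i in range(1, n + 1)]
    return q, s

q, s = qq_pairs_uniform([3.5, 2.5, 3.0, 2.1], 2, 4)
print(list(zip(q, s)))
```

If the sample really comes from the uniform distribution on [2,4], the printed pairs lie close to the line s = q.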
@anchor{doc-probit}
@anchor{doc-ppplot}
If F is the CDF of the distribution dist with parameters params and x a sample vector of length n, the PP-plot graphs ordinate y(i) = F (i-th largest element of x) versus abscissa p(i) = (i - 0.5)/n. If the sample comes from F, the pairs will approximately follow a straight line.
The default for dist is the standard normal distribution. The optional argument params contains a list of parameters of dist. For example, for a probability plot of the uniform distribution on [2,4] and x, use
ppplot (x, "uniform", 2, 4)
If no output arguments are given, the data are plotted directly.
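A corresponding Python sketch for the PP-plot pairs, again for the uniform distribution on [a, b] and again assuming the sample is sorted in ascending order. The names are illustrative.

```python
def uniform_cdf(t, a, b):
    if t <= a:
        return 0.0
    if t >= b:
        return 1.0
    return (t - a) / (b - a)

def pp_pairs_uniform(x, a, b):
    """Return (p, y): p(i) = (i - 0.5)/n and y(i) = F(i-th sorted value)."""
    n = len(x)
    s = sorted(x)
    p = [(i - 0.5) / n for i in range(1, n + 1)]
    y = [uniform_cdf(v, a, b) for v in s]
    return p, y

p, y = pp_pairs_uniform([3.5, 2.5, 3.0], 2, 4)
print(list(zip(p, y)))
```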
@anchor{doc-moment}
If x is a matrix, return the row vector containing the p-th moment of each column.
With the optional string opt, the kind of moment to be computed can be specified. If opt contains "c" or "a", central and/or absolute moments are returned. For example,
moment (x, 3, "ac")
computes the third central absolute moment of x.
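The effect of the "c" and "a" flags can be sketched in Python for one vector. The sketch assumes the p-th moment is normalized by the number of elements N; that normalization and the function name are assumptions of this illustration.

```python
def moment(x, p, opt=""):
    """p-th moment of x; "c" centers the data, "a" takes absolute values."""
    n = len(x)
    m = sum(x) / n if "c" in opt else 0.0
    vals = [v - m for v in x]
    if "a" in opt:
        vals = [abs(v) for v in vals]
    return sum(v ** p for v in vals) / n

print(moment([1, 2, 3], 3, "ac"), moment([1, 2, 3], 3, "c"), moment([1, 2, 3], 2))
```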
@anchor{doc-meansq}
@anchor{doc-logit}
Compute the logit log (p / (1-p)) of p.
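A small Python sketch of the logit together with its inverse (the logistic function), which is handy for checking values. The inverse is not part of the text above and is included only for illustration.

```python
import math

def logit(p):
    """log (p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1 - p))

def logistic(z):
    """Inverse of logit."""
    return 1 / (1 + math.exp(-z))

print(logit(0.5), logistic(logit(0.8)))
```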
@anchor{doc-kendall}
For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.
kendall (x) is equivalent to kendall (x, x).
For two data vectors x, y of common length n, Kendall's tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then
tau = (1 / (n (n-1))) * SUM_{i,j} sign (q(i) - q(j)) * sign (r(i) - r(j))
in which the q(i) and r(i) are the ranks of x and y, respectively.
If x and y are drawn from independent distributions, Kendall's tau is asymptotically normal with mean 0 and variance (2 * (2n+5)) / (9 * n * (n-1)).
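For vectors with distinct entries, the sign of a rank difference equals the sign of the corresponding value difference, so the double sum can be sketched directly on the data. The Python below is an illustration under that no-ties assumption; the function names are not Octave's.

```python
def sign(v):
    return (v > 0) - (v < 0)

def kendall(x, y):
    """Kendall's tau for two equal-length vectors with distinct entries."""
    n = len(x)
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
                for i in range(n) for j in range(n))
    return total / (n * (n - 1))

def kendall_null_variance(n):
    """Asymptotic variance of tau under independence."""
    return (2 * (2 * n + 5)) / (9 * n * (n - 1))

print(kendall([1, 2, 3], [1, 3, 2]), kendall_null_variance(10))
```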
@anchor{doc-iqr}
If x is a matrix, do the above for each column of x.
@anchor{doc-cut}
If breaks is a scalar, the data is cut into that many equal-width intervals. If breaks is a vector of break points, the data is cut into length (breaks) - 1 groups.
The returned value is a vector of the same size as x telling which group each point in x belongs to. Groups are labelled from 1 to the number of groups; points outside the range of breaks are labelled by NaN.
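The grouping for a vector of break points can be sketched in Python as follows. The text does not say which end of each interval is closed, so the sketch assumes intervals of the form [breaks(i), breaks(i+1)); that convention and the function name are assumptions.

```python
import math

def cut(x, breaks):
    """Label each value with its interval number (1-based) or NaN if outside."""
    groups = []
    for v in x:
        label = math.nan
        for i in range(len(breaks) - 1):
            if breaks[i] <= v < breaks[i + 1]:   # assumed half-open intervals
                label = i + 1
                break
        groups.append(label)
    return groups

print(cut([0.5, 1.5, 2.5, 9.0], [0, 1, 2, 3]))
```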
@anchor{doc-cor}
The (i, j)-th entry of cor (x, y) is the correlation between the i-th variable in x and the j-th variable in y.
For matrices, each row is an observation and each column a variable; vectors are always observations and may be row or column vectors.
cor (x) is equivalent to cor (x, x).
@anchor{doc-cloglog}
- log (- log (x))
@anchor{doc-center}
@anchor{doc-anova}
Data may be given in a single vector y with groups specified by a corresponding vector of group labels g (e.g., numbers from 1 to k). This is the general form which does not impose any restriction on the number of data in each group or the group labels.
If y is a matrix and g is omitted, each column of y is treated as a group. This form is only appropriate for balanced ANOVA in which the numbers of samples from each group are all equal.
Under the null of constant means, the statistic f follows an F distribution with df_b and df_w degrees of freedom.
The p-value (1 minus the CDF of this distribution at f) is returned in pval.
If no output argument is given, the standard one-way ANOVA table is printed.
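The statistic f and its degrees of freedom can be sketched in Python for the general form with data y and group labels g. The sketch assumes the usual one-way ANOVA definition (between-group mean square divided by within-group mean square); it does not compute the p-value, and the function name is illustrative.

```python
def anova_f(y, g):
    """Return (f, df_b, df_w) for data y with group labels g."""
    labels = sorted(set(g))
    n, k = len(y), len(labels)
    grand = sum(y) / n
    ssb = ssw = 0.0
    for lab in labels:
        grp = [v for v, l in zip(y, g) if l == lab]
        m = sum(grp) / len(grp)
        ssb += len(grp) * (m - grand) ** 2      # between-group sum of squares
        ssw += sum((v - m) ** 2 for v in grp)   # within-group sum of squares
    df_b, df_w = k - 1, n - k
    return (ssb / df_b) / (ssw / df_w), df_b, df_w

print(anova_f([1, 2, 3, 2, 3, 4, 5, 6, 7], [1, 1, 1, 2, 2, 2, 3, 3, 3]))
```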
@anchor{doc-bartlett_test}
Under the null of equal variances, the test statistic chisq approximately follows a chi-square distribution with df degrees of freedom.
The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.
If no output argument is given, the p-value is displayed.
@anchor{doc-chisquare_test_homogeneity}
For large samples, the test statistic chisq approximately follows a chisquare distribution with df = length (c) degrees of freedom.
The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.
If no output argument is given, the p-value is displayed.
@anchor{doc-chisquare_test_independence}
The p-value (1 minus the CDF of this distribution at chisq) of the test is returned in pval.
If no output argument is given, the p-value is displayed.
@anchor{doc-cor_test}
The optional argument string alt describes the alternative hypothesis, and can be "!=" or "<>" (non-zero), ">" (greater than 0), or "<" (less than 0). The default is the two-sided case.
The optional argument string method specifies on which correlation coefficient the test should be based. If method is "pearson" (default), the (usual) Pearson's product moment correlation coefficient is used. In this case, the data should come from a bivariate normal distribution. Otherwise, the other two methods offer nonparametric alternatives. If method is "kendall", then Kendall's rank correlation tau is used. If method is "spearman", then Spearman's rank correlation rho is used. Only the first character is necessary.
The output is a structure with the following elements:
If no output argument is given, the p-value is displayed.
@anchor{doc-f_test_regression}
Under the null, the test statistic f follows an F distribution with df_num and df_den degrees of freedom.
The p-value (1 minus the CDF of this distribution at f) is returned in pval.
If not given explicitly, r = 0.
If no output argument is given, the p-value is displayed.
@anchor{doc-hotelling_test}
The null hypothesis is mean (x) == m.
Hotelling's T^2 is returned in tsq. Under the null, @math{(n-p) T^2 / (p(n-1))} has an F distribution with @math{p} and @math{n-p} degrees of freedom, where @math{n} and @math{p} are the numbers of samples and variables, respectively.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-hotelling_test_2}
The null hypothesis is mean (x) == mean (y).
Hotelling's two-sample T^2 is returned in tsq. Under the null, @math{(n_x+n_y-p-1) T^2 / (p(n_x+n_y-2))} has an F distribution with @math{p} and @math{n_x+n_y-p-1} degrees of freedom, where @math{n_x} and @math{n_y} are the sample sizes and @math{p} is the number of variables.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-kolmogorov_smirnov_test}
The optional argument params contains a list of parameters of dist. For example, to test whether a sample x comes from a uniform distribution on [2,4], use
kolmogorov_smirnov_test(x, "uniform", 2, 4)
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative F != G. In this case, the test statistic ks follows a two-sided Kolmogorov-Smirnov distribution. If alt is ">", the one-sided alternative F > G is considered. Similarly for "<", the one-sided alternative F < G is considered. In these one-sided cases, the test statistic ks has a one-sided Kolmogorov-Smirnov distribution. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value is displayed.
@anchor{doc-kolmogorov_smirnov_test_2}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative F != G. In this case, the test statistic ks follows a two-sided Kolmogorov-Smirnov distribution. If alt is ">", the one-sided alternative F > G is considered. Similarly for "<", the one-sided alternative F < G is considered. In these one-sided cases, the test statistic ks has a one-sided Kolmogorov-Smirnov distribution. The default is the two-sided case.
The p-value of the test is returned in pval.
The third returned value, d, is the test statistic, the maximum vertical distance between the two cumulative distribution functions.
If no output argument is given, the p-value is displayed.
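The returned value d, the maximum vertical distance between the two empirical cumulative distribution functions, can be sketched in Python. Since both functions are step functions that only change at data points, it is enough to compare them there. The function names are illustrative and no p-value is computed.

```python
def ecdf(sample, t):
    """Empirical CDF of sample evaluated at t."""
    return sum(v <= t for v in sample) / len(sample)

def ks_two_sample_d(x, y):
    """Maximum vertical distance between the empirical CDFs of x and y."""
    points = sorted(set(x) | set(y))
    return max(abs(ecdf(x, t) - ecdf(y, t)) for t in points)

print(ks_two_sample_d([1, 2, 3], [2, 3, 4]))
```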
@anchor{doc-kruskal_wallis_test}
Suppose a variable is observed for k > 1 different groups, and let x1, ..., xk be the corresponding data vectors.
Under the null hypothesis that the ranks in the pooled sample are not affected by the group memberships, the test statistic k is approximately chi-square with df = k - 1 degrees of freedom.
The p-value (1 minus the CDF of this distribution at k) is returned in pval.
If no output argument is given, the p-value is displayed.
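As an illustration, the Python sketch below computes a Kruskal-Wallis statistic and its degrees of freedom from the pooled ranks. It assumes the standard form 12 / (N (N+1)) * SUM_i R_i^2 / n_i - 3 (N+1), where R_i is the rank sum of group i, and it assumes there are no ties; whether Octave applies a tie correction is not stated above.

```python
def kruskal_wallis(*groups):
    """Return (k, df) for groups of distinct values (no tie handling)."""
    pooled = sorted(v for grp in groups for v in grp)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # valid only without ties
    n = len(pooled)
    s = sum(sum(rank[v] for v in grp) ** 2 / len(grp) for grp in groups)
    k = 12.0 / (n * (n + 1)) * s - 3 * (n + 1)
    return k, len(groups) - 1

print(kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```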
@anchor{doc-manova}
The data matrix is given by y. As usual, rows are observations and columns are variables. The vector g specifies the corresponding group labels (e.g., numbers from 1 to k).
The LR test statistic (Wilks' Lambda) and approximate p-values are computed and displayed.
@anchor{doc-mcnemar_test}
Under the null, chisq is approximately distributed as chisquare with df degrees of freedom.
The p-value (1 minus the CDF of this distribution at chisq) is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-prop_test_2}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative p1 != p2. If alt is ">", the one-sided alternative p1 > p2 is used. Similarly for "<", the one-sided alternative p1 < p2 is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-run_test}
The p-value of the test is returned in pval.
If no output argument is given, the p-value is displayed.
@anchor{doc-sign_test}
Under the null, the test statistic follows a binomial distribution with parameters n = sum (x != y) and p = 1/2.
With the optional argument alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null hypothesis is tested against the two-sided alternative PROB (x > y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 ("x is stochastically greater than y") is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 ("x is stochastically less than y") is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-t_test}
The null hypothesis is mean (x) == m. Under the null, the test statistic t follows a Student distribution with df = length (x) - 1 degrees of freedom.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != m. If alt is ">", the one-sided alternative mean (x) > m is considered. Similarly for "<", the one-sided alternative mean (x) < m is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
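The statistic and its degrees of freedom can be sketched in Python. The sketch assumes the usual one-sample form t = (mean (x) - m) / (std (x) / sqrt (n)); it omits the p-value, which needs the Student CDF, and the function name is illustrative.

```python
import math

def t_statistic(x, m):
    """Return (t, df) for the one-sample t test of mean (x) == m."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))
    return (xbar - m) / (s / math.sqrt(n)), n - 1

print(t_statistic([1, 2, 3, 4, 5], 2))
```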
@anchor{doc-t_test_2}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is used. Similarly for "<", the one-sided alternative mean (x) < mean (y) is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-t_test_regression}
The null hypothesis is rr * b = r in a classical normal regression model y = x * b + e. Under the null, the test statistic t follows a t distribution with df degrees of freedom.
If r is omitted, a value of 0 is assumed.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative rr * b != r. If alt is ">", the one-sided alternative rr * b > r is used. Similarly for "<", the one-sided alternative rr * b < r is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-u_test}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative PROB (x > y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-var_test}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative var (x) != var (y). If alt is ">", the one-sided alternative var (x) > var (y) is used. Similarly for "<", the one-sided alternative var (x) < var (y) is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-welch_test}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is considered. Similarly for "<", the one-sided alternative mean (x) < mean (y) is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-wilcoxon_test}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative PROB (x > y) != 1/2. If alt is ">", the one-sided alternative PROB (x > y) > 1/2 is considered. Similarly for "<", the one-sided alternative PROB (x > y) < 1/2 is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed.
@anchor{doc-z_test}
The null hypothesis is mean (x) == m for a sample x from a normal distribution with unknown mean and known variance v. Under the null, the test statistic z follows a standard normal distribution.
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != m. If alt is ">", the one-sided alternative mean (x) > m is considered. Similarly for "<", the one-sided alternative mean (x) < m is considered. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed along with some information.
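Because the null distribution is standard normal, the whole test fits in a few lines of Python using only the standard library. The sketch assumes the usual statistic z = (mean (x) - m) / sqrt (v / n) and the usual mapping of alt to tail probabilities; the function signature mirrors the description above but is not Octave's code.

```python
import math
from statistics import NormalDist

def z_test(x, m, v, alt="!="):
    """Return (pval, z) for the test of mean (x) == m with known variance v."""
    n = len(x)
    z = (sum(x) / n - m) / math.sqrt(v / n)
    cdf = NormalDist().cdf(z)
    if alt in ("!=", "<>"):
        pval = 2 * min(cdf, 1 - cdf)   # two-sided
    elif alt == ">":
        pval = 1 - cdf                 # upper tail
    elif alt == "<":
        pval = cdf                     # lower tail
    else:
        raise ValueError("unknown alternative: " + alt)
    return pval, z

print(z_test([2, 4], 1, 2, ">"))
```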
@anchor{doc-z_test_2}
With the optional argument string alt, the alternative of interest can be selected. If alt is "!=" or "<>", the null is tested against the two-sided alternative mean (x) != mean (y). If alt is ">", the one-sided alternative mean (x) > mean (y) is used. Similarly for "<", the one-sided alternative mean (x) < mean (y) is used. The default is the two-sided case.
The p-value of the test is returned in pval.
If no output argument is given, the p-value of the test is displayed along with some information.
@anchor{doc-logistic_regression}
Suppose y takes values in k ordered categories, and let gamma_i (x) be the cumulative probability that y falls in one of the first i categories given the covariate x. Then
[theta, beta] = logistic_regression (y, x)
fits the model
logit (gamma_i (x)) = theta_i - beta' * x, i = 1, ..., k-1
The number of ordinal categories, k, is taken to be the number of distinct values of round (y). If k equals 2, y is binary and the model is ordinary logistic regression. The matrix x is assumed to have full column rank.
Given y only, theta = logistic_regression (y) fits the model with baseline logit odds only.
The full form is
[theta, beta, dev, dl, d2l, gamma] = logistic_regression (y, x, print, theta, beta)
in which all output arguments and all input arguments except y are optional.
Setting print to 1 requests that summary information about the fitted model be displayed. Setting print to 2 requests information about convergence at each iteration. Other values request that no information be displayed. The input arguments theta and beta give initial estimates for theta and beta.
The returned value dev holds minus twice the log-likelihood.
The returned values dl and d2l are the vector of first and the matrix of second derivatives of the log-likelihood with respect to theta and beta.
The last returned value, gamma, holds estimates for the conditional distribution of y given x.
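To show how the model logit (gamma_i (x)) = theta_i - beta' * x turns into category probabilities, the Python sketch below evaluates the cumulative probabilities for given parameters and differences them. It assumes theta is increasing and uses gamma_0 = 0 and gamma_k = 1 as boundary values; it does not fit the model, and the function name is illustrative.

```python
import math

def category_probabilities(theta, beta, x):
    """Probabilities of the k ordered categories for one covariate vector x."""
    eta = sum(b * v for b, v in zip(beta, x))                     # beta' * x
    gamma = [1 / (1 + math.exp(-(t - eta))) for t in theta]       # cumulative
    bounds = [0.0] + gamma + [1.0]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]

print(category_probabilities([0.0, 1.0], [0.5], [0.0]))
```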
@anchor{doc-beta_cdf}
@anchor{doc-beta_inv}
@anchor{doc-beta_pdf}
@anchor{doc-beta_rnd}
If r and c are omitted, the size of the result matrix is the common size of a and b.
@anchor{doc-binomial_cdf}
@anchor{doc-binomial_inv}
@anchor{doc-binomial_pdf}
@anchor{doc-binomial_rnd}
If r and c are omitted, the size of the result matrix is the common size of n and p.
@anchor{doc-cauchy_cdf}
@anchor{doc-cauchy_inv}
@anchor{doc-cauchy_pdf}
@anchor{doc-cauchy_rnd}
If r and c are omitted, the size of the result matrix is the common size of lambda and sigma.
@anchor{doc-chisquare_cdf}
@anchor{doc-chisquare_inv}
@anchor{doc-chisquare_pdf}
@anchor{doc-chisquare_rnd}
If r and c are omitted, the size of the result matrix is the size of n.
@anchor{doc-discrete_cdf}
@anchor{doc-discrete_inv}
@anchor{doc-discrete_pdf}
@anchor{doc-discrete_rnd}
Currently, n must be a scalar.
@anchor{doc-empirical_cdf}
@anchor{doc-empirical_inv}
@anchor{doc-empirical_pdf}
@anchor{doc-empirical_rnd}
@anchor{doc-exponential_cdf}
The arguments can be of common size or scalar.
@anchor{doc-exponential_inv}
@anchor{doc-exponential_pdf}
@anchor{doc-exponential_rnd}
If r and c are omitted, the size of the result matrix is the size of lambda.
@anchor{doc-f_cdf}
@anchor{doc-f_inv}
@anchor{doc-f_pdf}
@anchor{doc-f_rnd}
If r and c are omitted, the size of the result matrix is the common size of m and n.
@anchor{doc-gamma_cdf}
@anchor{doc-gamma_inv}
@anchor{doc-gamma_pdf}
@anchor{doc-gamma_rnd}
If r and c are omitted, the size of the result matrix is the common size of a and b.
@anchor{doc-geometric_cdf}
@anchor{doc-geometric_inv}
@anchor{doc-geometric_pdf}
@anchor{doc-geometric_rnd}
If r and c are omitted, the size of the result matrix is the size of p.
@anchor{doc-hypergeometric_cdf}
The parameters m, t, and n must be positive integers with m and n not greater than t.
@anchor{doc-hypergeometric_inv}
The parameters m, t, and n must be positive integers with m and n not greater than t.
@anchor{doc-hypergeometric_pdf}
The arguments must be of common size or scalar.
@anchor{doc-hypergeometric_rnd}
The parameters m, t, and n must be positive integers with m and n not greater than t.
@anchor{doc-kolmogorov_smirnov_cdf}
Q(x) = SUM_{k = -Inf}^{Inf} (-1)^k exp (-2 k^2 x^2)
for x > 0.
The optional parameter tol specifies the precision up to which the series should be evaluated; the default is tol = eps.
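The series can be evaluated by summing the k = 0 term and then the paired terms for k and -k, which are equal. The Python sketch below stops once a paired term falls below tol; treating tol as a bound on the last term added, returning 0 for x <= 0, and the max_terms guard are assumptions of this sketch.

```python
import math
import sys

def kolmogorov_smirnov_cdf(x, tol=sys.float_info.epsilon, max_terms=100000):
    """Evaluate Q(x) = SUM_k (-1)^k exp(-2 k^2 x^2) for x > 0."""
    if x <= 0:
        return 0.0
    total = 1.0                                   # k = 0 term
    for k in range(1, max_terms + 1):
        term = 2.0 * (-1) ** k * math.exp(-2.0 * k * k * x * x)   # k and -k
        total += term
        if abs(term) < tol:
            break
    return total

print(kolmogorov_smirnov_cdf(1.0))
```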
@anchor{doc-laplace_cdf}
@anchor{doc-laplace_inv}
@anchor{doc-laplace_pdf}
@anchor{doc-laplace_rnd}
@anchor{doc-logistic_cdf}
@anchor{doc-logistic_inv}
@anchor{doc-logistic_pdf}
@anchor{doc-logistic_rnd}
@anchor{doc-lognormal_cdf}
If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.
Default values are a = 1, v = 1.
@anchor{doc-lognormal_inv}
If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.
Default values are a = 1, v = 1.
@anchor{doc-lognormal_pdf}
If a random variable follows this distribution, its logarithm is normally distributed with mean log (a) and variance v.
Default values are a = 1, v = 1.
@anchor{doc-lognormal_rnd}
If r and c are omitted, the size of the result matrix is the common size of a and v.
@anchor{doc-normal_cdf}
Default values are m = 0, v = 1.
@anchor{doc-normal_inv}
Default values are m = 0, v = 1.
@anchor{doc-normal_pdf}
Default values are m = 0, v = 1.
@anchor{doc-normal_rnd}
If r and c are omitted, the size of the result matrix is the common size of m and v.
@anchor{doc-pascal_cdf}
The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.
@anchor{doc-pascal_inv}
The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.
@anchor{doc-pascal_pdf}
The number of failures in a Bernoulli experiment with success probability p before the n-th success follows this distribution.
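The description in terms of failures before the n-th success gives the probability function directly: x failures and n - 1 successes can be arranged in C(n + x - 1, x) ways before the final success. A minimal Python sketch for integer n, with an illustrative signature:

```python
import math

def pascal_pdf(x, n, p):
    """P(exactly x failures before the n-th success), success probability p."""
    if x < 0:
        return 0.0
    return math.comb(n + x - 1, x) * p ** n * (1 - p) ** x

print(pascal_pdf(0, 1, 0.5), pascal_pdf(1, 1, 0.5))
```

For n = 1 this reduces to the geometric distribution.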
@anchor{doc-pascal_rnd}
If r and c are omitted, the size of the result matrix is the common size of n and p.
@anchor{doc-poisson_cdf}
@anchor{doc-poisson_inv}
@anchor{doc-poisson_pdf}
@anchor{doc-poisson_rnd}
If r and c are omitted, the size of the result matrix is the size of lambda.
@anchor{doc-stdnormal_cdf}
@anchor{doc-stdnormal_inv}
@anchor{doc-stdnormal_pdf}
@anchor{doc-stdnormal_rnd}
@anchor{doc-t_cdf}
@anchor{doc-t_inv}
@anchor{doc-t_pdf}
@anchor{doc-t_rnd}
If r and c are omitted, the size of the result matrix is the size of n.
@anchor{doc-uniform_cdf}
Default values are a = 0, b = 1.
@anchor{doc-uniform_inv}
Default values are a = 0, b = 1.
@anchor{doc-uniform_pdf}
Default values are a = 0, b = 1.
@anchor{doc-uniform_rnd}
If r and c are omitted, the size of the result matrix is the common size of a and b.
@anchor{doc-weibull_cdf}
1 - exp(-(x/sigma)^alpha)
for x >= 0.
@anchor{doc-weibull_inv}
@anchor{doc-weibull_pdf}
alpha * sigma^(-alpha) * x^(alpha-1) * exp(-(x/sigma)^alpha)
for x > 0.
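The two Weibull formulas can be checked against each other, since the PDF is the derivative of the CDF. The Python sketch below implements both for scalar arguments and returns 0 outside the stated ranges; the signatures are illustrative.

```python
import math

def weibull_cdf(x, alpha, sigma):
    """1 - exp(-(x/sigma)^alpha) for x >= 0, else 0."""
    if x < 0:
        return 0.0
    return 1 - math.exp(-((x / sigma) ** alpha))

def weibull_pdf(x, alpha, sigma):
    """alpha * sigma^(-alpha) * x^(alpha-1) * exp(-(x/sigma)^alpha) for x > 0."""
    if x <= 0:
        return 0.0
    return alpha * sigma ** (-alpha) * x ** (alpha - 1) * math.exp(-((x / sigma) ** alpha))

print(weibull_cdf(1, 2, 1), weibull_pdf(1, 2, 1))
```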
@anchor{doc-weibull_rnd}
If r and c are omitted, the size of the result matrix is the common size of alpha and sigma.
@anchor{doc-wiener_rnd}
The optional parameter n gives the number of summands used for simulating the process over an interval of length 1. If n is omitted, n = 1000 is used.
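One way to read "n summands per interval of length 1" is as a sum of independent normal increments, each with variance 1/n. The Python sketch below simulates a path on [0, t] under that assumption; the use of normal increments, the seed argument, and the function name are all assumptions of this illustration.

```python
import math
import random

def wiener_path(t, n=1000, seed=None):
    """Simulated path values at 0, 1/n, 2/n, ..., t (starting at 0)."""
    rng = random.Random(seed)
    step_sd = math.sqrt(1.0 / n)          # each increment has variance 1/n
    path = [0.0]
    for _ in range(int(t * n)):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path

path = wiener_path(2, n=100, seed=1)
print(len(path), path[-1])
```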