Statistical functions
This page describes the statistical functions that are available in Phonometrica.
Global functions

chi2_test(X)
Computes Pearson’s chi-squared (\(\chi^2\)) test on X, which must be a two-dimensional array. The m rows of the array represent the m levels of a categorical variable, and the n columns represent the n levels of another categorical variable. Each cell contains the unnormalized frequency count for the corresponding combination of levels of the two variables. This test evaluates the null hypothesis that the two variables are independent.
This function returns an object with the following fields:
chi2: the \(\chi^2\) value
df: the number of degrees of freedom
p: the p-value
See also: report_chi2()
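For reference, the statistic behind chi2_test() can be sketched in Python (this is not Phonometrica code). The sketch assumes the textbook Pearson statistic without Yates’ continuity correction, which the description above does not specify: each expected count is the row total times the column total divided by the grand total, and the degrees of freedom are \((m-1)(n-1)\). The helper name chi2_statistic is hypothetical.

```python
def chi2_statistic(table):
    """Pearson's chi-squared statistic for an m x n table of raw counts.

    Hypothetical helper: no continuity correction is applied.
    Returns (chi2, df).
    """
    m = len(table)
    n = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(m)) for j in range(n)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(m):
        for j in range(n):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    df = (m - 1) * (n - 1)
    return chi2, df


chi2, df = chi2_statistic([[10, 20], [30, 40]])
print(round(chi2, 4), df)  # 0.7937 1
```

The p-value is then the upper tail of the \(\chi^2\) distribution with df degrees of freedom evaluated at chi2 (in SciPy, for instance, scipy.stats.chi2.sf(chi2, df)).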

corr(x, y)
Calculates Pearson’s correlation coefficient between samples x and y, which must be one-dimensional arrays of the same size.
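Pearson’s r is the covariance of x and y divided by the product of their standard deviations. Because the \(n\) or \(n-1\) factor cancels, it can be computed directly from centered sums, as in this Python sketch (the helper name pearson_r is hypothetical, not part of Phonometrica).

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length samples (sketch)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and sums of squares of the deviations from the mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```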

cov(x, y)
Calculates the covariance between samples x and y, which must be one-dimensional arrays of the same size.
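The text above does not say whether cov() divides by \(n\) or \(n-1\). The Python sketch below assumes the usual sample covariance with an \(n-1\) denominator; the helper name sample_cov is hypothetical.

```python
def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator (assumed convention)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Average cross-product of deviations, using n - 1 degrees of freedom
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


print(sample_cov([1, 2, 3, 4], [1, 3, 2, 4]))  # 4/3, about 1.3333
```

Under this convention, the covariance of a sample with itself is its sample variance.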

f_test(x, y[, alternative])
Computes the F-test on x and y, which must be one-dimensional arrays. This test evaluates the null hypothesis that samples x and y have the same variance.
If alternative is specified, it must be one of the following strings: "twotailed" performs a two-tailed test (default), "less" performs a left-tailed test, and "greater" performs a right-tailed test.
This function returns an object with the following fields:
f: the F statistic, that is, the ratio of the variance of x to the variance of y
df: the number of degrees of freedom
p: the p-value
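As described above, the F statistic is the ratio of the two sample variances. The Python sketch below computes it together with the numerator and denominator degrees of freedom \(n_x - 1\) and \(n_y - 1\) that the F distribution conventionally uses. How Phonometrica fills its single df field is not specified here, so returning both values is an assumption; the helper name f_statistic is hypothetical.

```python
import statistics


def f_statistic(x, y):
    """Variance-ratio F statistic for two samples (sketch).

    Returns (f, df_num, df_den), using sample variances (n - 1 denominator).
    """
    f = statistics.variance(x) / statistics.variance(y)
    return f, len(x) - 1, len(y) - 1


f, df_num, df_den = f_statistic([1, 2, 3, 4], [2, 4, 6, 8])
print(f, df_num, df_den)  # 0.25 3 3
```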

lm(y, X)
Fits a linear regression model. y is a set of N observations of a continuous outcome, and X is an N by M matrix for a model with M regression coefficients, including the intercept, which must be the first coefficient. (In general, the first column of X should therefore be a column of 1’s.)
This function returns an object with the following fields:
beta: an array of estimates for the regression coefficients; the first entry is the intercept
se: an array of standard errors for the regression coefficients
t: an array of t-values for the regression coefficients (t[i] is the t-value for beta[i])
p: an array of p-values for a t-test of the null hypothesis that each regression coefficient is equal to 0 (p[i] is the p-value for beta[i])
r2: the \(R^2\) value, which is the proportion of variance explained by the model
adj_r2: the adjusted \(R^2\) value, which takes into account the number of predictors in the model
Note: the model is estimated by minimizing the sum of squared errors. It is fitted analytically using Singular Value Decomposition.
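The quantities beta, r2 and adj_r2 can be illustrated with the plain-Python sketch below. It is not Phonometrica’s implementation: where Phonometrica fits the model with Singular Value Decomposition, this sketch swaps in the simpler normal equations \(X^\top X \beta = X^\top y\), solved by Gaussian elimination, which is fine for small, well-conditioned problems but less numerically robust than SVD. The adjusted \(R^2\) uses the textbook formula \(1 - (1 - R^2)(N - 1)/(N - M)\), with M counting the intercept. Standard errors, t-values and p-values are omitted, and the names solve and ols are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def ols(y, X):
    """Least-squares fit; X is N x M with a leading column of 1's."""
    n, m = len(y), len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(m)] for i in range(m)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(m)]
    beta = solve(xtx, xty)
    fitted = [sum(X[k][j] * beta[j] for j in range(m)) for k in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[k] - fitted[k]) ** 2 for k in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - m)
    return beta, r2, adj_r2


X = [[1, x] for x in [1, 2, 3, 4]]
beta, r2, adj_r2 = ols([1, 3, 2, 4], X)
print([round(b, 4) for b in beta], round(r2, 4), round(adj_r2, 4))  # [0.5, 0.8] 0.64 0.46
```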

logit(y, X[, max_iter])
Fits a logistic regression model. y is a set of N binary observations (either 0 or 1), and X is an N by M matrix for a model with M regression coefficients, including the intercept, which must be the first coefficient. (In general, the first column of X should therefore be a column of 1’s.)
If max_iter is provided, it indicates the maximum number of iterations that the solver should perform to estimate the coefficients (200 by default).
This function returns an object with the following fields:
beta: an array of estimates for the regression coefficients; the first entry is the intercept
se: an array of standard errors for the regression coefficients
z: an array of z-values for the regression coefficients (z[i] is the z-value for beta[i])
p: an array of p-values for a Wald test of the null hypothesis that each regression coefficient is equal to 0 (p[i] is the p-value for beta[i])
niter: the number of iterations performed by the numerical solver
converged: a Boolean value indicating whether the solver has converged to a solution; it is true if niter < max_iter
Note: the model is fitted numerically using the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) quasi-Newton method.
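The Python sketch below shows one way to maximize the logistic log-likelihood and report beta, niter and converged. It is not Phonometrica’s solver: Phonometrica uses L-BFGS, whereas this sketch uses Newton-Raphson (equivalent to iteratively reweighted least squares), which is easy to write out for small problems. The zero starting values, the stopping tolerance and all names (solve, sigmoid, logit_newton) are assumptions; standard errors and Wald tests are omitted.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit_newton(y, X, max_iter=200, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns (beta, niter, converged)."""
    n, m = len(y), len(X[0])
    beta = [0.0] * m
    for it in range(1, max_iter + 1):
        p = [sigmoid(sum(X[k][j] * beta[j] for j in range(m))) for k in range(n)]
        # Gradient of the log-likelihood: X'(y - p)
        grad = [sum(X[k][i] * (y[k] - p[k]) for k in range(n)) for i in range(m)]
        # Information matrix (negative Hessian): X' W X with W = diag(p (1 - p))
        w = [pk * (1.0 - pk) for pk in p]
        info = [[sum(w[k] * X[k][i] * X[k][j] for k in range(n)) for j in range(m)]
                for i in range(m)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, it, True
    return beta, max_iter, False


x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
beta, niter, converged = logit_newton(y, [[1.0, v] for v in x])
print(converged)  # True
```

At the maximum-likelihood estimate the gradient \(X^\top(y - p)\) vanishes, which is a convenient check on any solver.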

mean(x[, dim])
Returns the mean of the array x. If dim is specified, returns an Array in which each element represents the mean over the given dimension of a two-dimensional array. If dim is equal to 1, the calculation is performed over rows; if it is equal to 2, it is performed over columns.

poisson(y, X[, robust[, max_iter]])
Fits a Poisson regression model. y is a set of N observations representing count data (i.e. non-negative integers), and X is an N by M matrix for a model with M regression coefficients, including the intercept, which must be the first coefficient. (In general, the first column of X should therefore be a column of 1’s.) If robust is true (it is false by default), Phonometrica uses the so-called “robust variance sandwich estimator” to adjust the standard errors for mild violations of the assumption that the mean is equal to the variance.
If max_iter is provided, it indicates the maximum number of iterations that the solver should perform to estimate the coefficients (200 by default).
This function returns an object with the following fields:
beta: an array of estimates for the regression coefficients; the first entry is the intercept
se: an array of standard errors for the regression coefficients
z: an array of z-values for the regression coefficients (z[i] is the z-value for beta[i])
p: an array of p-values for a Wald test of the null hypothesis that each regression coefficient is equal to 0 (p[i] is the p-value for beta[i])
niter: the number of iterations performed by the numerical solver
converged: a Boolean value indicating whether the solver has converged to a solution; it is true if niter < max_iter
Note: the model is fitted numerically using the limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) quasi-Newton method.
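For illustration, the Poisson log-likelihood with a log link can be maximized by the Python sketch below. As with the logistic case, this is not Phonometrica’s solver: it uses Newton-Raphson instead of L-BFGS, starts from the intercept-only fit \(\beta_0 = \log(\bar{y})\) (which assumes at least one non-zero count), and does not implement the robust sandwich estimator. The names solve and poisson_newton are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def poisson_newton(y, X, max_iter=200, tol=1e-10):
    """Poisson regression (log link) by Newton-Raphson; returns (beta, niter, converged)."""
    n, m = len(y), len(X[0])
    # Start from the intercept-only fit: beta_0 = log(mean(y)), other coefficients 0
    beta = [0.0] * m
    beta[0] = math.log(sum(y) / n)
    for it in range(1, max_iter + 1):
        mu = [math.exp(sum(X[k][j] * beta[j] for j in range(m))) for k in range(n)]
        # Gradient of the log-likelihood: X'(y - mu)
        grad = [sum(X[k][i] * (y[k] - mu[k]) for k in range(n)) for i in range(m)]
        # Information matrix: X' diag(mu) X
        info = [[sum(mu[k] * X[k][i] * X[k][j] for k in range(n)) for j in range(m)]
                for i in range(m)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, it, True
    return beta, max_iter, False


x = [0, 1, 2, 3, 4]
y = [1, 1, 2, 4, 7]
beta, niter, converged = poisson_newton(y, [[1.0, v] for v in x])
print(converged)  # True
```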

report_chi2(X)
Computes and reports Pearson’s chi-squared test on X, which must be a two-dimensional array. This is a convenience wrapper around chi2_test().
See also: chi2_test()

std(x[, dim])
Returns the standard deviation of the array x. If dim is specified, returns an Array in which each element represents the standard deviation over the given dimension of a two-dimensional array. If dim is equal to 1, the calculation is performed over rows; if it is equal to 2, it is performed over columns.

sum(x[, dim])
Returns the sum of the elements in the array x. If dim is specified, returns an Array in which each element represents the sum over the given dimension of a two-dimensional array. If dim is equal to 1, the summation is performed over rows; if it is equal to 2, it is performed over columns.

t_test(x, y[, equal_variance[, alternative]])
Computes an independent two-sample t-test comparing the means of samples x and y, which must be one-dimensional arrays. This test evaluates the null hypothesis that samples x and y have equal means.
If equal_variance is true, the variances of the two samples are assumed to be equal and Student’s t-test is calculated using the pooled standard error. If equal_variance is false (default), Welch’s t-test is used instead.
If alternative is specified, it must be one of the following strings: "twotailed" performs a two-tailed test (default), "less" performs a left-tailed test, and "greater" performs a right-tailed test.
This function returns an object with the following fields:
t: the t statistic
df1: the number of degrees of freedom of x
df2: the number of degrees of freedom of y
p: the p-value
See also: t_test1()
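The t statistic itself can be sketched in Python as follows (a hypothetical helper, not Phonometrica code). With equal_variance true it uses the pooled variance and \(n_x + n_y - 2\) degrees of freedom; otherwise it uses Welch’s unpooled standard error and the Welch-Satterthwaite approximation for the degrees of freedom. These are the textbook formulas; the sketch does not claim to reproduce the df1 and df2 fields listed above.

```python
import math
import statistics


def two_sample_t(x, y, equal_variance=False):
    """Two-sample t statistic and degrees of freedom (textbook formulas)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    if equal_variance:
        # Student: pooled variance, n1 + n2 - 2 degrees of freedom
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch: unpooled standard error, Welch-Satterthwaite degrees of freedom
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return t, df


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
print(two_sample_t(x, y, equal_variance=True))  # t = -sqrt(3), df = 6
print(two_sample_t(x, y))                        # t = -sqrt(3), df = 1875/425, about 4.41
```

With equal sample sizes the pooled and unpooled standard errors coincide, so only the degrees of freedom differ between the two variants here.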

t_test1(x, mu[, alternative])
Computes a one-sample t-test for the sample x, which must be a one-dimensional array. This test evaluates the null hypothesis that the mean of sample x is equal to the theoretical mean mu.
If alternative is specified, it must be one of the following strings: "twotailed" performs a two-tailed test (default), "less" performs a left-tailed test, and "greater" performs a right-tailed test.
This function returns an object with the following fields:
t: the t statistic
df: the number of degrees of freedom
p: the p-value
See also: t_test()
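The one-sample statistic is the distance between the sample mean and mu, measured in standard errors: \(t = (\bar{x} - \mu) / (s / \sqrt{n})\), with \(n - 1\) degrees of freedom under the textbook definition. A minimal Python sketch (the helper name one_sample_t is hypothetical):

```python
import math
import statistics


def one_sample_t(x, mu):
    """One-sample t statistic and its n - 1 degrees of freedom (sketch)."""
    n = len(x)
    # Standard error of the mean, using the sample standard deviation
    standard_error = statistics.stdev(x) / math.sqrt(n)
    return (statistics.mean(x) - mu) / standard_error, n - 1


t, df = one_sample_t([1, 2, 3, 4], 2)
print(round(t, 4), df)  # 0.7746 3
```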

vrc(x[, dim])
Returns the sample variance of the array x. If dim is specified, returns an Array in which each element represents the variance over the given dimension of a two-dimensional array. If dim is equal to 1, the calculation is performed over rows; if it is equal to 2, it is performed over columns.
See also: std()
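The sample variance is the sum of squared deviations from the mean divided by \(n - 1\). The Python sketch below (hypothetical helper names) shows this formula for a one-dimensional array, with the standard deviation taken as its square root; that std() uses the same \(n - 1\) denominator is an assumption, since the text above does not say.

```python
import math


def sample_variance(x):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def sample_std(x):
    """Square root of the sample variance (n - 1 denominator assumed)."""
    return math.sqrt(sample_variance(x))


print(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 32/7, about 4.5714
```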