Formulas for statistics
T-statistic for correlations
t = r / sqrt[(1 - r^2) / (N - 2)], with df = N - 2.
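A Python sketch of this formula (numpy/scipy stand in for the Matlab-style calls; corr_t is a name chosen here, not from the notes):

```python
import numpy as np
from scipy import stats

def corr_t(x, y):
    """t-statistic, degrees of freedom, and two-sided p-value for a Pearson r."""
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r / np.sqrt((1 - r**2) / (n - 2))       # t = r / sqrt[(1 - r^2) / (N - 2)]
    p = 2 * stats.t.sf(abs(t), df=n - 2)        # two-sided p from the t-distribution
    return t, n - 2, p
```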
Independent two-sample t-test, general case
t = (mean(X1) - mean(X2)) / S,
where S = sqrt(var(X1) / length(X1) + var(X2) / length(X2)).
The degrees of freedom (the Welch-Satterthwaite approximation) are
df = (var(X1) / length(X1) + var(X2) / length(X2)) .^ 2 / ((var(X1)/length(X1)) .^ 2 / (length(X1) - 1) + (var(X2)/length(X2)) .^ 2 / (length(X2) - 1)).
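The same computation as a Python sketch (welch_t is a name chosen here; np.var with ddof=1 matches Matlab's var, which normalizes by N - 1):

```python
import numpy as np

def welch_t(x1, x2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = np.var(x1, ddof=1) / len(x1)   # var(X1) / length(X1)
    v2 = np.var(x2, ddof=1) / len(x2)   # var(X2) / length(X2)
    t = (np.mean(x1) - np.mean(x2)) / np.sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (len(x1) - 1) + v2**2 / (len(x2) - 1))
    return t, df
```

With equal variances and equal sample sizes, df reduces to 2(n - 1), the same as the classical pooled test.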
F-test for reduction in error variance
Say you have two models, New and Old, where New adds one or more parameters to Old. These models generate two prediction vectors, pred_new and pred_old, for a vector of data. New will always explain at least as much variance as Old. The F-test tests whether New explains more variance than would be expected from adding parameters unrelated to the data. Let
F = ((RSS_old - RSS_new) / df1) / (RSS_new / df2);
RSS_old = sum((data - pred_old) .^ 2);
RSS_new = sum((data - pred_new) .^ 2);
df1 = p_new - p_old;
df2 = n - p_new;
and n is the number of observations, p_old is the number of parameters (including the constant) of the old model, and p_new the number of parameters of the new model. Then F has an F-distribution with (df1, df2) degrees of freedom, and its p-value can be calculated using
x = df1 * F / (df1 * F + df2);
a = df1 / 2;
b = df2 / 2;
p = 1 - betainc(x, a, b);
Note that Matlab's betainc already computes the regularized incomplete beta function, which is exactly what the F-distribution CDF requires, so the expression above can be used as-is.
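A Python sketch of the whole procedure (scipy.special.betainc is regularized, like Matlab's betainc; f_test is a name chosen here):

```python
import numpy as np
from scipy import special

def f_test(data, pred_old, pred_new, p_old, p_new):
    """F-test for a reduction in residual variance by a nested, larger model."""
    n = len(data)
    rss_old = np.sum((data - pred_old) ** 2)   # residual sum of squares, old model
    rss_new = np.sum((data - pred_new) ** 2)   # residual sum of squares, new model
    df1 = p_new - p_old
    df2 = n - p_new
    F = ((rss_old - rss_new) / df1) / (rss_new / df2)
    x = df1 * F / (df1 * F + df2)
    p = 1 - special.betainc(df1 / 2, df2 / 2, x)   # betainc is the regularized form
    return F, p
```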
Variance
The variance is equal to mean(X.^2) - mean(X)^2 (this is the population variance; multiply by N / (N - 1) for the sample variance). This means the variance can be calculated in a single pass through a vector, adding each value to a Sum variable and each squared value to a SquaredSum variable.
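The one-pass scheme as a Python sketch (streaming_variance is a name chosen here):

```python
def streaming_variance(xs):
    """One-pass population variance: mean(x^2) - mean(x)^2."""
    n = 0
    total = 0.0       # the Sum variable
    total_sq = 0.0    # the SquaredSum variable
    for x in xs:
        n += 1
        total += x
        total_sq += x * x
    # Note: subtracting two large, nearly equal quantities can lose precision;
    # Welford's algorithm is the numerically stable one-pass alternative.
    return total_sq / n - (total / n) ** 2
```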
Multiple regression
If the model of Y is Xb, then b = inv(X' * X) * X' * Y. To remove the effect of a predictor, set its value in b to 0 before reconstructing using Xb. b is chosen so that the combination of predictors explains the greatest possible amount of variance. Whether each predictor's b-value is reliably estimated depends on the correlations between the predictors.
If errors are added to the model, note that the term inv(X' * X) * X' * error is added to b. So the estimated b fluctuates around the ideal b, and these fluctuations average to zero if the error itself averages to zero.
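A Python sketch of the estimate and of zeroing out one predictor (the design matrix and true coefficients below are made-up example data; in practice np.linalg.lstsq or pinv is more stable than an explicit inverse):

```python
import numpy as np

# Made-up design: intercept plus two independent predictors, small noise
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
b_true = np.array([1.0, 2.0, -0.5])
Y = X @ b_true + 0.1 * rng.normal(size=n)

# b = inv(X' * X) * X' * Y
b = np.linalg.inv(X.T @ X) @ X.T @ Y

# Remove the effect of the third predictor: zero its coefficient, reconstruct Xb
b_zeroed = b.copy()
b_zeroed[2] = 0.0
Y_without_third = X @ b_zeroed
```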
To remove the effect of a covariate on a dependent variable, subtract its mean and use it as a predictor together with a column of ones for the offset. Reconstruct using the offset plus the residuals (i.e. Y - model). Then use this corrected variable in subsequent analyses.
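The covariate-removal recipe as a Python sketch (remove_covariate is a name chosen here). The corrected variable is uncorrelated with the covariate but keeps the original mean:

```python
import numpy as np

def remove_covariate(y, c):
    """Regress a mean-centered covariate out of y; return offset + residuals."""
    c_centered = c - np.mean(c)
    X = np.column_stack([np.ones(len(y)), c_centered])  # ones column for the offset
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    residuals = y - X @ b                               # Y - model
    return b[0] + residuals                             # offset plus residuals
```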