teg_crosstab

[O, Chi2, p, lowvals] = teg_crosstab(x, y) returns the observed crosstab for two vectors x and y, and the results of the Chi squared test. lowvals returns the number of cells with fewer than 5 observations. If more than 10% of the cells have fewer than 5 expected counts, rows and columns are iteratively removed (such that most cells are preserved) until lowvals is under 10%.

function [O, Chi2, p, lowvals] = teg_crosstab(x, y)

ux = unique(x);
uy = unique(y);
N = length(x);
O = [];
E = [];
for ix = 1:length(ux),
for iy = 1:length(uy),
f = find(x == ux(ix) & y == uy(iy));
O(ix, iy) = length(f);
end;
end;
for ix = 1:length(ux),
mx(ix) = sum(O(ix, :));
end;
for iy = 1:length(uy),
my(iy) = sum(O(:, iy));
end;
for ix = 1:length(ux),
for iy = 1:length(uy),
E(ix, iy) = mx(ix) * my(iy) / N;
end;
end;

flow = length(find(E(:) < 5));
lowvals = flow / length(E(:));
while lowvals >= 0.1,
[ir, ic] = find(E < 5);
uir = unique(ir);
uic = unique(ic);
if length(uir) / size(E, 1) <= length(uic) / size(E, 2),
O(ir(1), :) = [];
E(ir(1), :) = [];
else,
O(:, ic(1)) = [];
E(:, ic(1)) = [];
end;
flow = length(find(E(:) < 5));
lowvals = flow / length(E(:));
end;

[nx, ny] = size(E);
if nx * ny == 0,
O = [];
Chi2 = 0;
p = 0.5;
return;
end;

X = ((O - E) .^ 2) ./ E;
Chi2 = sum(X(:));
df = (length(ux) - 1) * (length(uy) - 1);
p = teg_chi2_cdf(Chi2, df);

function p = teg_chi2_cdf(x, k)
p = 1 - gammainc(x / 2, k / 2);