teg_crosstab
[O, Chi2, p, lowvals] = teg_crosstab(x, y) returns the observed crosstab for two vectors x and y, and the results of the Chi squared test. lowvals returns the number of cells with fewer than 5 observations. If more than 10% of the cells have fewer than 5 expected counts, rows and columns are iteratively removed (such that most cells are preserved) until lowvals is under 10%.
function [O, Chi2, p, lowvals] = teg_crosstab(x, y) ux = unique(x); uy = unique(y); N = length(x); O = []; E = []; for ix = 1:length(ux), for iy = 1:length(uy), f = find(x == ux(ix) & y == uy(iy)); O(ix, iy) = length(f); end; end; for ix = 1:length(ux), mx(ix) = sum(O(ix, :)); end; for iy = 1:length(uy), my(iy) = sum(O(:, iy)); end; for ix = 1:length(ux), for iy = 1:length(uy), E(ix, iy) = mx(ix) * my(iy) / N; end; end; flow = length(find(E(:) < 5)); lowvals = flow / length(E(:)); while lowvals >= 0.1, [ir, ic] = find(E < 5); uir = unique(ir); uic = unique(ic); if length(uir) / size(E, 1) <= length(uic) / size(E, 2), O(ir(1), :) = []; E(ir(1), :) = []; else, O(:, ic(1)) = []; E(:, ic(1)) = []; end; flow = length(find(E(:) < 5)); lowvals = flow / length(E(:)); end; [nx, ny] = size(E); if nx * ny == 0, O = []; Chi2 = 0; p = 0.5; return; end; X = ((O - E) .^ 2) ./ E; Chi2 = sum(X(:)); df = (length(ux) - 1) * (length(uy) - 1); p = teg_chi2_cdf(Chi2, df); function p = teg_chi2_cdf(x, k) p = 1 - gammainc(x / 2, k / 2);