Chi-Squared Test in Python

Chi-squared for determining people voting in each category

Your data is in the form of a contingency table, so you can use scipy.stats.chi2_contingency, which applies the chi-squared test of independence to such a table.

For example:

In [48]: import numpy as np

In [49]: from scipy.stats import chi2_contingency

In [50]: tbl = np.array([[1500, 826, 431], [212, 652, 542]])

In [51]: stat, p, df, expected = chi2_contingency(tbl)

In [52]: stat
Out[52]: 630.0807418107023

In [53]: p
Out[53]: 1.5125346728116583e-137

In [54]: df
Out[54]: 2

In [55]: expected
Out[55]:
array([[1133.79389863,  978.82440548,  644.38169589],
       [ 578.20610137,  499.17559452,  328.61830411]])
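
To make the four return values less opaque, here is a minimal sketch that recomputes them by hand for the same table. It assumes the usual definition of the test (expected counts from the row and column totals, no continuity correction because the table has more than one degree of freedom); the `manual_*` names are only illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

tbl = np.array([[1500, 826, 431], [212, 652, 542]])

# Expected count for each cell: row total * column total / grand total
row_totals = tbl.sum(axis=1)
col_totals = tbl.sum(axis=0)
manual_expected = np.outer(row_totals, col_totals) / tbl.sum()

# Chi-squared statistic: sum of (observed - expected)^2 / expected
manual_stat = ((tbl - manual_expected) ** 2 / manual_expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
manual_df = (tbl.shape[0] - 1) * (tbl.shape[1] - 1)

# p-value: upper tail of the chi-squared distribution
manual_p = chi2.sf(manual_stat, manual_df)

# Compare with SciPy's result
stat, p, df, expected = chi2_contingency(tbl)
print(np.allclose(manual_expected, expected))
print(np.isclose(manual_stat, stat), manual_df == df)
```

A p-value this small means the row and column variables are very unlikely to be independent.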

Chi square test with different sample sizes in Python

stats.chisquare requires f_obs and f_exp to have the same length, so it cannot compare the two data sets directly. One way around this is to interpolate Y_data2 onto the x-values of Y_data1:

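from scipy.interpolate import InterpolatedUnivariateSpline

# Fit a spline through the second data set (X_data2 must be increasing)
spl = InterpolatedUnivariateSpline(X_data2, Y_data2)
# Evaluate the spline at the x-values of the first data set
new_Y_data2 = spl(X_data1)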

Since Y_data1 and new_Y_data2 now have the same length, you can pass them to stats.chisquare:

from scipy import stats
stats.chisquare(f_obs=Y_data1, f_exp=new_Y_data2)
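
The two steps can be put together in a self-contained sketch. The data below is made up purely for illustration (two positive curves sampled on different grids over the same range). The rescaling step is an addition, not part of the original answer: recent SciPy versions raise a ValueError when the sums of f_obs and f_exp differ, so the interpolated values are scaled to the same total as Y_data1 first.

```python
import numpy as np
from scipy import stats
from scipy.interpolate import InterpolatedUnivariateSpline

# Hypothetical data: two curves sampled on grids of different sizes
X_data1 = np.linspace(0.0, 10.0, 21)
Y_data1 = 50.0 + 5.0 * np.sin(X_data1)
X_data2 = np.linspace(0.0, 10.0, 35)
Y_data2 = 50.0 + 5.0 * np.sin(X_data2) + 0.5

# Interpolate the second data set onto the x-values of the first
spl = InterpolatedUnivariateSpline(X_data2, Y_data2)
new_Y_data2 = spl(X_data1)

# Assumption: scale the expected values so both arrays have the same sum
f_exp = new_Y_data2 * Y_data1.sum() / new_Y_data2.sum()

result = stats.chisquare(f_obs=Y_data1, f_exp=f_exp)
print(result.statistic, result.pvalue)
```

Interpolation only makes sense where X_data1 lies inside the range covered by X_data2; outside that range the spline extrapolates.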

Chi-Squared test in Python

scipy.stats.chisquare expects observed and expected absolute frequencies, not ratios. Multiply the expected ratios by the total number of observations to get expected counts:

>>> import numpy as np
>>> from scipy.stats import chisquare
>>> observed = np.array([20., 20., 0., 0.])
>>> expected = np.array([.25, .25, .25, .25]) * np.sum(observed)
>>> chisquare(observed, expected)
(40.0, 1.065509033425585e-08)

When the expected frequencies are uniform across the classes, you can omit the expected values entirely, because chisquare assumes equal frequencies by default:

>>> chisquare(observed)
(40.0, 1.065509033425585e-08)

The first returned value is the χ² statistic, the second the p-value of the test.
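
If you often start from proportions rather than counts, the conversion can be wrapped in a small helper. This is only a sketch: `chisquare_from_proportions` is a hypothetical name, not a SciPy function, and it assumes the proportions describe the same classes, in the same order, as the observed counts.

```python
import numpy as np
from scipy.stats import chisquare

def chisquare_from_proportions(observed, proportions):
    """Run chisquare with expected counts derived from proportions."""
    observed = np.asarray(observed, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    # Normalize so the proportions sum to 1, then scale to the sample size
    proportions = proportions / proportions.sum()
    expected = proportions * observed.sum()
    return chisquare(observed, f_exp=expected)

# Same data as above: statistic first, p-value second
stat, p = chisquare_from_proportions([20, 20, 0, 0], [0.25, 0.25, 0.25, 0.25])
print(stat, p)
```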


