Clear-Sighted Statistics: An OER Textbook
Chapter 17: Chi-Square Tests
This (chi-square), I believe is the great contribution to statistical methodology which the unsurpassed energy of Professor [Karl] Pearson’s work will be remembered….1
R. A. Fisher
Statistical Methods for Research Workers, 13th Edition
1958
I. Introduction
While R. A. Fisher and Karl Pearson had an acrimonious relationship, Fisher was right about the importance of chi-square (χ2) techniques. Chi-square tests first became popular due to an article written by Karl Pearson in 1900.2 In his honor, chi-square techniques are sometimes called “Pearson’s chi-squared tests.” Over one hundred and twenty years after Pearson first started using these techniques, they are still widely used. Fisher’s muted praise for Pearson here and elsewhere is evidence of Fisher’s lingering animus.3 History, however, has been kinder to Pearson than Fisher’s lukewarm praise, made 22 years after Pearson’s death, would suggest. In this chapter we will introduce chi-square tests. In Chapter 18, we shall examine another of Pearson’s major contributions to statistical methods: the coefficient of correlation, or Pearson’s r, which measures the strength of a linear correlation.
Chi-square is the most commonly used nonparametric technique. Nonparametric techniques make no assumptions about how the population is distributed. So far we have only dealt with parametric statistics: techniques that include population parameters like the mean (μ), variance (σ²), standard deviation (σ), and proportion (π). With parametric distributions, we assume that the data fit a normal or symmetrical distribution like the one shown in Figure 1 and that the unknown population parameters can be estimated using sample statistics.
Figure 1: The Normal Distribution
Nonparametric tests refer to statistical techniques that do not require that the data fit a normal distribution. With nonparametric tests, we can use qualitative data—nominal and ordinal data—and frequency counts or proportions, which are quantitative.4 Chi-square tests focus on the relationships among the nominal or ordinal classifications of categories or treatments, not interval or ratio measures like the mean, proportion, variance, standard deviation, or the coefficient of correlation.
The “chi” in chi-square is the Greek letter χ, which is pronounced “kai.” It has nothing to do with the Chinese principle of ch’i, which is pronounced “chee,” and stands for the vital energy of any living being. It is also not related to the Hebrew word chai, which is pronounced as “hi.” This word has symbolic and numerical meaning. It means “life,” but more deeply connotes a life dedicated to kindness, thoughtfulness, and selflessness. It also signifies the number 18.
After completing this chapter, you will be able to:
• Understand the characteristics and requirements of chi-square tests.
• Conduct a chi-square goodness of fit test when the expected frequencies are equal.
• Conduct a chi-square goodness of fit test when the expected frequencies are not equal.
• Conduct a chi-square test for data organized into contingency tables (Also known as a χ2 test of independence).
• Conduct a chi-square goodness of fit test of normality.
• Calculate Effect Size using Phi, φ, which is the simplest technique for determining chi-square effect size.
• Determine the a priori statistical power of a chi-square test using G*Power and the Statistics Kingdom calculators.
You should download the following files for this chapter:
• The chi-square critical values table in Excel and pdf formats:
- ChiSq_Distribution.xlsx
- ChiSq_Distribution.pdf
• The chi-square p-values calculator: Chapter17_ChiSq_pvalue_Calculator.xlsx.
• Data and answers for the chapter examples: Chapter17_ChiSq_Examples.xlsx.
• Data for the end-of-chapter exercises: Chapter17_ChiSq_Exercises.xlsx.
II. Characteristics of the Chi-Square Distribution
The chi-square distribution has five major characteristics:
1. Chi-Square is always a non-negative number. This is because chi-square values are computed using the squared differences between observed frequencies, O, and expected frequencies, E. Whenever you square a negative or positive number, the result is always a non-negative number. The smallest possible value for chi-square, therefore, is zero, which indicates that the observed and expected frequencies are identical. As the value of chi-square increases, the differences between the observed and expected frequencies increase.
2. Chi-Square distributions are defined by degrees of freedom. How degrees of freedom are calculated depends on the type of chi-square test. Figure 2 shows the chi-square distribution curves with 1, 4, 8, and 12 degrees of freedom. As the number of degrees of freedom increases, the distribution moves closer to a symmetrical or normal distribution.
Figure 2: Chi-Square Distributions with 1, 4, 8, and 12 Degrees of Freedom
3. Chi-square distributions are always positively skewed (right skewed). Chi-square tests, therefore, will always be right-tailed tests. This is because chi-square must be a non-negative number (zero or a positive number).
4. Like ANOVA, chi-square tests are omnibus or global tests. When there are more than two treatments or categories, the test cannot determine which pairs differ.
5. Chi-square tests have three requirements:
A. The categories or treatments are independent and mutually exclusive.
B. The data for the observed frequencies are the result of counts of the qualitative data, which will become evident in the examples presented below.
C. Chi-square is sensitive to low sample size and low category frequencies. The expected frequencies for each category must be at least five. When this requirement cannot be met, we may be able to combine categories. Of course, there would have to be more than two categories to do this. It would not be proper, therefore, to conduct a chi-square test with only two categories when the expected frequency for one category is less than five.
Table 1: Expected Frequencies Must Be at Least 5.
This data should not be used for a chi-square test.
III. Types of Chi-Square (χ2) Tests
In this chapter we will cover three broad categories of chi-square tests.
1. Chi-Square Goodness of Fit Test
This test is sometimes called a one-way chi-square test because it is applied to one categorical (nominal or ordinal level) variable. This test calculates how closely the observed data fit the expected distribution of the data. The null hypothesis states that the observed distribution “fits,” or matches, the expected distribution. The alternate hypothesis declares that the observed distribution does not fit the expected distribution; or, there is a statistically significant difference between the observed and expected frequencies. Please Note: The null and alternate hypotheses are not written with mathematical symbols. Degrees of freedom for chi-square goodness of fit tests are defined as the number of categories, k, minus one, k – 1.
Equation 1: Degrees of Freedom for a Chi-Square Goodness of Fit Test
Degrees of Freedom = k – 1
Where: k is the number of categories
2. Chi-Square Contingency Table Tests
These tests are sometimes called a chi-square test of independence, or a two-way chi-square test, because we test the relationship between two qualitative variables. The null hypothesis for this test declares that the observed distribution matches the expected distribution; that is, the two variables are independent. The alternate hypothesis states that the observed distribution does not match the expected distribution; that is, the two variables are dependent. As with the chi-square goodness of fit test, the null and alternate hypotheses are not written with mathematical symbols. Degrees of freedom for these tests are found by multiplying the number of rows in the contingency table minus 1 by the number of columns minus 1:
Equation 2: Degrees of Freedom for A Chi-Square Contingency Table Test
Degrees of Freedom = (r – 1)(c – 1)
Where: r stands for the number of rows
c stands for the number of columns
3. Chi-Square Test for Normality
We will also cover an extension of the chi-square goodness of fit test, the chi-square test for normality. This test allows us to determine whether the data follow a normal distribution. The null hypothesis states that the data are normally distributed while the alternate hypothesis states that the data are not normally distributed. As with other chi-square tests, the null and alternate hypothesis are not written with mathematical symbols. When we reject the null hypothesis, we conclude that the data are not normally distributed and that we should not use parametric methods to analyze the data.
With this test the expected frequencies are calculated based on the mean and standard deviation of sampled data and how the data fit with a normal distribution. This test, however, is not considered as precise as other tests for normality like the Shapiro-Wilk or Kolmogorov-Smirnov tests, which are not typically covered in an introductory statistics class. Dedicated statistical software like SPSS can perform these tests in seconds with a few clicks of the mouse.
With chi-square tests for normality, chi-square degrees of freedom depend on whether we are using population parameters (μ and σ) or sample statistics (Xbar and s). If we are using population parameters, degrees of freedom are defined as k – 1. But when we use sample statistics, degrees of freedom are defined as k – 1 – 2. In essence, we lose two degrees of freedom when we estimate μ and σ using Xbar and s.
Equation 3: Degrees of Freedom for Chi-Square Tests of Normality
Degrees of freedom with population parameters = k – 1
Degrees of freedom with sample statistics = k – 1 – 2
Where: k is the number of categories
All chi-square tests use the following formula for calculating the value of the test statistic:
Equation 4: Formula for the Chi-Square Test Statistic
χ² = Σ[(O – E)² / E]
Where: O are the observed frequencies and
E are the expected frequencies
With all chi-square techniques, the observed frequencies are determined by counting the variables in the data. The techniques used to calculate the expected frequencies depend on the type of chi-square test.
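For readers who want to see the arithmetic outside of Excel, here is a minimal Python sketch of Equation 4. The frequencies below are hypothetical placeholders, not data from this chapter’s workbooks.

def chi_square_statistic(observed, expected):
    # Sum of (O - E)^2 / E across all categories (Equation 4)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [28, 32, 41, 19, 30, 30]   # hypothetical counts for six categories
expected = [30, 30, 30, 30, 30, 30]   # expected counts under the null hypothesis
print(round(chi_square_statistic(observed, expected), 3))   # prints 8.333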
IV. The Chi-Square (χ2) Goodness of fit Test
The chi-square goodness of fit test compares the distribution of observed frequencies to the distribution of expected frequencies to determine how closely observations match or “fit” expectations.
The null hypothesis for this test states that there is no statistically significant difference between the observed frequencies and the expected frequencies. The alternate hypothesis states that there is a statistically significant difference between the observed and expected frequencies. As previously stated, both the null and alternate hypothesis are written in a short sentence without mathematical notation.
The critical value of chi-square is found using a critical values table, Microsoft Excel, or statistical software. Figure 3 shows a section of the critical values table for chi-square. The left-most column shows the degrees of freedom. There are five columns for the level of significance, α: 0.10, 0.05, 0.02, 0.01, and 0.001. The critical value for chi-square is found at the intersection of the row for degrees of freedom and the column for the significance level. For example, if we were using a 0.05 significance level with 5 degrees of freedom, the critical value would be 11.070. For a test using a 0.01 significance level, the critical value would be 15.086. Please note: The critical value is always expressed with three digits past the decimal point; that is, to the thousandths place.
Figure 3: Abbreviated Chi-Square Critical Values Table
Microsoft Excel can also find the critical value of chi-square using the CHISQ.INV.RT function as shown in Equation 5.
Equation 5: Excel’s Function for Finding the Critical Value for Chi-Square
=CHISQ.INV.RT(α,df)
Where: α is the significance level and
df are the degrees of freedom
The decision rule for the example cited above, with α set at 5 percent and 5 degrees of freedom, is: Reject the null hypothesis if chi-square is greater than 11.070. You can estimate the p-value using the chi-square critical values table in the same manner as the Student’s t table can be used to estimate p-values. As shown below in Table 2, if the value of chi-square were 10.052, the p-value would be greater than 5 percent and less than 10 percent, and we would fail to reject the null hypothesis.
Table 2: Estimating the p-value using the Chi-Square Table
Excel can calculate a precise p-value using the function shown in Equation 6:
Equation 6: Excel’s Function for Finding the p-Value of Chi-Square
=CHISQ.DIST.RT(chi-square,df)
Where: Chi-square is the calculated test statistic and
df are the degrees of freedom
The exact p-value for a chi-square of 10.052 with 5 degrees of freedom is 0.0738 or 7.38 percent. See Table 3.
Table 3: p-Value Calculation in Excel
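If you are working in Python rather than Excel, scipy’s chi-square distribution functions play the same role as CHISQ.INV.RT and CHISQ.DIST.RT. This is a sketch for illustration; it assumes the scipy library is installed.

from scipy.stats import chi2

alpha, df = 0.05, 5
critical_value = chi2.isf(alpha, df)   # equivalent to Excel's =CHISQ.INV.RT(0.05, 5)
p_value = chi2.sf(10.052, df)          # equivalent to Excel's =CHISQ.DIST.RT(10.052, 5)
print(f"critical value = {critical_value:.3f}")   # 11.070
print(f"p-value = {p_value:.4f}")                 # about 0.0738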
A. Example 1: Chi-Square Goodness of Fit Test with Equal Expected Frequencies
Your friend loves to bake cookies and she is very good at it. So good, in fact, that people tell her they want to buy her cookies. She has decided to start selling her cookies at street fairs and farmers’ markets. She intends to sell cookies made from her six favorite recipes: Almond Sugar, Brown Sugar Shortbread, Chocolate Chip, Gingerbread, Oatmeal, and Peanut Butter.
Before her first street fair, she asks for your help. She wants you to determine whether her sales are equally distributed among the six recipes. You agree and tell her she needs this information to help her determine how many of each type of cookie to bake for future street fairs.
# 1: Test Set-Up
You agree to conduct a chi-square goodness of fit test to compare actual or observed sales with expected sales. This test will tell you whether the actual sales match the expected sales, or if there is a statistically significant difference between the observed and the expected sales.
The first step at this stage is to run an a priori statistical power analysis. To do this we need to estimate the effect size. We will measure effect size using Phi, φ. Phi is the square root of the chi-square test statistic over the total number of observations, n:
Equation 7: Effect Size
φ = √(χ² / n)
Table 4 shows the thresholds for interpreting the effect size (ES).5
Table 4: Effect Size Thresholds
Given that you have no data to calculate the effect size, you must estimate it. You do not expect that the cookies will have equal sales. You think that the effect will be moderate and guess the effect size will be 0.2675. You can now run the a priori statistical power calculations.
G*Power: A Priori Power Calculation
To run this analysis in G*Power, we select “χ2 tests” from the Test family box, “Goodness-of-fit tests: Contingency tables” from the Statistical test box, and “A priori: Compute required sample size…” from the Type of power analysis box. There are four inputs:
1. Effect size: Enter 0.2675.
2. α, err probability, or the level of significance, set at 0.05.
3. Power (1- β err prob): Enter 0.80.
4. Degrees of freedom defined as the number of categories minus one. Degrees of freedom is 5, found by 6 minus 1.
G*Power provides four outputs:
1. Noncentrality parameter λ: 12.8801250.
2. The critical value for chi-square, 11.07004977, which rounds off to 11.070.
3. Total sample size: 180.
4. Actual power: 0.8018733.
Figure 4 shows the results of the G*Power a priori power calculation. A sample of 180 is needed to achieve 80 percent power.
Figure 4: G*Power – A Priori Power Calculation
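The G*Power output can be double-checked with a short calculation based on the noncentral chi-square distribution, where the noncentrality parameter is λ = n × φ². This sketch assumes scipy is available and is not part of the chapter’s files.

from scipy.stats import chi2, ncx2

alpha, df = 0.05, 5
effect_size = 0.2675
n = 180

critical = chi2.isf(alpha, df)        # 11.070, the critical value
lam = n * effect_size ** 2            # noncentrality parameter, about 12.88
power = ncx2.sf(critical, df, lam)    # probability of rejecting H0 when the effect is real
print(f"power = {power:.4f}")         # about 0.8019, matching G*Power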
Statistics Kingdom: A Priori Power Calculation
The Statistics Kingdom a priori power calculation has seven inputs:
1. Test: There are two options: Goodness of fit and Test for variance. Select Goodness of fit.
2. Digits: Set at six. This option does not always change how the output of this calculation is rounded.
3. Significance level (α): Set at 0.05.
4. Power: Set at 0.8.
5. Effect: There are three options: Small, Medium, and Large. Set at Medium.
6. Effect size (w): Set at 0.2675.
7. Categories: Set at 6.
The Statistics Kingdom calculator also reports that the sample size needed to achieve 80 percent power is 180.
Figure 5: Statistics Kingdom – A Priori Power Calculation
After the sale, your friend gives you her sales data. Table 5 shows how many cookies she sold broken down by recipe. By coincidence, her total sales are 180 cookies. We should, therefore, have sufficient statistical power assuming your effect size estimate is reasonably accurate.
Table 5: Cookies Sold – Observed Frequencies
The next question: What are the expected sales per cookie recipe? After all, your friend has never before sold her cookies. You have no experience to guide you in estimating the sales for each recipe. The solution is to assume that sales for each type of cookie will be equal. You can take your friend’s total cookie sales and divide by the number of types of cookies to get the expected frequencies:
Equation 8: Expected Frequency Calculation Assuming Equal Sales
E = 180 cookies sold/6 cookie recipes = 30 cookies per recipe
You, therefore, set the expected frequencies for all six cookies at 30.
# 2: Select the Significance Level, α
You decide to conduct this test at a 5 percent significance level. The critical value of chi-square with 5 degrees of freedom is 11.070.
Table 6: Critical Value of Chi-Square Using a Table
Here is how to find the critical value using Excel’s CHISQ.INV.RT function.
Figure 6: Critical Value of Chi-Square Using Excel
# 3: State the Null and Alternate Hypotheses
• H0: The six cookie recipes have equal sales
• H1: The six cookie recipes have significantly different sales
Please Note: The null and alternate hypotheses are written in simple sentences; they are not stated using mathematical formulas. The null hypothesis states that there is no difference; the expected frequencies match the observed frequencies. The alternate hypothesis states that there is a difference; the expected and observed frequencies do not match.
# 4: Compose the Decision Rule
The decision rule: Reject the null hypothesis if chi-square is greater than 11.070. Figure 7 shows a chart of the chi-square distribution for the 5 percent rejection region starting at 11.070. The area in black is the rejection region.
Figure 7: Chi-Square Distribution with 5 df and a 5% α, χ²(5, 0.05)
# 5: Calculate the Test Statistic and p-Value
Excel does not have a built-in chi-square goodness of fit function. The closest function Excel has is the CHISQ.TEST:
Equation 9: Excel’s CHISQ.TEST Function
=CHISQ.TEST(observed_range,expected_range)
This function returns the p-value for a contingency table test. Fortunately, we can set up a worksheet to calculate the chi-square value, the p-value, and the effect size measured with Phi for a goodness of fit test. Table 7 shows the chi-square goodness of fit calculations performed in Excel. This worksheet uses the standard chi-square formula.
Table 7: Chi-Square Calculation Performed in Excel
The value of the chi-square test statistic equals 14.867, found by using the standard formula for chi-square.
Equation 10: Chi-Square Test Statistic Formula
χ² = Σ[(O – E)² / E]
The p-value was calculated using Excel’s CHISQ.DIST.RT function.
Equation 11: Calculating the p-Value Using Excel’s CHISQ.DIST.RT Function
=CHISQ.DIST.RT(ChiSquare,df)
The p-value equals 0.0109, or 1.09 percent. Figure 8 shows a chart with the rejection region in black and the p-value in red.
Figure 8: Chi-Square Distribution with a p-Value of 1.09%
We will measure Effect Size using Phi, φ. Phi is the square root of the chi-square test statistic over the total number of observations, n:
Equation 12: Effect Size
φ = √(χ² / n) = √(14.867 / 180) = 0.2874
The effect size of 0.2874 indicates that the type of cookie has a moderate effect on cookie sales. We can interpret this effect size as meaning 28.74% of sales are associated with the cookie type. Table 8 shows the thresholds for interpreting the effect size (ES).6
Table 8: Effect Size Thresholds
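The entire worksheet shown in Table 7 can be mirrored in Python with scipy.stats.chisquare. The counts below are hypothetical placeholders, not the Table 5 sales figures; substituting the actual observed sales would reproduce the chi-square of 14.867 and the p-value of 0.0109.

import math
from scipy.stats import chisquare

observed = [24, 23, 36, 28, 30, 39]                        # hypothetical counts, not the Table 5 data
expected = [sum(observed) / len(observed)] * len(observed) # equal expected sales of 30 per recipe

chi_sq, p_value = chisquare(observed, f_exp=expected)
phi = math.sqrt(chi_sq / sum(observed))                    # effect size: phi = sqrt(chi-square / n)
print(f"chi-square = {chi_sq:.3f}, p-value = {p_value:.4f}, phi = {phi:.4f}")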
# 6: Decide and Report
You will reject the null hypothesis given the fact that the chi-square value of 14.867 is above the critical value of 11.070 and in the rejection region. The decision to reject the null hypothesis can also be made using the p-value of 1.09 percent, which is below the 5 percent significance level. There is a statistically significant difference between the observed sales and the expected sales. Figure 9 shows a chart of the chi-square distribution with the rejection region in black and the p-value in red.
Figure 9: Chi-Square Distribution Chart
Conclusion: The different cookie recipes do not sell equally. Please note: At a 1 percent significance level, we would fail to reject the null hypothesis because the p-value would be greater than the significance level and the value of the test statistic, 14.867 is less than the critical value of the 1 percent significance level, 15.086.
Like ANOVA tests, chi-square goodness of fit tests are omnibus tests. We do not know which of the 15 different two-cookie pairs of cookies have different sales. To answer this question, we would have to conduct a post hoc analysis. There are several post hoc analyses for chi-square, but we will skip them because they take us beyond the scope of an introductory statistics class.
The six types of cookies do not sell equally. This test also has practical significance because it provides important information that can help the baker manage inventory. The two most popular cookies—Chocolate Chip and Peanut Butter—sell nearly twice as many cookies as Brown Sugar Shortbread and Almond Sugar. This information can help the baker plan on the number of cookies needed for future events. Without proper inventory management, the baker could run out of the most popular cookies while being left with too many unsold cookies of the less popular recipes.
B. Example 2: Chi-Square Goodness of Fit Test with Unequal Expected Frequencies
One week after you completed this analysis, your friend again calls you for help. Over the weekend she sold her cookies at another street fair. She asks if you can repeat your analysis. You agree, but you tell her that this time it will be slightly different because we do not expect each type of cookie to sell equally. She asks, “How do we determine the expected sales for each cookie?” Here is the answer: “We take the relative frequency of the sales per cookie for Week 1, and use those proportions to estimate the expected sales for each of the six cookie types.” Table 9 shows the relative frequencies for each cookie:
Table 9: Week 1 Sales and Relative Frequencies
You tell your friend that you will find the expected frequencies by multiplying the relative frequencies by the total sales for all six cookie types, which are the observed frequencies for the second sales event.
You also tell your friend that you will have to perform another a priori statistical power calculation because you hope that, with the new expected distribution of cookie sales, the effect size will be lower. In fact, if the Week 1 proportions accurately describe her sales, you should not reject the null hypothesis. You now project a weak effect of 0.19.
Table 10: Effect Size Thresholds
The G*Power a priori statistical power calculation shows that you need a sample of 356 cookie sales to achieve 80 percent power:
Figure 10: G*Power A Priori Statistical Power Calculation
# 1: Test Set-Up
Your friend gives you a breakout of sales by cookie for the second street fair. See Table 11:
Table 11: Week 2 Sales per Cookie
You will then use the relative frequencies from Week 1 to calculate the expected frequencies for Week 2 by multiplying the relative frequencies by total cookie sales. See Table 12:
Table 12: Expected Frequency Calculations for Week 2
*Please note: The expected frequencies for this example have been rounded off to a whole number.
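In Python, the same unequal-expected-frequency test can be run by passing the Week 1-based expected counts through scipy’s f_exp argument. The proportions and counts below are hypothetical placeholders, not the values in Tables 9, 11, and 12.

from scipy.stats import chisquare

week2_observed = [45, 40, 80, 50, 60, 85]                  # hypothetical Week 2 counts
week1_proportions = [0.12, 0.11, 0.23, 0.14, 0.17, 0.23]   # hypothetical Week 1 relative frequencies

total = sum(week2_observed)
expected = [p * total for p in week1_proportions]          # expected Week 2 counts per recipe

chi_sq, p_value = chisquare(week2_observed, f_exp=expected)
print(f"chi-square = {chi_sq:.3f}, p-value = {p_value:.4f}")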
# 2: Select the Significance Level, α
Like last week, you select a 0.05 significance level. Given that the degrees of freedom are still 5, found by k – 1, and the significance level is unchanged, the critical value for chi-square remains 11.070.
# 3: State the Null and Alternate Hypotheses
We revise the null and alternate hypotheses because we now know that the cookie recipes do not have equal sales. Our research question now is: Do the observed sales match our expected sales? Here are the null and alternate hypotheses:
• H0: The observed sales per cookie match expected sales
• H1: The observed sales per cookie do not match expected sales
# 4: Compose the Decision Rule
The decision rule does not change: Reject the null hypothesis if chi-square is greater than 11.070.
Figure 11: df = 5, 5% Significance Level
# 5: Calculate the Test Statistic and p-Value
You are now ready to conduct the chi-square goodness of fit test for unequal expected frequencies. Figure 12 shows the complete test results with the p-value:
Figure 12: Chi-Square Goodness of Fit Test Calculations
The value of chi-square is 11.042, just below the critical value of 11.070. The p-value is 0.0505, just above the significance level of 0.05. Figure 13 shows a chart of the chi-square distribution with a 5.05 percent p-value.
Figure 13: Chi-Square Distribution with a p-Value of 5.05%
Effect size fell from 0.2874 in the previous example to 0.1751. This is a weak effect that suggests that the recipe accounts for 17.51 percent of the cookie sales.
Equation 13: Effect Size
φ = √(χ² / n)
# 6: Decide and Report
We fail to reject the null hypothesis based on the chi-square value and p-value. We have insufficient evidence to conclude that the observed sales do not match the expected sales. The difference between the observed and expected frequencies is most likely random sampling error.
While this test lacks statistical significance, it has practical significance. The information uncovered provides the baker with useful information for managing her inventory. She knows her cookies do not sell equally and has an estimate for each recipe’s portion of total sales. The data suggest that she should prepare more chocolate chip and peanut butter cookies than almond sugar, gingerbread, or brown sugar shortbread cookies. Given that the null hypothesis was nearly rejected, the baker should continue to monitor the observed and expected sales of her cookies.
Based on the low statistical power, the baker should continue collecting data so she can learn more about the distribution of her cookie sales. Her goal should be to reduce the effect size that the cookie recipes have on sales, which would mean that her expected and observed distributions would be closer to a perfect match. A perfect match of the observed and expected frequencies would yield a chi-square score of 0.000.
V. Chi-Square (χ2) Contingency Table Test
Since 1969, the Gallup organization has been polling Americans regarding their attitudes toward the legalization of marijuana. In 1969, only 12 percent of respondents favored legalizing marijuana. On October 23, 2019, Gallup published the results of their survey: 66 percent of Americans favored the legalization of marijuana, the same as the 2018 survey. Table 13 shows the results for Republicans, Independents, and Democrats:
Table 13: 2019 Gallup Poll on the Legalization of Marijuana
Research Question: Does a person’s position on the legalization of marijuana depend on his or her political identification? We have addressed this question with confidence intervals and two-sample z-tests for proportions. We can also answer this question using a chi-square contingency table test. This test has big advantages over two-sample z-tests for proportions because the analysis can be conducted with a single test without increasing the probability of a Type I error. The down-side is that any chi-square test has lower statistical power than parametric tests like z-tests, t-tests, or ANOVA tests.
A Priori Statistical Power:
Table 14: Effect Size Thresholds
We estimate a moderate effect size of 0.25. As shown in Figure 14, the a priori statistical power calculations from G*Power and Statistics Kingdom find that a sample of 155 is required to achieve 80 percent power.
Figure 14: A Priori Statistical Power Calculations in G*Power and Statistics Kingdom – A Sample of 155 is Needed to Achieve 80% Power
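As with the goodness of fit example, the required sample size can be sanity-checked in Python with the noncentral chi-square distribution. This sketch is illustrative and assumes scipy is installed.

from scipy.stats import chi2, ncx2

alpha, df = 0.05, 2            # df = (3 rows - 1)(2 columns - 1) = 2
effect_size = 0.25
n = 155

critical = chi2.isf(alpha, df)                       # 5.991, the critical value
power = ncx2.sf(critical, df, n * effect_size ** 2)  # power at the proposed sample size
print(f"power = {power:.3f}")                        # approximately 0.80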
# 1: Test Set-Up - 3 x 2 Contingency Table
Table 13, shown above, details the proportion of people who do and do not favor the legalization of marijuana by political affiliation. Please note: Gallup has three responses: 1) Yes/legal, 2) No/not legal, and 3) Don’t Know/Refused to Answer. Because the third response had low frequencies (less than 5 for Republicans and Democrats), the “no, not legal” and the “don’t know/refused” responses were combined to meet the requirements of chi-square tests. This is legitimate because all of these respondents do not favor the legalization of marijuana.
Remember: Small frequencies can lead to erroneous conclusions. Chi-square is very sensitive to small sample sizes and tiny frequencies. The “don’t know” and “no answer” responses were combined with the “do not legalize” responses to avoid having a cell in the chi-square contingency table with unacceptably low expected frequencies.
# 2: Select the Significance Level, α
A 5 percent level of significance was selected. Here is the formula to determine degrees of freedom for a contingency table test and the results for this problem:
Equation 14: Formula for Degrees of Freedom For Contingency Tables
df = (# of rows – 1)(# of columns – 1) = (3 – 1)(2 – 1) = 2
The three rows are Republicans, Independents, and Democrats. The two columns are “favors the legalization of marijuana,” and “does not favor the legalization of marijuana.” For this problem with three rows and two columns there are 2 degrees of freedom. As shown in Table 15, the critical value for chi-square is 5.991.
Table 15: Critical value for Chi-Square with 2 degrees of freedom at a 5% significance level = 5.991
The critical value of chi-square can also be found using Excel. Here is the formula: =CHISQ.INV.RT(alpha,df). Please note: We do not count the “total” columns or rows in the calculation of degrees of freedom.
# 3: State the Null and Alternate Hypotheses
• H0: A person’s attitude toward the legalization of marijuana is independent of his or her political identification.
The null hypothesis could also be stated as: There is no relationship between a person’s attitude towards the legalization of marijuana and his or her political identification).
• H1: A person’s attitude toward the legalization of marijuana is dependent on his or her political identification.
The alternate hypothesis could also be stated as: There is a relationship between a person’s attitude towards the legalization of marijuana and his or her political identification.
# 4: Compose the Decision Rule
The decision rule: Reject the null hypothesis if chi-square is greater than 5.991. Figure 15 shows a chart of this chi-square distribution with its rejection region starting at 5.991:
Figure 15: Chi-Square Distribution with 2 degrees of freedom at a 5% significance level, χ²(2, 0.05)
# 5: Calculate the Test Statistic and p-Value
Table 16 shows the observed frequencies based on the data reported by Gallup:
Table 16: Chi-Square Contingency Table with Observed Frequencies
Equation 15 shows the formula we use to calculate the expected frequencies:
Equation 15: Formula for Calculating Expected Frequencies
E = (Row Total × Column Total) / Grand Total
Please note: The row and column totals for the observed frequencies must equal those for the expected frequencies.
Table 17 shows the contingency table with the expected frequencies.
Table 17: Chi-Square Contingency Table with Expected Frequencies
The next step is to calculate the chi-square test statistic, p-value, and effect size. Table 18 shows the result of this calculation:
Table 18: Completed Chi-Square Contingency Table
The chi-square value of 72.181 is very large and the p-value is extremely small, well below 0.001. When p-values are very small, we report them as <0.001. There is no need to report a number with four or more zeros after the decimal point.
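In Python, scipy.stats.chi2_contingency performs the entire contingency table calculation, including the expected frequencies from Equation 15. The 3 x 2 table below contains hypothetical counts, not the Gallup figures in Table 16.

import numpy as np
from scipy.stats import chi2_contingency

# Rows: Republicans, Independents, Democrats; columns: favor, do not favor (hypothetical counts)
observed = np.array([[200, 200],
                     [260, 140],
                     [300, 100]])

chi_sq, p_value, df, expected = chi2_contingency(observed)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.6f}")
print(expected)   # each cell: (row total x column total) / grand total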
# 6: Decide and Report
With chi-square as high as 72.181 and a p-value less than 0.001, we reject the null hypothesis. This test has statistical significance. Conclusion: A person’s attitude toward the legalization of marijuana depends, in part, on his or her political identification. A post hoc analysis would be needed to determine which of the three pairs are unequal: Democrats vs. Republicans, Democrats vs. Independents, and Independents vs. Republicans.
These findings have practical significance for policy-makers. It is important for policy-makers to note that attitudes towards the legalization of marijuana are dependent on a person’s political affiliation. In addition, half of the Republicans surveyed, the group least likely to favor the legalization of marijuana, now support its legalization, found by 199/393 = 50.64 percent.
VI. Chi-Square (χ2) Normality Test
Dr. V is studying his students’ test results. He has a random sample of 144 exams. Before he can proceed in his analysis, he needs to determine whether the data are normally distributed. If so, he will conduct his analysis using parametric techniques. If the data are not normally distributed, he must employ nonparametric techniques. Table 19 shows Dr. V’s data:
Table 19: 144 Test Results
The first question Dr. V has is whether this sample is large enough to achieve 80 percent power. He anticipates a relatively strong effect size of 0.40 and that there will be five degrees of freedom.
Table 20: Phi Effect Size Interpretation
An a priori statistical power analysis run using G*Power indicates that a sample of 81 is needed to achieve 80 percent power. Dr. V’s sample of 144 exams, therefore, will provide ample statistical power.
Figure 16: A Priori Statistical Power
# 1: Test Set-Up
We will conduct this test using Microsoft Excel because it is much faster and easier than doing it by hand. It would be helpful to look at the #4Normality worksheet in the Chapter17_ChiSq_Examples.xlsx workbook. Here are the seven steps to conduct a chi-square goodness of fit test for normality:
Step 1: Calculate the sample mean, Xbar, and the sample standard deviation, s. (Please note: if possible calculate the population mean, μ, and population standard deviation, σ).
Step 2: Find the z-values for each random variable using Excel’s STANDARDIZE function.
Step 3: Create a frequency table for the z-values from -3.50 to +3.50 in 0.50 increments. This table should have four columns: 1) The bins or categories, 2) The observed frequencies, 3) The probability of the expected frequencies, and 4) The expected frequencies.
Step 4: Enter the observed frequencies in the second column of the frequency distribution using Excel’s FREQUENCY array function. Please Note: You must press Control+Shift+Enter when entering an array function.
Step 5: In the third column, enter the probability of the expected frequencies using Excel’s NORM.S.DIST function.
Step 6: Enter the expected frequencies by multiplying the probability of the expected frequencies by the total number of observations.
Step 7: To conform to the chi-square requirement that the expected frequencies must be at least 5, combine adjacent categories when their expected frequencies are less than 5.
Here are the results of the seven steps to solve this problem:
Step 1: Calculate the sample mean and standard deviation. The sample mean, Xbar, and the sample standard deviation, s, are stand-ins for the unknown population parameters, μ and σ. The mean, X̅, is 70.65 and the standard deviation, s, is 23.19.
Step 2: Find the z-value for each random value using Excel’s STANDARDIZE function:
Equation 16: Microsoft Excel’s STANDARDIZE Function
=STANDARDIZE(X,Mean,Standard Deviation)
Where: X is the random value
Mean is the sample mean
Standard Deviation is the sample standard deviation
Step 3: Create a frequency table with 17 rows and four columns. In the first row, place the names of the four columns. The first column will be for the bins or categories for a normal distribution. We will have 15 categories for the z-values ranging from -3.50 to +3.50. These categories represent the distance from the mean in z-values.
Table 21: Chi-Square Normality Table
Step 4: The column labelled “O” is for the observed frequencies. To find these values, use Excel’s FREQUENCY function. This is an array function. Here is the syntax for this function:
Equation 17: FREQUENCY Function
{=FREQUENCY(data array,bin array)}
Where: data array is the range of cells with the data, and bin array is the cell range with the bins (categories). The curly bracket symbols “{}” mean that this function is an array function. Enter array functions by holding down the Control, Shift, and Enter (or RETURN) keys when you enter the formula.
Table 22 shows the results of using the FREQUENCY function.
Table 22: Observed Frequencies Found Using the FREQUENCY Function
Please Note: The first three categories and the last five categories have no values. We will eliminate these categories after we find the expected frequencies.
Step 5: The third column is for the probability of the expected frequencies. The first cell in this column has a different formula than the other cells.
Equation 18: Excel Formula for the first cell in P(E) column
=NORM.S.DIST(F2,True)
Where: F2 is the first cell in the Bins column.
The second cell gets this function, which can then be pasted into the remaining cells in this column:
Equation 19: Excel Function for the Second Cell in the P(E) Column
=NORM.S.DIST(F3,True)-NORM.S.DIST(F2,True)
Table 23 shows the probability of the expected frequencies.
Table 23: P(E) Column Values
Step 6: To find the expected frequencies, we multiply P(E), which is the relative frequency for the expected frequencies, by the total number of observations, n, 144 as shown in Table 24.
Table 24: Expected Frequencies
We now have our expected frequencies. But, we also have a minor problem. We cannot conduct a chi-square goodness of fit test when some categories have expected frequencies of less than five. In addition, observed frequencies in the first or last cell must be greater than zero. We must combine categories.
Step 7: Combine the categories, when necessary. Table 25 shows the completed frequency table with combined categories.
Table 25: Combined Categories
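The seven Excel steps can be mirrored in Python. Because Dr. V’s 144 exam scores are not reproduced here, this sketch generates hypothetical normally distributed scores; swap in the Table 19 data to reproduce the chapter’s results. It assumes numpy and scipy are installed.

import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(0)
scores = rng.normal(70, 23, size=144)          # hypothetical scores, not Dr. V's data

mean, sd = scores.mean(), scores.std(ddof=1)   # Step 1: sample mean and standard deviation
z = (scores - mean) / sd                       # Step 2: z-values (Excel's STANDARDIZE)

edges = np.arange(-3.5, 4.0, 0.5)              # Step 3: bins from -3.50 to +3.50 in 0.50 steps
observed, _ = np.histogram(z, bins=edges)      # Step 4: observed frequencies (Excel's FREQUENCY)

p_expected = np.diff(norm.cdf(edges))          # Step 5: probability of each bin
expected = p_expected * len(scores)            # Step 6: expected frequencies

def combine_tails(obs, exp, minimum=5.0):      # Step 7: merge tail categories with E < 5
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[0] < minimum:
        obs[1] += obs[0]; exp[1] += exp[0]
        del obs[0]; del exp[0]
    while len(exp) > 1 and exp[-1] < minimum:
        obs[-2] += obs[-1]; exp[-2] += exp[-1]
        del obs[-1]; del exp[-1]
    return np.array(obs), np.array(exp)

obs_c, exp_c = combine_tails(observed, expected)
chi_sq = np.sum((obs_c - exp_c) ** 2 / exp_c)
df = len(exp_c) - 3                            # k - 1 - 2: two df lost estimating the mean and sd
p_value = chi2.sf(chi_sq, df)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.4f}")

Because these simulated scores really are drawn from a normal distribution, this run will usually fail to reject the null hypothesis, unlike Dr. V’s exam data.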
# 2: Select the Significance Level, α
We will use a 0.05 significance level. Degrees of freedom for most chi-square goodness of fit tests are found by using the formula k – 1. But, with tests for normality that use the sample mean and the sample standard deviation to estimate μ and σ, two degrees of freedom are lost. The formula for degrees of freedom, therefore, changes to k – 3. With eight classes, we have five degrees of freedom. The critical value for chi-square with 5 degrees of freedom and a 5 percent significance level is 11.070 as shown in Table 26.
Table 26: Critical Value of Chi-Square with 5 df at a 5% α
# 3: State the Null and Alternate Hypotheses
• H0: The distribution follows a normal probability distribution
• H1: The distribution does not follow a normal probability distribution
# 4: Compose the Decision Rule
Figure 17 shows a chi-square distribution with 5 degrees of freedom and a 5 percent significance level. The decision rule: Reject the null hypothesis if chi-square is greater than 11.070.
Figure 17: Chi-Square Distribution with 5 df and a 5% α, χ²(5, 0.05)
# 5: Calculate the Test Statistic, p-Value, Effect Size, and Power
Table 27 shows the calculation for chi-square and the p-value:
Table 27: Chi-Square Calculation
Effect size for this problem is 0.3967, found by:
Equation 20: Effect Size
φ = √(χ² / n) = √(22.665 / 144) = 0.3967
This is a moderate effect that is close to being a relatively strong effect based on the standard interpretation of Phi effect size. See Table 28.
Table 28: Interpreting Effect Size
Given the moderate effect size of 0.3967, we need not worry that this test is over-powered.
# 6: Decide and Report
The chi-square value of 22.665 is in the rejection region and the p-value of 0.04 percent is well below the 5 percent significance level. We reject the null hypothesis. There is very strong evidence that Dr. V’s data are not normally distributed. Conclusion: Dr. V cannot use parametric tests to analyze these data. This conclusion has practical significance because any analysis using parametric techniques would be seriously flawed and Dr. V would look foolish if he were to present his analysis using parametric techniques.
VII. Summary
We have demonstrated that chi-square is an easy-to-use technique for determining whether the distribution of the data we observed fits the expected distribution. We looked at three types of chi-square tests: 1) The chi-square goodness of fit or one-way test, 2) The chi-square contingency table or two-way test, and 3) The chi-square goodness of fit test for normality. All three tests use the same formula for calculating the chi-square test statistic:
Equation 21: The Chi-Square Test Statistic Equation
χ² = Σ[(O – E)² / E]
The most challenging aspect of chi-square is determining the expected frequencies. We examined several examples of how to find the expected frequencies.
VIII. Exercises
Data for these exercises can be found in Chapter17_ChiSq_Exercises.xlsx.
Exercise 1 – Chi-Square Goodness of Fit Test
A recent study in a national business journal asked consumers the following question: “In general, how would you rate the level of service provided by your supermarket?” A district manager of Piggly Wiggly supermarkets asked Piggly Wiggly shoppers in her district the same question. She took a random sample of 525 Piggly Wiggly shoppers. The question she seeks to answer is: Do the results of her survey of Piggly Wiggly shoppers contradict the national study using a 5 percent significance level?
# 1: Test Set-Up
Using G*Power or Statistics Kingdom, run an a priori power analysis to determine the sample size needed to achieve 80 percent statistical power. Estimate the effect size at 0.30.
Table 29 shows the result of the Piggly Wiggly survey along with consumers’ ratings of supermarkets from the national study.
Table 29: Piggly Wiggly Supermarket Survey
Find the expected frequencies:
Table 30: Expected Frequencies (Note: Expected frequencies have been rounded off to a whole number.)
# 2: Select the Significance Level, α
A 5 percent significance level has been selected. What is the critical value of chi-square?
# 3: State the Null and Alternate Hypotheses
• H0:
• H1:
# 4: Compose the Decision Rule
# 5: Calculate the Test Statistic, ES, p-Value, and statistical power
# 6: Decide and Report
Exercise 2 – Chi-Square Contingency Table (Test of Independence)
# 1: Test Set-Up
Using G*Power or Statistics Kingdom run an a priori power analysis to determine the sample size needed to achieve 80 percent statistical power. Estimate effect size at 0.40.
A major retailer is looking at an old study on shoppers’ favorite person to shop for during the holiday season. They have data for baby boomers (people born between 1946 and 1964), Gen-Xers (people born between 1965 and 1980), and Millennials (people born between 1981 and 1996). Results of the study are shown in Table 31.
Table 31: Observed Frequencies (Source: Russell Research)
Find the expected frequencies:
Table 32: Expected Frequencies
# 2: Select the Significance Level, α
A 5 percent significance level has been selected. What is the critical value of chi-square?
# 3: State the Null and Alternate Hypotheses
• H0:
• H1:
# 4: Compose the Decision Rule
# 5: Calculate the Test Statistic, ES, p-Value, and statistical power
# 6: Decide and Report
Exercise 3 – chi-square Goodness of Fit Test for Normality
Dr. Siegfried Zuckerkrank is a young endocrinologist who works at one of the world’s foremost diabetes clinics. He is especially interested in how well newly-diagnosed insulin-dependent patients manage the difficult task of regulating their blood sugar levels. For a person without diabetes, blood sugar typically ranges from a low of 70 mg/dL to a high of 130 mg/dL. An insulin-dependent diabetic is prone to both hypoglycemia (blood sugar <70 mg/dL) and hyperglycemia (blood sugar > 180 mg/dL). Insulin-dependent diabetics struggle to avoid both hypoglycemia and hyperglycemia.
One measure Dr. Zuckerkrank wants to study is the blood sugar levels of patients when they visit the clinic for their first examination after their initial diagnosis. Table 33 shows the blood sugar levels for a random sample of 121 patients:
Table 33: Blood Sugar Levels mg/dL
Dr. Zuckerkrank’s question: Are the blood sugar levels normally distributed? To answer this question, he will conduct a chi-square goodness of fit test for normality. If the data are normally distributed, parametric tests may be used. Should the data not be normally distributed, nonparametric techniques must be employed.
# 1: Test Set-Up
Using G*Power or Statistics Kingdom, run an a priori power analysis to determine the sample size needed to achieve 80 percent statistical power. Estimate the effect size at 0.34. Assume that there will be 6 degrees of freedom.
Set up the test using the following steps:
a. Calculate the sample mean, Xbar, and the sample standard deviation, s.
b. Find the z-values for each random value using Excel’s STANDARDIZE function.
c. Create a frequency table for the z-values from -3.50 to +3.50 in 0.50 increments. The table should have four columns: 1) the bins, or categories, 2) the observed frequencies, 3) The probability of the expected frequencies, and 4) the expected frequencies. Table 34 shows the format of this table:
Table 34: Frequency Distribution Format
d. Enter the observed frequencies in the second column of the frequency distribution using Excel’s FREQUENCY array function.
e. In the third column, enter the probability of the expected frequencies using Excel’s NORM.S.DIST function.
f. Enter the expected frequencies by multiplying the probability of the expected frequencies by the total number of observations.
g. Because the expected frequencies in the tails are small, combine categories to conform to the chi-square requirement that the expected frequencies must be at least 5. Also, combine the observed frequencies categories at the extremes if these categories have a frequency of zero. The condensed frequency distribution meets the requirements for a chi-square test because none of the expected frequencies is less than 5.
# 2: Select the Significance Level, α
Dr. Zuckerkrank intends on using a 0.05 significance level.
# 3: State the Null and Alternate Hypotheses
• H0:
• H1:
# 4: Compose the Decision Rule
# 5: Calculate the Test Statistic, ES, p-Value, and post hoc statistical power
# 6: Decide and Report
Except where otherwise noted, Clear-Sighted Statistics is licensed under a
Creative Commons License. You are free to share derivatives of this work for
non-commercial purposes only. Please attribute this work to Edward Volchok.
Endnotes
1. R. A. Fisher, Statistical Methods for Research Workers, 13th Edition (New York: Harper, 1958), p. 22.
2. Karl Pearson, “On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it Can Be Reasonably Supposed to Have Arisen from Random Sampling,” Philosophical Magazine, Series 5, 50 (302), 1900, pp. 157–175.
3. A. W. F. Edwards, “R. A. Fisher on Karl Pearson,” Notes and Records of the Royal Society of London, Volume 48, No. 1, January 1994, pp. 97–106.
4. Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, Second Edition (New York: Psychology Press, 1988), p. 215.
5. Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, Second Edition (New York: Psychology Press, 1988), p. 227.
6. Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, Second Edition (New York: Psychology Press, 1988), p. 227.