Clear-Sighted Statistics
Chapter 16: One-Way ANOVA Tests
Figure 1: The NHST Cycle
I. Introduction
In Chapter 15, we conducted hypothesis tests for two independent means along with other two-sample tests. In this chapter, we cover the most basic ANOVA test: One-Way, or Single-Factor, ANOVA. ANOVA tests are a family of statistical techniques used to test—to attempt to nullify—the null hypothesis that two or more independent population means are equal. ANOVA works by measuring variation three ways; hence the name ANOVA, which stands for ANalysis Of VAriance. It is through the analysis of variance that we can test the equality of means among two or more groups. To test multiple proportions, we use a chi-square technique, which is covered in Chapter 17.
After completing this chapter, you will be able to:
• Discuss the key assumptions of a One-Way ANOVA test and understand that this test is considered “robust,” which means that it is useful when there are modest violations of some key assumptions.
• Recognize that there are a variety of methods to determine whether the ANOVA assumptions have been violated.
• Identify the tests used when there are serious violations to the ANOVA assumptions.
• Understand that ANOVA is an omnibus test. This means that when you are testing more than two means and you reject the null hypothesis, you cannot tell which specific pairs of means are statistically different. To determine which pairs of means differ, you need to run a post hoc test. “Post hoc” means “after the fact.”
• There are many types of post hoc analyses. We will use Microsoft Excel to conduct Fisher’s least square difference (LSD) confidence intervals.
• Appreciate the advantages of the One-Way ANOVA test over Two-Sample t-tests for independent means.
• Use G*Power and the Statistics Kingdom sample size calculator to run an a priori statistical power analysis.
• Conduct a One-Way ANOVA test by hand and with Microsoft Excel.
• Calculate and interpret the eta-squared, η2, effect size, know how this measure of effect size compares to Cohen’s f effect size, and how these measures relate to practical significance.
• Be aware that there are a variety of ANOVA tests.
You should download the following files that accompany this chapter:
• Critical Values Tables for F-distributions at 1 and 5 percent significance levels. These files are available in Excel and pdf formats.
- F-Distribution_.05_.01.pdf
- F-Distribution_.05_.01.xlsx
• The Excel file for the examples shown in this chapter, Chapter16_Examples.xlsx.
• The Excel file for the end of the chapter exercises: Chapter16_Exercises.xlsx.
• An Excel file for calculating post hoc statistical power and Fisher’s LSD confidence intervals: Chapter16_ANOVA_Fisher’sLSDPostHoc.xlsx.
II. Background on ANOVA
Ronald Aylmer Fisher developed ANOVA tests in 1918 and formalized them in his groundbreaking book Statistical Methods for Research Workers published in 1925.1 He invented ANOVA tests while conducting agricultural experiments at the Rothamsted Experimental Station, one of the world’s oldest and most prestigious agricultural research institutions, which is now called Rothamsted Research.
Fisher was interested in determining whether different “treatments” of crop fertilizers produced different average (mean) yields. A treatment is a specific source of variation in each data set or treatment. In Fisher’s experiments, the treatments were different ways fertilizers were used on crops. Treatments are typically qualitative—nominal or ordinal—categories. Treatments are often considered independent variables or controlled inputs. We also have dependent or outcomes variables, which must be interval or ratio data because we will calculate their means and variances. As discussed in Chapter 5, the mean and variance can only be calculated with quantitative data. These dependent variables are measurements within the treatments.
Many people find the name ANOVA confusing because we are interested in testing whether all the population Treatment Means are equal, but ANOVA is based on analyzing the ratios of variance measured three ways:
- Total Variation: The sum of the squared differences between each observation and the overall or Grand Mean, X̅G. The Grand Mean is the mean of all the variables regardless of the treatments. Total Variation is usually abbreviated as Total, SS Total, or Total SS. Excel calls this source of variation “Total.” We will follow Excel’s nomenclature. The “SS” stands for “Sum of the Squares.” The meaning of “SS” will be explained shortly.
- Random Variation: The sum of all the squared differences between each observation and its Treatment Mean. Excel calls this source of variation “Within Groups.” We will abbreviate “Within Groups” as SSW. This measure is also called “Error” and is sometimes abbreviated as “SSE.”
- Treatment Variation: This is the difference between Total Variation and Random Variation, or Total minus SSW. Treatment Variation has numerous names. Excel uses the term “Between Groups.” We will follow Excel and call this source of variation, “Between Groups” or SSB. This measure is also called “Between Samples,” “Treatments (SST),” or “Factors.” Because ANOVA is usually used with three or more treatments, the word “between,” which implies a space separating or connecting two things, is grammatically incorrect. The correct preposition would be “among.”
The ANOVA test is based on the ratio of these estimates of variation. Table 1 shows the structure of a One-Way ANOVA table. ANOVA tests follow the F-distribution. You will recall from Chapter 15 that F-distributions are defined by two different degrees of freedom. The first is the degrees of freedom in the numerator (SSB), defined as the number of treatments, k, minus one. The second is the degrees of freedom in the denominator (SSW), defined as the total number of observations in all treatments, n, minus the number of treatments, k. These are the degrees of freedom used to find the critical value of the F-distribution using paper tables or Microsoft Excel. A third degrees of freedom, for Total, is found by n minus 1; it also equals the sum of the other two. The result of the ANOVA test is an F-value that tells us whether the ratio is too large to be the result of random sampling error. When it is, we have statistical significance, which means that at least one pair of Treatment Means is unequal. One very important measure—the p-value—is missing from this table. We will use Excel to calculate p-values for F.
Table 1: Structure of a One-Way ANOVA Table
Where: k = number of treatments
n = total number of random variables regardless of treatment
df = Degrees of Freedom
X̅G = Grand Mean. The sum of all the variables divided by the total number of variables.
X̅T = Treatment Mean. The sum of a treatment’s variables divided by the number of variables in that treatment.
To repeat, because ANOVA is an omnibus test, when we have more than two treatments and there is statistical significance, we do not know which of the treatment pairs are unequal. This is why we need to conduct a post hoc analysis whenever the null hypothesis is rejected. Of course, a post hoc analysis is not needed when there are only two treatments.
A. Conducting a One-Way ANOVA Test (Mostly) by Hand
Here are the steps to construct an ANOVA table by hand. This is the kind of test typically presented in introductory textbooks. We will, however, supplement our calculation with Excel, G*Power, and online power calculators. We will find that performing even the relatively small number of calculations for this example is time consuming. In addition, this test will lack sufficient statistical power.
We start with the Sum of the Squares (SS) column. First we calculate Total. To repeat, Total is the sum of the squared deviations of each variable, X, regardless of treatment, from the Grand Mean, X̅G. The formula for Total is:
Equation 1: Formula for Sum of the Squares Total (Total)
Total = Σ(X - X̅G)²
Next we calculate the Sum of the Squares Within Groups, SSW, which is the sum of all the squared deviations of each variable, X, from its particular Treatment Mean, X̅T. The formula for SSW is:
Equation 2: Formula for Sum of the Squares Within Groups (SSW)
SSW = Σ(X - X̅T)²
The Sum of the Squares Between Groups, SSB, is the difference between Total and SSW found using the following formula:
Equation 3: Formula for Sum of the Squares Between Groups (SSB)
SSB = Total - SSW
We will go through these calculations shortly when we turn to our first example.
The next step is to find the degrees of freedom for SSB, the degrees of freedom found in the numerator, and for SSW, the degrees of freedom found in the denominator. The formulas for these degrees of freedom are shown above in Table 1. We could, however, find degrees of freedom before we calculate the three Sum of the Squares. Please note: As previously stated, the SS Total has its own degrees of freedom, found by n minus 1, where n stands for the total number of observations. This figure, however, is not used for finding the critical value of F or the value of the F statistic.
Once the Sum of the Squares and degrees of freedom have been entered into the ANOVA table, we move one column to the right and calculate the Mean Squares (MS) for the numerator and denominator. Equation 4 shows the formula for MSB, mean squares between groups:
Equation 4: Formula for Mean Square Between Groups (MSB)
MSB = SSB/(k - 1)
Equation 5 shows the formula for MSW, mean squares within groups:
Equation 5: Formula for Mean Squares Within Groups (MSW)
MSW = SSW/(n - k)
With the Mean Squares calculated, we complete the ANOVA table by moving one column to the right and calculating the F-value, which is the ratio of MSB to MSW. Equation 6 shows this formula:
Equation 6: Formula for F
F = MSB/MSW
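Equations 1 through 6 can be collected into a short sketch. The code below is a minimal illustration in Python; the data, function name, and variable names are invented for this example and are not part of the chapter’s Excel workflow:

```python
from statistics import mean

def one_way_anova(treatments):
    """Build the core of a One-Way ANOVA table from a list of treatments."""
    observations = [x for group in treatments for x in group]
    n, k = len(observations), len(treatments)
    grand_mean = mean(observations)

    # Equation 1 -- Total: squared deviations from the Grand Mean
    total = sum((x - grand_mean) ** 2 for x in observations)
    # Equation 2 -- SSW: squared deviations from each Treatment Mean
    ssw = sum((x - mean(g)) ** 2 for g in treatments for x in g)
    # Equation 3 -- SSB: Total minus SSW
    ssb = total - ssw

    msb = ssb / (k - 1)   # Equation 4: Mean Square Between Groups
    msw = ssw / (n - k)   # Equation 5: Mean Square Within Groups
    return {"Total": total, "SSW": ssw, "SSB": ssb,
            "df_num": k - 1, "df_den": n - k, "F": msb / msw}  # Equation 6

table = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

For these made-up treatments the Grand Mean is 3, so Total is 12, SSW is 6, SSB is 6, and F = (6/2)/(6/6) = 3.0, matching the hand formulas step for step.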
B. Stating the Null and Alternate Hypotheses
The null hypothesis for ANOVA tests is that all of the Treatment Means are equal. It is written using mathematical notation: H0: μ1 = μ2 = … = μk. The null hypothesis always refers to the population parameter, which in this case is the population mean, μ.
The alternate hypothesis states that at least one pair of treatments has unequal means. The alternate hypothesis is not written with mathematical notation. Here is the alternate hypothesis for an ANOVA test: H1: At least one pair of Treatment Means is not equal.
It is very important to note that it is a serious mistake to write the alternate hypothesis with mathematical notation: H1: μ1 ≠ μ2 ≠ … ≠ μk. This statement indicates that all the treatments have unequal population means. But all it takes to reject the null hypothesis is for just one pair of treatments to have unequal means. Remember: As discussed in Chapter 13, the null and alternate hypotheses must be mutually exclusive and collectively exhaustive. Were we to write the alternate hypothesis using unequal signs, our null and alternate hypotheses would not be collectively exhaustive because it is possible for some of the treatment pairs to be equal while others are unequal.
III. Key ANOVA Assumptions
There are three key assumptions for an ANOVA test:
Figure 2: The Key ANOVA Assumptions
Assumption 1: The treatments are independent.
The independence of the treatments is ensured by good research design that randomly assigns subjects to the treatments. ANOVA tests cannot be used for dependent samples; this assumption, therefore, must never be violated. All examples in this chapter have independent treatments. You will recall from the discussion of independent samples in Chapter 15 that samples are independent when the measurements from one sample are not affected by the measurements of another sample. In addition, no measure from within a treatment should affect another measure in that treatment.
Assumption 2: The treatments are normally distributed.
When the treatment data are not normally distributed, the ANOVA test may provide misleading results. In Chapter 4, we discussed that a normal distribution is a symmetrical distribution centered around the mean, median, and mode, with most variables clustering around the center and fewer extreme values located toward the left and right tails. ANOVA tests, like t-tests, are considered robust, which is to say, these tests remain useful even when the data are somewhat skewed.
There are a variety of ways to determine whether the data follow a normal distribution.
The normality of the data can be assessed using charts. These methods, however, should be used with caution because they are not formal statistical tests.
The first visual method is to draw histograms to determine whether the data are normally distributed; which is to say, symmetrical around the mean. You will recall that in Chapter 4, symmetrical, nearly symmetrical, and asymmetrical histograms were presented. Figure 3 shows two histograms. The symmetric histogram on the left represents a nearly perfect normal distribution. The asymmetric distribution on the right is clearly not a normal distribution.
Figure 3: Histograms with a Normal and Non-Normal Distribution
Using histograms can aid one’s judgment in determining whether the data are normally distributed, but this method lacks rigor.
The second graphic method is to draw Q-Q plots, or Quantile-Quantile plots. Q-Q plots can be easily constructed in Excel. Here are the steps:
Step 1: Enter and sort the data in order of magnitude.
Figure 4: Q-Q Plot Step 1
Step 2: Rank each variable using Excel’s RANK function.
Figure 5: Q-Q Plot Step 2 - Functions are in Column B
Step 3: Find the percentile for each variable.
Figure 6: Q-Q Plot Step 3 - Functions are in Column C
Step 4: Calculate the z-value for each variable using Excel’s NORM.S.INV function.
Figure 7: Q-Q Plot Step 4 – Functions are in Column D
Step 5: Create the Q-Q Plot by: 1) copying the data from Column A into Column E, 2) inserting an X-Y Scatter chart using Columns D and E to draw the chart, and 3) adding a trend line.
Figure 8: Q-Q Plot Step 5
When the data fall roughly along the 45-degree trend line, the data are normally distributed. Based on the Q-Q Plot shown in Figure 8, our data approximate a normal distribution. In Chapter 18, we will see that Excel’s Data Analysis Tool for Regression can draw a Q-Q Plot as part of the output of the regression analysis.
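The five Excel steps above can also be mirrored with Python’s standard library. This is a sketch, not the workbook’s exact formulas: it assumes the common percentile convention (rank - 0.5)/n, which may differ slightly from the PERCENTILE formula used in the workbook, and `NormalDist().inv_cdf` plays the role of NORM.S.INV:

```python
from statistics import NormalDist

def qq_points(data):
    """Return (z-value, observation) pairs for a Q-Q plot."""
    ordered = sorted(data)                       # Step 1: sort the data
    n = len(ordered)
    points = []
    for rank, x in enumerate(ordered, start=1):  # Step 2: rank each variable
        pct = (rank - 0.5) / n                   # Step 3: percentile (assumed convention)
        z = NormalDist().inv_cdf(pct)            # Step 4: z-value, like NORM.S.INV
        points.append((z, x))                    # Step 5: plot x against z
    return points

pts = qq_points([12, 9, 15, 11, 13, 10, 14])
```

Plotting these pairs on a scatter chart and checking that they fall near a straight line corresponds to adding the trend line in Step 5.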
The assumption of normality can also be tested using a Chi-Square test for normality. We will cover this test in Chapter 17.
Dedicated statistics programs such as SPSS, SAS, Stata, and R, among others, can conduct more sophisticated tests for normality like the Shapiro-Wilk or Kolmogorov-Smirnov (K-S or KS test) very quickly. The advantage of programs with a graphic user interface like SPSS is that to execute these tests all you must do is check a box. The use of these tests along with the statistical software that runs them, however, is beyond the scope of an introductory textbook. Some third-party plug-ins for Excel can run Shapiro-Wilk and Kolmogorov-Smirnov tests.
When the normality assumption is violated, we should use a nonparametric test. The most commonly used nonparametric alternative to a One-Way ANOVA test is the Kruskal-Wallis H test, which is sometimes called a One-Way ANOVA on Ranks. As a nonparametric test, this test does not require the data to be normally distributed. It does, however, require the treatments to be independent and the treatment variances to be roughly equal. The Kruskal-Wallis H test is beyond the scope of an introductory textbook. The only nonparametric tests covered in Clear-Sighted Statistics are chi-square tests, which will be covered in the next chapter.
For tests using larger samples, we can relax the assumption of normality. The normality assumption becomes more important when the sample sizes or the effect size are very small.
Assumption 3: The treatments have nearly equal variances.
Equality of variance is called homogeneity of variance. SPSS and other dedicated statistical software can quickly run the Levene, Brown-Forsythe, and Bartlett’s tests for homogeneity of variance. These tests examine the null hypothesis that the treatment variances are equal.
Microsoft Excel does not have a built-in function for the Levene’s test, which is widely used to test for homogeneity of variance. Nevertheless, conducting a Levene’s test in Excel is fairly simple. The first step is to calculate the mean for each treatment. This can be done using Excel’s AVERAGE function. The second step is to find the absolute residual of each variable from its treatment mean. The absolute residuals are found by dropping any negative signs. Here is the syntax for this calculation in Excel:
Equation 7: Excel Function for Finding the Absolute Residuals
=ABS(Treatment Variable – Treatment Mean)
The third step is to run a standard one-way ANOVA test of the absolute residuals. When the results of this ANOVA test lack statistical significance, we will fail to reject the null hypothesis, which would mean we lack sufficient evidence that the variances are unequal.
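Assuming SciPy is available, these three steps can be sketched in Python and cross-checked against SciPy’s own Levene implementation; the treatment data below are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Three hypothetical independent treatments (illustrative data only)
groups = [rng.normal(50, 5, 30), rng.normal(52, 5, 30), rng.normal(49, 8, 30)]

# Steps 1 and 2: absolute residuals from each Treatment Mean,
# the equivalent of Excel's =ABS(Treatment Variable - Treatment Mean)
abs_resid = [np.abs(g - g.mean()) for g in groups]

# Step 3: a standard One-Way ANOVA on the absolute residuals
f_stat, p_value = stats.f_oneway(*abs_resid)

# SciPy's Levene test with center='mean' performs the same computation
w_stat, w_p = stats.levene(*groups, center='mean')
```

The two statistics agree exactly, which confirms that Levene’s test is just an ANOVA on absolute residuals; a large p-value means we lack evidence that the variances are unequal.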
The rule of thumb for One-Way ANOVA tests with treatments that have equal sample size is that the ratio between the smallest and largest treatment standard deviations should be less than 2:1, or 4:1 when using variance, although some commentators say a variance ratio as high as 5:1 is acceptable. This is another way that One-Way ANOVA tests are robust. When the treatments do not have equal sample sizes, however, the largest treatment sample variance should be no more than 1.5 times the size of the smallest sample variance.
When the variances are unequal, we use a Welch’s ANOVA test instead of a One-Way ANOVA or the Kruskal-Wallis H test. This is because both the One-Way ANOVA and the Kruskal-Wallis H test require homogeneity of variance, whereas Welch’s ANOVA does not.2 Because Welch’s ANOVA is an advanced technique, it will not be covered in Clear-Sighted Statistics. While conducting a Welch’s ANOVA is easy with dedicated statistical apps like SPSS, this test can also be performed using Excel. John H. McDonald, creator of the website Handbook of Biological Statistics, has an Excel workbook for Welch’s ANOVA on his page on One-Way ANOVA. Here is the link: http://www.biostathandbook.com/onewayanova.html#welch.
IV. Advantages of the ANOVA test Over Two-Sample t-tests For Independent Means
ANOVA is an extension of the two-sample t-test for independent means, where F equals t2:
Equation 8: Relationship Between F and t
F = t2
When we have only two samples, we could use an ANOVA test instead of the two-sample t-test for independent means. The big advantage of the ANOVA test, however, is that it allows us to compare more than two samples simultaneously. With a two-sample t-test for independent means, we would have to test each combination of two samples out of the total number of samples. You can calculate the number of two-sample t-tests you would have to conduct using this formula: Number of two-sample t-tests = k(k – 1)/2, where k is the number of treatments. Table 2 shows the number of two-sample t-tests based on the number of treatments.
Table 2: Number of t-tests based on the number of treatments or k
Not only is running multiple t-tests time consuming, but the probability of committing a Type I error also increases drastically. Consider testing four treatments: A, B, C, and D using two-sample t-tests. We would be required to conduct 6 two-sample t-tests:
1. A vs. B,
2. A vs. C,
3. A vs. D
4. B vs. C,
5. B vs. D, and
6. C vs. D.
If we are using a 5 percent significance level for a single test, the probability of avoiding a Type I error is 95 percent found by 1.00 minus 0.05. The probability of running six tests with no Type I error is 0.95 to the sixth power, or 0.95 * 0.95 * 0.95 * 0.95 * 0.95 * 0.95 = 0.735. The probability of at least one Type I error is 0.265, found by 1 minus 0.735, not 0.05. This, of course, is an unacceptably high probability of a Type I Error.
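The arithmetic above generalizes to any number of treatments. Here is a small sketch (the function name is invented):

```python
def familywise_type_i(k, alpha=0.05):
    """Return the number of pairwise t-tests for k treatments and the
    probability of at least one Type I error when each test uses alpha."""
    m = k * (k - 1) // 2            # number of two-sample t-tests: k(k - 1)/2
    return m, 1 - (1 - alpha) ** m  # 1 minus the chance of no Type I error

tests, prob = familywise_type_i(4)  # four treatments: A, B, C, D
```

For k = 4 this returns 6 tests and a familywise error probability of about 0.265, matching the calculation above.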
V. One-Way ANOVA Examples
Our first example is typical of those found in most introductory statistics textbooks. These small tests demonstrate how to conduct a One-Way ANOVA test. The problem is that the sample size is usually too small to achieve sufficient statistical power. The risk of committing a Type II error, therefore, may be unacceptably high, and the results may be inconclusive. The one advantage of such simple examples is that they demonstrate how to perform the calculations for this test by hand. That said, using Microsoft Excel is always faster and more accurate than calculations done with paper and pencil or a handheld calculator, because Excel carries its calculations out to fifteen significant digits.
Example 1: ANOVA By Hand
1) Test Set-Up
Gotham Sprockets makes finely machined sprockets. The Vice President of Production wants to find out which of three production methods—X, Y, or Z—yields the highest hourly production rate. She randomly assigns experienced machinists to use one of the three production methods. The research question: Are the yields equal for all three production methods (treatments)? Here are the results:
Table 3: Number of Sprockets Produced in an Hour
As part of good research design, the sample size should be determined before the data are collected. The Vice President of Production failed to conduct an a priori power analysis, which is a big mistake. You will recall from Chapter 13 that an a priori power analysis determines how large a sample is needed to achieve a desired level of statistical power. Remember: When our sample is too small, we may not be able to determine reliably whether there is a difference in the production rate among the three treatments because the probability of a Type II error would be far too high. A Type II or β error, you will recall, occurs when we fail to reject a null hypothesis when there is actually an important effect. Another concern is that low powered tests can also exaggerate the size of the effect.3
Here are the a priori statistical power calculations using G*Power and Statistics Kingdom:
To run this power analysis using G*Power, we select the following inputs:
• Test Family: F tests
• Statistical test: ANOVA: Fixed effects, omnibus, one-way
• Type of power analysis: A priori: Compute required sample size
• Effect size f: 0.200, the estimated Cohen’s f effect size (G*Power requires Cohen’s f rather than eta-squared)
• α err prob is 0.05
• Power (1-β err prob) is 0.8
• Number of groups is 3 based on the number of treatments
Figure 9: A priori Power Analysis
As shown in Figure 9, the necessary sample size for 80 percent power is 246, over eight times larger than the current total sample size of 28.
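Assuming SciPy is available, G*Power’s calculation can be approximated by searching for the smallest total sample size whose power, computed from the noncentral F-distribution with noncentrality λ = f²·N (G*Power’s convention), reaches the target. Rounding conventions differ, so the result may be off from G*Power’s answer by a few observations:

```python
from scipy.stats import f as f_dist, ncf

def required_total_n(effect_f, k, alpha=0.05, power=0.80):
    """Smallest total N at which a One-Way ANOVA reaches the target power."""
    n = k + 2                                   # smallest N with positive df
    while True:
        df1, df2 = k - 1, n - k
        f_crit = f_dist.isf(alpha, df1, df2)    # critical value of F
        nc = effect_f ** 2 * n                  # noncentrality, lambda = f^2 * N
        if ncf.sf(f_crit, df1, df2, nc) >= power:
            return n
        n += 1

n_needed = required_total_n(0.20, 3)
```

For f = 0.20 and three groups this lands in the mid-240s, consistent with G*Power’s 246 and Statistics Kingdom’s 244.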
We can also run an a priori power calculation using an online power calculator. Here is the calculation done using Statistics Kingdom’s sample size calculator. Here is the link: https://www.statskingdom.com/sample_size_regression.html. Like G*Power, this tool requires Cohen’s f effect size. And while G*Power can only be used on a computer running Windows or Macintosh operating systems, using Statistics Kingdom requires only an internet connection.
When conducting an a priori power calculation using Statistics Kingdom, you must change the “Type” from Regression to ANOVA. In addition, like G*Power, you must use Cohen’s f effect size instead of eta-squared, η2.
Figure 10: Statistics Kingdom’s A Priori Power Analysis Inputs
As shown in Figure 11, Statistics Kingdom reports that a sample of 244 is needed to achieve 80 percent power. The difference between G*Power’s and Statistics Kingdom’s sample size calculations is tiny. The big issue here is that our test, with less than 14 percent statistical power, is seriously underpowered.
Figure 11: Statistics Kingdom’s A Priori Statistical Power Calculation
2) Select a Level of Significance, α
The level of significance is 5 percent.
3) State the Null and Alternate Hypotheses
The null and alternate hypotheses are:
H0: μX = μY = μZ;
H1: At least one pair of Treatment Means is unequal.
Remember: The alternate hypothesis does not state that all pairs of treatments are unequal.
4) Compose the Decision Rule
The critical values tables for F-distributions can be very large. To make them more manageable, F-tests are forced to be right-tailed tests. Even so, these tables are often missing critical values for some combinations of degrees of freedom. Excel can calculate the critical F values for any combinations of significance levels and degrees of freedom in the numerator and denominator.
As previously stated, degrees of freedom in the numerator is defined as the number of treatments, k, minus one. In this case, there are three treatments, so we have 2 degrees of freedom. Degrees of freedom in the denominator is defined as the total number of observations, n, minus k. In this case, n is 28; subtracting the three treatments leaves 25 degrees of freedom. Table 4 shows the formulas for finding the degrees of freedom in the numerator and denominator.
Table 4: Finding Degrees of Freedom (df) in the Numerator and the Denominator,
where k is the number of treatments and n is the total number of observations
The critical value for F(2, 25) with a 5 percent significance level is 3.39, found using the critical values table:
Table 5: Critical Value for F(2, 25) at 5% Significance Level Equals 3.39
Caution: When using a critical value table, you must double-check that you are using the table that matches the selected significance level. The table for the 1 percent significance level shows that the critical value for F(2,25) is 5.57.
You can also determine the critical value of F using Excel. Here is the Excel formula for finding the critical value of F:
Equation 9: Finding the Critical Value of F Using Excel
=F.INV.RT(α,df numerator,df denominator)
Where α is the level of significance
As shown in Figure 12, Excel reports the critical value as 3.38518996. This is needlessly precise, so we round this number off to 3.39.
Figure 12: F.INV.RT Function
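Assuming SciPy is available, the same critical value can be computed outside of Excel; `scipy.stats.f.isf` is the counterpart of =F.INV.RT:

```python
from scipy.stats import f

# Counterpart of Excel's =F.INV.RT(0.05, 2, 25): the value of F that
# leaves 5 percent of the F(2, 25) distribution in the right tail
critical_value = f.isf(0.05, dfn=2, dfd=25)
```

This returns 3.38519..., which rounds to the 3.39 used in the decision rule.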
The decision rule: Reject the null hypothesis if F is greater than 3.39. Figure 13 shows a chart for F(2,25) with a 5 percent rejection region in black on the right tail:
Figure 13: F-Distribution at a 5% significance level for F(2, 25)
5) Calculate the Value of the Test Statistic, p-Value, and Effect Size
Table 6 shows the basic ANOVA table calculations:
Table 6: Basic ANOVA Table Calculation
Where: k = number of treatments
n = total number of random variables regardless of treatment
X̅G = Grand Mean
X̅T = Treatment Mean
These calculations are easy, but there are a lot of them. Here are the steps to complete an ANOVA table. Please note: We cannot find the p-value for our test statistic when we perform the calculations in this manner.
Step 1: Fill in the degrees of freedom
We already did this step when we wrote the decision rule. Table 7 shows the ANOVA table with the degrees of freedom added:
Table 7: Basic ANOVA Table Calculation
Step 2: Calculate the Sum of the Squares Total (Total)
Equation 10 shows the formula for the Sum of the Squares Total:
Equation 10: Sum of the Squares Total
Total = Σ(X - X̅G)²
The Grand Mean, X̅G, is the mean of all the variables regardless of which treatment they have been assigned. Table 8 shows the calculation of the Grand Mean, X̅G.
Once the Grand Mean has been found, the next step in calculating Total is to subtract the Grand Mean from each random variable, as shown in Table 9.
The final step for calculating Total is to square these deviations from the Grand Mean and then sum them as shown in Table 10.
We then enter the Total into the ANOVA Table. See Table 11.
Table 11: Basic ANOVA Table Calculation
Step 3: Calculate the Sum of the Squares Within Groups (SSW)
The formula for SSW is:
Equation 11: SSW
SSW = Σ(X - X̅T)²
When calculating SSW, we first find the Treatment Means. Then we subtract the Treatment Mean from each variable in its treatment, as shown in Table 12.
We then square these deviations from the Treatment Means and sum them. See Table 13.
We then add the Sum of the Squares Within Groups to the ANOVA table as shown in Table 14.
Table 14: Basic ANOVA Table Calculation
Step 4: Calculate the Sum of the Squares Between Groups (SSB)
The SSB is 38.19, found by 724.68 minus 686.49. Once this value has been calculated, we add it to the ANOVA table. See Table 15.
Table 15: Basic ANOVA Table Calculation
Step 5: Calculate the MSB and MSW
Moving to the right, we calculate the Mean Squares using the formula shown in Table 15. The results are shown in Table 16.
Table 16: Basic ANOVA Table Calculation
Step 6: Solve for F (MSB/MSW)
The final step is to calculate F, which is the ratio between MSB and MSW. Table 17 shows the completed ANOVA Table.
Table 17: Basic ANOVA Table Calculation
The calculated value of F, our test statistic, is 0.695. Unfortunately, we cannot use the critical values tables to calculate p-values, as we do with z-distributions, or estimate them as we do with t-distributions. We can, however, calculate the p-value using Excel. Here is the formula:
Equation 12: p-Value Function in Excel
=F.DIST.RT(F-value,df in numerator,df in denominator)
Excel reports the p-value: 50.83 percent. Because this p-value is greater than the significance level, we fail to reject the null hypothesis. We lack statistical significance. As discussed in Chapter 13, the null hypothesis is not proven when we fail to reject it. Our conclusion means that we lack sufficient evidence to declare that at least one of the treatment pairs are unequal.
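Assuming SciPy is available, the p-value can be reproduced with `scipy.stats.f.sf`, the counterpart of =F.DIST.RT. Because 0.695 is the rounded test statistic from the ANOVA table, the result differs very slightly from Excel’s 50.83 percent:

```python
from scipy.stats import f

# Counterpart of Excel's =F.DIST.RT(0.695, 2, 25): the right-tail
# probability of the F(2, 25) distribution beyond the test statistic
p_value = f.sf(0.695, dfn=2, dfd=25)
```

The p-value of roughly 0.51 is far above the 0.05 significance level, so we fail to reject the null hypothesis.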
In addition, we need to dig deeper. We should have run an a priori power analysis before the data were collected to determine sample size.
The ANOVA test tells us whether the Treatment Means are all equal. But, the ANOVA test does not indicate anything about the size of the effect. Most reputable scholarly publications require authors to report effect size. With ANOVA tests there are several commonly used measures of effect size. Eta-squared, η2, is the measure most commonly used for One-Way ANOVA tests. Here is the formula for eta-squared and the calculation for our example:
Equation 13: Formula for Eta-Squared and Its Calculation
η² = SSB/Total = 38.19/724.68 ≈ 0.0527
Note: The SSB and Total values have been rounded; the value for η² was found using Excel.
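The eta-squared calculation in Equation 13 is simple enough to check directly; the rounded Sum of the Squares values come from the completed ANOVA table:

```python
# Eta-squared: the share of total variation accounted for by the treatments
ssb = 38.19        # Sum of the Squares Between Groups (rounded)
ss_total = 724.68  # Sum of the Squares Total (rounded)
eta_squared = ssb / ss_total
```

This gives approximately 0.0527.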
An eta-squared value of 0.0527 is a small effect. It means that only 5.27 percent of the total variance is accounted for by the treatments. Effect size does not imply a causal relationship between the treatments and the hourly production rates. Jacob Cohen, a psychologist and pioneer of effect sizes, suggested the following guidelines for interpreting eta-squared effect size:4
Table 18: Eta-Square Effect Size Interpretation
With such a high probability of committing a Type II error, you would be on thin ice if you ascribed any practical significance to the results.
We can determine the needed sample size for 80 percent statistical power by conducting an a priori power analysis using G*Power. G*Power requires us to use Cohen’s f effect size instead of eta-squared, η2. Here is the equation and calculation for converting etasquared into Cohen’s f.
Equation 14: Converting Eta-Squared into Cohen’s f
f = √(η²/(1 - η²)) = √(0.0527/(1 - 0.0527)) ≈ 0.236
Note: The value for η² is rounded; Cohen’s f was calculated using Excel.
Like eta-squared, η2, the Cohen’s f effect size is considered small. Here are the thresholds for interpreting Cohen’s f:
Table 19: Cohen’s f Effect Size Interpretation
6) Decide and Report
The F-value of 0.695 is less than the critical value of 3.39, and the p-value, 50.83 percent, is much greater than the 5 percent significance level. We lack sufficient evidence to reject the null hypothesis that the three production methods have equal mean yields. No decision can be made as to whether this test has practical significance. But, given that the effect size is not negligible, management should consider re-running this test with the larger sample size suggested by the G*Power and Statistics Kingdom a priori power calculations. Figure 14 shows a chart of the rejection region in black and the p-value in red for this F-distribution.
Figure 14: F-Distribution for F(2, 25) at a 5% significance level with a p-value of 15.91.
Summing Up: Completing an ANOVA test by hand is time-consuming even if we use tiny samples. To achieve statistical power, we must use bigger samples, but bigger samples require more time-consuming calculations. We can save time by using Microsoft Excel. In addition to saving time, ANOVA tests constructed by Excel are more accurate than performing a lot of mundane calculations by hand. And, we can supplement Excel’s ANOVA tool by using Excel’s built-in functions to calculate the Effect Size and conduct a post hoc analysis should we reject the null hypothesis to determine which treatment pairs are unequal.
Remember: No one should invest a half hour or more doing ANOVA’s simple arithmetic when Excel can construct an ANOVA table in a few seconds.
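To make the arithmetic behind the ANOVA table concrete, here is a from-scratch sketch of what Excel's "Anova: Single Factor" tool computes: the between-treatment and within-treatment sums of squares and the F statistic. The data are hypothetical:

```python
# Sketch of a One-Way ANOVA F statistic computed from scratch,
# mirroring the hand calculations described in the chapter.
# The sample data are hypothetical.

def one_way_anova(groups):
    """Return (SSB, SSW, df_between, df_within, F) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-treatment sum of squares (SSB)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment sum of squares (SSW)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between = len(groups) - 1          # k - 1
    df_within = n_total - len(groups)     # N - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return ssb, ssw, df_between, df_within, f_stat

groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]   # hypothetical data
ssb, ssw, df_b, df_w, f_stat = one_way_anova(groups)
print(f_stat)  # 3.0 for these data
```

For these tiny samples the mean square between is 3.0 and the mean square within is 1.0, so F = 3.0, exactly the ratio Excel would report.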
Example 2: Using Microsoft Excel
1) Test Set-Up
In Chapter 15, we conducted a two-sample t-test with unequal variances. In one of our exercises, we had two overly caffeinated brothers: Jittery Joe and Caffeine Carl. The brothers operate competing espresso stands at either end of a very large shopping mall. They have been arguing heatedly about whose espresso stand has higher daily sales. Their mother hired you to settle this dispute. You found a p-value of 96.26 percent and properly concluded that there is no evidence that either brother has higher daily espresso sales.
Your analysis impressed the boys’ mother. At her own stand at the mall she sells hot beverages. She calls her stand Muriel Bristol’s Proper English Tea even though her name is Marge.5 She claims her tea is proper English tea with milk always poured into the cup before the tea. After seeing her sons’ results, she thinks her stand has higher sales. To teach her endlessly bickering sons a good lesson, she hired you to determine whether she and her two sons have equal average daily beverage sales at their respective stands.
A Priori Power Calculation Using G*Power
An a priori power calculation should have been done before the data were collected. You can do this easily using G*Power. First convert the estimated eta-squared (η²) effect size into the Cohen's f effect size using the formula shown in Equation 14. Our eta-squared effect size of 0.0655 becomes an f effect size of 0.2647.
Here are the G*Power inputs: The test family is F tests. The statistical test is ANOVA: Fixed effects, omnibus, one-way. The type of power analysis is "A priori: Compute required sample size – given α, power, and effect size." There are four inputs: 1) Effect size f, which is 0.2225 (this effect size is estimated); 2) α err prob is the significance level, 0.05; 3) Power (1 - β err prob) is 0.80; and 4) Number of groups, which is 3. As shown in Figure 15, to achieve 80 percent power, the total sample size for all three treatments is 198. As luck would have it, the data provided contain 198 observations.
Figure 15: A Priori Power Calculation - Total Sample Size of 198 to Achieve 80% Power
We can also conduct an a priori statistical power calculation using Statistics Kingdom.
Figure 16: Statistics Kingdom’s A Priori Power Calculation
Like G*Power, Statistics Kingdom reports the sample size needed to achieve 80 percent power is 198.
Figure 17: A Priori Power Calculation - Total Sample Size of 198 to Achieve 80% Power
Table 20 shows the daily sales data for Muriel and her sons. Please note: Although we had to use an unequal-variance t-test when we compared the daily sales of the two brothers, we can still compare the daily sales of all three vendors because the ratio between the lowest and highest variances, 1 to 2.10, is not so large as to violate the ANOVA assumption of equality of variances.
Table 20: Daily Dollar Sales
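The variance-ratio check described above is easy to automate. This sketch uses hypothetical sales figures, not the chapter's data (which gave a ratio of 1 to 2.10), and simply compares the largest sample variance to the smallest:

```python
import statistics

# Quick check of the equal-variance rule of thumb: compare the
# largest sample variance to the smallest. The sales figures below
# are hypothetical stand-ins for the chapter's data.

samples = {
    "Jittery Joe":   [1, 2, 3, 4, 5],
    "Caffeine Carl": [1, 3, 4, 5, 7],
    "Muriel":        [2, 4, 5, 6, 8],
}
variances = {name: statistics.variance(x) for name, x in samples.items()}
ratio = max(variances.values()) / min(variances.values())
print(variances, ratio)  # ratio of 2.0 for these hypothetical data
```

A ratio of roughly 2 to 1, as in both the hypothetical data here and the chapter's actual data, is generally considered small enough for ANOVA's robustness to handle.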
Given that there are more than two samples, you cannot conduct a series of t-tests without an unreasonably high probability of committing a Type I error. You correctly decide to conduct a One-Way ANOVA test. Of course, you should have conducted an a priori power analysis before the data were collected, but you were given the data when you took the assignment. So, you estimated the effect size and conducted the a priori power analysis anyway. In addition, if you reject the null hypothesis, you will have to conduct a post hoc analysis to determine which treatment pairs are unequal. Unfortunately, Excel's built-in ANOVA analysis will not perform these calculations.
2) Select a Level of Significance, α
You select a 5 percent significance level.
3) State the Null and Alternate Hypotheses
Here are the hypotheses:
H0: μj = μc = μM;
H1: At least one pair of treatment means is unequal.
To repeat: The alternate hypothesis states that at least one pair of treatment means is unequal. It does not state that all pairs of treatment means are unequal.
4) Compose a Decision Rule
As previously stated, degrees of freedom in the numerator is defined as the number of treatments, k, minus one. In this case, there are three treatments, so we have two degrees of freedom. Degrees of freedom in the denominator is defined as the total number of observations minus k; 198 observations minus 3 treatments = 195 degrees of freedom. Table 21 shows the formulas for finding the degrees of freedom in the numerator and denominator.
Table 21: Finding Degrees of Freedom (df) in the Numerator and the Denominator
The critical value for F(2,195) at a 5 percent significance level is not available on the critical values table. You can, however, find it using Excel. Equation 15 shows Excel's critical value function for F-distributions; the critical value is 3.04:
Equation 15: Excel’s Critical Value Function for F
=F.INV.RT(α,df numerator,df denominator)
Excel’s ANOVA tool will report the critical value as well.
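For readers working outside Excel, the same critical value can be found in Python. This sketch assumes the scipy library is installed; its inverse survival function, `isf`, is the direct analogue of Excel's =F.INV.RT:

```python
# Python equivalent of Excel's =F.INV.RT(alpha, df1, df2).
# Assumes scipy is installed. isf is the inverse survival function,
# i.e., the right-tail critical value.
from scipy.stats import f

alpha = 0.05
crit = f.isf(alpha, 2, 195)   # F(2, 195) critical value
print(round(crit, 2))         # about 3.04, matching Excel
```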
Figure 18: F-distribution F(2,195) 5% significance level
5) Calculate the Value of the Test Statistic and p-Value
Here is how to construct a One-Way or Single Factor ANOVA table in a few seconds using Excel's Data Analysis ToolPak. The process has three simple steps:
Step 1: Click on the Data Analysis icon on Excel's Ribbon. The icons for this tool are shown in Figure 19, with the icon for Macintosh on the left and Windows on the right:
Figure 19: Data Analysis icon in Excel’s Ribbon
Step 2: Click on “Anova: Single Factor” and click OK. The “Anova: Single Factor” option is the first on the list. See Figure 20. Once you complete this step, the data input window appears.
Figure 20: Data Analysis: Analysis Tools
Step 3: In the data input window, enter the data in the input range, the significance level under Alpha, and the Output range. See Figure 21 for the completed data input window.
Figure 21: Data input window for Anova: Single Factor
Enter the input range, which consists of the columns that contain the data. Make certain that the "Labels in first row" box is checked. You can enter this information by highlighting the Input Range box and then dragging the cursor through the cells with the data. Once the input range has been entered, enter the significance level in the Alpha box and select the output range. Then click OK. Excel will return the analysis in the output range.
Figure 22 shows the output. Excel generates this analysis in a few seconds. Constructing this ANOVA table by hand could take at least a half hour.
Figure 22: One-Way ANOVA Table
The calculated value for F is 6.83 with a p-value of only 0.0013, or 0.13 percent (that is, 0.00125628 rounded). Excel also provides the critical value for F, 3.0422299, which we report as 3.04. The output also provides a very handy statistical summary of the three treatments: sample size (Count), the Sum of the treatment variables, Average (mean), and Variance (s²).
Unfortunately, Excel's ANOVA tool fails to calculate effect size or perform a post hoc analysis to determine which treatment pairs are unequal. We can, however, perform some of these calculations with Excel.
Excel can make quick work of the Effect Size calculations.
Table 22 shows the effect size thresholds for eta-squared. An effect size of 0.0655 is a medium-sized effect:
Table 22: Eta-Square Effect Size Interpretation
G*Power and Statistics Kingdom require the use of Cohen's f effect size. Here is how to convert eta-squared into Cohen's f.
Equation 16: ES Calculation Formula
Note: The value for η² is rounded; Cohen's f was calculated using Excel
Effect Size f = SQRT(η²/(1 - η²))
Effect Size f = SQRT(0.0655/(1 - 0.0655)) = 0.2647
6) Decide and Report
Our decision: Reject the null hypothesis because the value of the F test statistic, 6.83, is greater than the critical value of 3.04, and the p-value of 0.13 percent is less than the 5 percent significance level. We conclude that there is a statistically significant difference in daily sales among the three vendors. Given this low p-value, we would also reject the null hypothesis had we used a stricter 1 percent significance level. There is little chance that our findings are due to random sampling error or that our measurement of effect size is unreliable. This test is also not overpowered, given that the effect size is not negligible. Conclusion: The three hot beverage stands do not have equal daily sales.
As previously stated, ANOVA is an omnibus test. We now know that the three treatment pairs are not all equal, but we do not know which treatment pairs are not equal. A post hoc analysis is required to answer this question. There are many types of post hoc analyses, the simplest of which is to construct a series of confidence intervals. We need to create Least Significant Difference (LSD) confidence intervals for all the pairs:
1) Jittery Joe and Caffeine Carl.
2) Jittery Joe and Muriel Bristol.
3) Caffeine Carl and Muriel Bristol.
Equation 17 shows the formula for constructing an LSD confidence interval:
Equation 17: Formula for LSD Confidence Intervals
Please note: You can find the number of pairs using combinations. Equation 18 shows the formula for combinations in Excel:
Equation 18: Finding the Number of Combinations in Excel
=COMBIN(number,number_chosen)
=COMBIN(3,2) = 3
For three treatments there are three pairs, for four treatments there are six pairs, and for five treatments there are ten pairs.
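The pair counts above can be verified with Python's standard library, whose `math.comb` function is the direct analogue of Excel's =COMBIN:

```python
# Number of treatment pairs needing an LSD confidence interval:
# C(k, 2), the Python analogue of Excel's =COMBIN(k, 2).
import math

for k in (3, 4, 5):
    print(k, "treatments ->", math.comb(k, 2), "pairs")
```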
We interpret the results of our LSD confidence intervals by looking at the lower and upper confidence limits. Table 23 shows how we interpret the results:
Table 23: Interpreting LSD Confidence Intervals
First, we will compare Jittery Joe with Caffeine Carl. The critical value of t at a 95 percent confidence level with 195 degrees of freedom is 1.653, found using the following Excel formula:
Equation 19: Finding the Critical Value of t
=T.INV(1-alpha,df) or =T.INV(1-0.05,195)
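The same critical value can be found in Python. This sketch assumes scipy is installed; `t.ppf` is the analogue of the Excel formula above:

```python
# Python analogue of the Excel T.INV formula above: the t critical
# value used in the chapter's LSD intervals. Assumes scipy is installed.
from scipy.stats import t

crit = t.ppf(1 - 0.05, 195)   # t critical value, 195 degrees of freedom
print(round(crit, 3))         # about 1.653
```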
Here is the LSD Confidence Interval calculation:
Equation 20: LSD Confidence Interval: Jittery Joe’s Sales Equals Caffeine Carl’s
As expected, at a 95 percent confidence level, there is no statistically significant difference between Jittery Joe and Caffeine Carl given that the lower confidence limit is negative and the upper confidence limit is positive.
Here is the comparison between Jittery Joe and Muriel Bristol:
Equation 21: LSD Confidence Interval: Jittery Joe’s Sales Does Not Equal Muriel Bristol’s
Daily beverage sales at Jittery Joe’s stand are not equal to the daily sales at Muriel Bristol’s stand. Muriel Bristol’s stand has higher sales because both the LCL and UCL are negative.
The LSD confidence interval also shows that Caffeine Carl’s sales do not equal sales for Muriel Bristol because both the LCL and UCL are negative.
Equation 22: LSD Confidence Interval: Caffeine Carl’s Sales Does Not Equal Muriel Bristol’s
Calculating the LSD confidence intervals by hand can be time-consuming. While Excel lacks built-in post hoc analysis functions, you can use the attached LSD confidence interval template found in 16_ANOVA_Power_PostHoc.xlsx to conduct this analysis quickly. Enter the data in the red cells. These data are in your completed ANOVA table. Remember: You must construct LSD confidence intervals for all the treatment pairs.
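The LSD interval itself is a short computation. This sketch uses the standard form of Fisher's LSD confidence interval, (x̄₁ − x̄₂) ± t·sqrt(MSW·(1/n₁ + 1/n₂)); all input values below are hypothetical, not the chapter's actual sales data:

```python
import math

# Sketch of an LSD confidence interval in its standard form:
# (mean1 - mean2) +/- t * sqrt(MSW * (1/n1 + 1/n2)).
# All inputs below are hypothetical.

def lsd_interval(mean1, mean2, n1, n2, msw, t_crit):
    """Return (lower, upper) LSD confidence limits for one treatment pair."""
    margin = t_crit * math.sqrt(msw * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - margin, diff + margin

lo, hi = lsd_interval(10.0, 12.0, 25, 25, 25.0, 2.0)  # hypothetical inputs
print(lo, hi)  # interval straddles zero -> no significant difference
```

Because this hypothetical interval runs from a negative lower limit to a positive upper limit, the pair would be judged not significantly different, exactly the interpretation rule in Table 23.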
LSD #1: Jittery Joe and Caffeine Carl – No Difference
Figure 26: LSD Confidence Interval – Jittery Joe and Caffeine Carl
LSD #2: Jittery Joe and Muriel Bristol – Difference
Figure 27: LSD Confidence Interval – Jittery Joe and Muriel Bristol
LSD #3: Caffeine Carl and Muriel Bristol – Difference
Figure 28: LSD Confidence Interval – Caffeine Carl and Muriel Bristol
Conclusion: Average daily sales for Muriel Bristol’s beverage stand are higher than the sales of her sons’ beverage stands.
Please note: Some commercial Excel plug-ins like MegaStat conduct a post hoc analysis when the ANOVA test is performed. Figure 29 shows the ANOVA test results with two post hoc analyses: 1) Pairwise t-tests and 2) a Tukey simultaneous comparison t-values test. Information about MegaStat can be found at this link. Dedicated statistical programs like SPSS can run a wide variety of post hoc analyses when the ANOVA test is conducted.
Figure 29: Post Hoc Analysis Using the MegaStat Plug-in
V. Other Types of ANOVA
There are many types of ANOVA tests not typically covered in an introductory statistics course. These tests include:
1. One-Way Repeated Measures ANOVA: These ANOVA tests make repeated measures over time. A pharmaceutical company testing two or more medications for controlling adult-onset diabetes may test the blood sugar of participants several times a day; that is, it takes repeated measures. Such tests are also known as within-subjects ANOVA and ANOVA for correlated samples.
2. Two-Way ANOVA Without Replication: This is an ANOVA test with two independent variables or treatments. Excel's Data Analysis ToolPak can conduct this test.
3. Two-Way ANOVA With Replication: An ANOVA test with two independent variables or treatments. Replication refers to whether the researcher replicates the test with multiple groups. Excel's Data Analysis ToolPak can conduct this test.
4. Factorial ANOVA: This test is like a two-way ANOVA test with additional independent variables, treatments, or factors. Excel’s Data Analysis ToolPak cannot conduct this test.
5. MANOVA: Multivariate Analysis of Variance: An ANOVA test with more than one dependent variable. Excel’s Data Analysis ToolPak cannot conduct this test.
6. ANCOVA (Analysis of Covariance): A blend of ANOVA and regression used to determine whether the treatments are equal across independent variables. Excel’s Data Analysis ToolPak cannot conduct this test.
7. MANCOVA (Multivariate Analysis of Covariance): An extension of ANCOVA for multiple dependent variables. Excel’s Data Analysis ToolPak cannot conduct this test.
8. Kruskal-Wallis H test: A nonparametric version of the One-Way ANOVA test. Excel's Data Analysis ToolPak cannot conduct this test, but it can be calculated in Excel even though there is no built-in function for it.
9. Welch’s ANOVA: We use this test when the homogeneity of variance assumption has been violated.
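Of the tests above, the Kruskal-Wallis H statistic (item 8) is simple enough to sketch directly. This minimal version assumes no tied observations; ties require an additional correction factor. The data are hypothetical:

```python
# Sketch of the Kruskal-Wallis H statistic, the nonparametric
# analogue of One-Way ANOVA. This simple version assumes no tied
# observations; ties would require a correction factor.

def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}  # ranks (no ties)
    n = len(pooled)
    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    rank_sum_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical, tie-free data
print(kruskal_wallis_h(groups))             # 7.2 for these data
```

For these perfectly separated hypothetical groups, the rank sums are 6, 15, and 24, giving H = 7.2.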
VI. Summary
The One-Way ANOVA test is a relatively simple and extremely useful way to determine whether two or more independent samples have equal means. When we meet the basic assumptions of the ANOVA test, Microsoft Excel is a very handy tool for analyzing the test results. Unfortunately, Excel's ANOVA tool has its limitations. It lacks built-in tools to test the assumptions of normality and homogeneity of variance. It does not report either eta-squared, η², or Cohen's f effect size, which are simple calculations. It does not have built-in a priori or post hoc statistical power functions, but as shown, a custom Excel workbook can calculate post hoc power. Statistical power calculations, however, can easily be run using free tools like G*Power and online power calculators like the one found on the Statistics Kingdom website. Excel also lacks the ability to run post hoc analyses, but setting up a post hoc analysis using least significant difference (LSD) confidence intervals is not difficult.
VII. Exercises
Conduct One-Way ANOVA tests for the following problems using Microsoft Excel. The data are available in Chapter 16_Exercises.xlsx. Using the templates found in Chapter16_ANOVA_Power_PostHoc.xlsx, estimate and interpret eta-squared and Cohen's f effect size, post hoc statistical power, and the probability of committing a Type II error. Calculate a priori power using G*Power or the online calculator found on the Statistics Kingdom website, https://www.statskingdom.com/33test_power_regression.html. Perform a post hoc analysis when necessary using LSD confidence intervals.
Exercise 1:
1) Test Set-Up
General Cereals is a large manufacturer of pre-sweetened, ready-to-eat children’s cereals. One of its oldest brands is Super Golden Sweeties. As part of its marketing efforts, General Cereals has been running a test market for new package designs for Super Golden Sweeties. The test ran for 50 weeks in small test markets. Table 24 shows the results based on cases sold per week in the test markets for each of the four package designs:
Table 24: Exercise 1 Data
2) Using either G*Power or Statistics Kingdom's online power calculator, run an a priori power analysis. Comment on your findings.
Use these estimated effect sizes:
Table 25: Estimated Effect Sizes
3) Select a Level of Significance, α
The level of significance has been set at 5 percent.
4) State the Null and Alternate Hypotheses
5) Compose a Decision Rule
6) Calculate the Value of the Test Statistic, p-Value, and effect size. Calculate Statistical Power. Conduct a post hoc analysis.
7) Decide and Report
Exercise 2:
1) Test Set-Up
Pützmeister Schwing Ltd. is a large multinational pharmaceutical company headquartered in Transylvania. The company has been developing an over-the-counter medication to help students improve their memory and focus so they can perform better on exams. As part of the test, 177 accounting students attending Nunya School of Business were randomly assigned to one of three treatment groups. The first group was given a placebo. The second group was given formula Q1 and the third received formula Z1. Participants in each group were given one week to prepare for a standardized accounting examination. Possible scores on the exam range from a low of zero to a high of 100. The test results are shown in Table 26:
Table 26: Exercise 2 Data
2) Using either G*Power or Statistics Kingdom's online power calculator, run an a priori power analysis. Comment on your findings.
Use these estimated effect sizes:
Table 27: Estimated Effect Sizes
3) Select a Level of Significance, α
You decided to use a 5 percent significance level.
4) State the Null and Alternate Hypotheses
5) Compose a Decision Rule
6) Calculate the Value of the Test Statistic, p-Value, and effect size. Conduct a post hoc power analysis.
7) Decide and Report
Exercise 3:
1) Test Set-Up
Gotham Paint is testing three prototypes for a new house paint. The paints are labelled X, Y, and Z. The production department wants to determine how quickly the three different paints dry under real-world conditions. To that end, it randomly distributed the three paints to professional house painters. Members of the production staff went into the field with these painters to determine how long it took for the three paints to dry. Table 28 shows the results of this test:
Table 28: Exercise 3 Data
2) Using either G*Power or Statistics Kingdom's online power calculator, run an a priori power analysis. Comment on your findings.
Use these estimated effect sizes:
Table 29: Estimated Effect Sizes
3) Select a Level of Significance, α
A 5 percent significance level has been selected.
4) State the Null and Alternate Hypotheses
5) Compose a Decision Rule
6) Calculate the Value of the Test Statistic, p-Value, and effect size. Conduct a post hoc power analysis.
7) Decide and Report
Except where otherwise noted, this Clear-Sighted Statistics is licensed under a
Creative Commons License. You are free to share derivatives of this work for
non-commercial purposes only. Please attribute this work to Edward Volchok.
Endnotes
1 Ronald A. Fisher, “Studies in Crop Variation. I. An Examination of the Yield of Dressed Grain from Broadbalk,” Journal of Agricultural Science, Vol. 11, No. 2, 1921, pp. 107-135.
2 John H. McDonald, “Kruskal-Wallis test,” Handbook of Biological Statistics, http://www.biostathandbook.com/kruskalwallis.html.
3 Low-powered tests can exaggerate effect sizes when they do detect an effect. Jim Frost, Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions (State College, PA: Statistics by Jim Publishing), p. 123.
4 Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd Edition. (Mahwah, NJ: Lawrence Erlbaum Associates, 1988).
5 There was a real Muriel Bristol (1888-1950). Dr. Bristol was a botanist. There is even a genus of green algae named after her, Muriella. Dr. Bristol worked with Ronald A. Fisher at the Rothamsted Experimental Station in Harpenden, England. One afternoon, at a social gathering, Fisher offered Dr. Bristol a cup of hot tea drawn from an urn. She declined to drink this tea, claiming that she preferred her tea prepared with the milk poured into the cup first. Fisher scoffed and wondered whether she could really tell the difference. One gentleman, William Roach, said, “Let’s test her.” While the results of this test were never recorded, eyewitnesses reported that Dr. Bristol had an uncanny ability to determine whether the milk was poured into the cup before or after the tea. Bristol later married William Roach. Joan Fisher Box reported this event in her biography of her father, R. A. Fisher: The Life of a Scientist (Hoboken, NJ: John Wiley & Sons, 1978), as did David Salsburg in The Lady Tasting Tea (New York: Henry Holt and Company, 2001), pp. 1-8. Fisher briefly mentioned this incident in the context of designing an experiment at the start of the second chapter of his book, The Design of Experiments (London: Oliver & Boyd, 1935), p. 14; location 309 in the Kindle edition.