Clear-Sighted Statistics

Chapter 15: Two-Sample Null Hypothesis Significance Tests

Figure 1: The NHST Cycle

I. Introduction

We covered one-sample hypothesis tests in Chapter 14. In Chapter 15, we move on to two-sample hypothesis tests. In doing so, we will review six different statistical significance tests using z-distributions, t-distributions, and F-distributions.

After completing this chapter, you will be able to:

• Distinguish between independent and dependent or conditional samples.

• Conduct a z-test for two independent means.

• Conduct a z-test for two independent proportions.

• Conduct an F-test for equality of variance in preparation for selecting the appropriate two-sample t-test for independent means.

• Conduct a t-test for two independent means assuming equal or pooled variance.

• Conduct a t-test for two independent means assuming unequal variance.

• Conduct a t-test for the means for two dependent or paired samples.

• Measure the size or magnitude of the effect.

• Calculate a priori statistical power using G*Power and the calculators found on the Statistics Kingdom website. Here are the links:

- G*Power: https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.

- Statistics Kingdom: https://www.statskingdom.com/sample_size_all.html.

• Use p-values and critical values tables for the normal distribution, student-t distribution, and F distribution to draw conclusions about the results of your null hypothesis significance tests.

You should download the following files that accompany this chapter:

• The critical values tables for z-distributions, t-distributions, and F-distributions:

- z-Values_AreasBetweenMean&X.pdf.

- z-Values_AreasBetweenMean&X.xlsx.

- z-Values_CriticalValues_p-values.pdf.

- z-Values_CriticalValues_p-values.xlsx.

- Student-t_tables.pdf.

- Student-t_tables.xlsx.

- F-Distribution_.05_.01.pdf.

- F-Distribution_.05_.01.xlsx.

- Finding_CVs_p-values.xlsx.

• Chapter15_Examples.xlsx:

- The Excel workbook for the examples shown in this chapter.

• Chapter15_Exercises.xlsx:

- The Excel workbook for the exercises at the end of this chapter.

• Equality of Variance calculator:

- Chapter15_F-Test Equality of Var.xlsx.

II. Independent and Dependent Samples

A preliminary question we must address when conducting two-sample hypothesis tests is whether the two samples are independent or dependent. This is a critically important question because we use different techniques depending on the answer to this question.

Independent samples are unrelated to one another; the measurements from one sample have no influence on the measurements from the other. If a researcher wanted to compare the average daily commute time for residents of New York City to residents of Los Angeles, for example, the two sets of data are considered independent samples because values from one city do not affect values in the other.

Dependent samples are paired measurements for one set of items. When working with dependent samples, we pair or match the measurements from one sample to related measurements from a second sample. A “before and after” test is the classic example of two dependent samples. Two measurements are taken, one before some “treatment” is applied to the sample and another after the treatment has been applied. Another type of dependent sample arises when you match or pair the measurement of one sample to the measurement of a second sample; for example, blood pressure readings taken from a random sample of patients using two different kinds of blood pressure monitors to determine whether the devices provide consistent results.

III. Tests of the Means of Two Independent Samples Using z-values

The first two-sample test we will cover is the z-test for two independent means. This test has three requirements:

1. The two samples must be independent.

2. Both samples must have 30 or more observations, and

3. The population variances, σ², or the population standard deviations, σ, are known.

We should only use this test when all three requirements are met. When these requirements are not met, we may be able to use one of the two two-sample t-tests for the mean, assuming the samples are independent. Like the one-sample z-test for the mean, the two-sample z-test for the means is used infrequently because the population variances are usually unknown. If the samples are dependent, we would use a paired or dependent sample t-test.

Step 1: Test Set-up

For many years, Dr. V. has taught classes for the same course at 9 am and 11 am. Over the years, he has noticed a pattern: Students in his 11 am classes do better on the first examination than students in his 9 am classes. Dr. V. decided to investigate. His research question: Do students in the 9 am classes have lower average grades on the first examination than their counterparts in the 11 am classes? Based on this research question, this will be a left-tail test assuming that the 9 am class is listed before the 11 am class in the null and alternate hypotheses.

Based on past investigations, he believes the population variance, σ², is 350 for the 9 am classes and 250 for the 11 am classes. The corresponding population standard deviations, σ, are 18.71 for the 9 am classes and 15.81 for the 11 am classes. Based on this belief, he will conduct a two-sample z-test for the means.

Dr. V’s next step is to run an a priori statistical power calculation to determine the necessary sample size. He uses the sample size calculator found on the Statistics Kingdom website because G*Power does not have a calculator for this test.

Statistics Kingdom’s sample size calculator for this test has nine inputs:

1. Tails: There are three options: Two, Left, and Right. Based on the research question, select Left.

2. Rounding: The default is 6. This is the number of digits that the answer will be rounded. Leave it at 6 because entering other numbers does not seem to make a difference.

3. Distribution: There are two choices: Normal and T. Select Normal because we will conduct a z-test.

4. Samples: There are two choices: One Sample and Two samples. Select two samples because we will be using data from two samples.

5. Significance level (α): Enter 0.05 because we will be using a 5 percent significance level.

6. Power: Enter 0.8, because we seek 80 percent power and a 20% probability of a Type II error.

7. Effect: There are three options: Small, Medium, and Large. Select small.

8. Effect Type: There are two options: Standardized Effect size (Cohen’s d) and Unstandardized effect size, which is sampling error or the difference between the two sample means. Select Standardized effect size.

The most commonly used effect size for two-sample tests of means is Cohen’s d. Given that Dr. V has yet to collect data, he will have to estimate the value of Cohen’s d. This estimate can be based on reviewing previous research or the effect size deemed necessary to achieve practical significance. Equation 1 shows the formula for calculating Cohen’s d along with the formula for pooled standard deviation:

Equation 1: Formula for Cohen’s d

Table 1 shows how Cohen’s d effect sizes are interpreted:

Table 1: Professor Cohen’s Guidelines for Interpreting Cohen’s d Effect Size

Based on this table, a small effect size must be at least 0.20, a medium effect size must be at least 0.50, and a large effect size must be at least 0.80. In the social sciences and business, effect sizes tend to be small.1 An effect size less than 0.20 is considered a negligible or minimal effect. Whenever the null hypothesis is rejected and the effect size is negligible, we should be concerned that our test may be over-powered and the results may lack practical significance. As we have said, practical or clinical significance means that the research findings can change our daily practices.

9. Effect Size: Based on his experience, Dr. V. estimates that the Cohen’s d effect size will be 0.39, which is still a small effect. Enter 0.39.

Figure 2 shows the inputs in the Statistics Kingdom sample size calculator:

Figure 2: Statistics Kingdom A Priori Statistical Power Inputs

As shown in Figure 3, the necessary sample size to achieve 80 percent statistical power is 82 subjects for each of the two samples.

Figure 3: Results of the Statistics Kingdom A Priori Power Calculation
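The required sample size can be cross-checked with a short calculation. The sketch below assumes the usual normal-approximation formula for two equal-sized groups, n = 2((z_α + z_β)/d)², rounded up. Whether the Statistics Kingdom calculator uses exactly this formula is an assumption, and the function name n_per_group is ours.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80, tails=1):
    """Per-group sample size for a two-sample z-test with equal group sizes.

    Assumes the normal-approximation formula n = 2 * ((z_alpha + z_beta) / d)^2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / tails)  # critical value for alpha
    z_beta = std_normal.inv_cdf(power)               # z-value for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Dr. V.'s inputs: one-tail test, alpha = 0.05, power = 0.80, Cohen's d = 0.39
print(n_per_group(0.39))  # 82
```

With Dr. V.’s inputs, this reproduces the 82 subjects per sample reported above.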

Using his gradebooks, he takes a random sample of 82 students from his 9am classes and 82 from his 11am classes. Tables 2 and 3 show the results from the two samples:

Table 2: Random Sample of Grades on the First Examination from the 9am Class

Table 3: Random Sample of Grades on the First Examination from the 11am Class

Table 4 shows the data summaries for the 9am and 11am samples:

Table 4: Summary of Performance on the First Examination

To recap, Dr. V selected a two-sample z-test for the means for the following reasons:

1. The samples are independent because a student cannot be registered in both classes nor can a student’s performance on an examination be affected by the performances of students in the other class.

2. Both samples have 30 or more observations.

3. There is a good estimate of the population variance for both samples.

Step 2. Select the Level of Significance, α

Dr. V. selects a 5 percent significance level, which is typically used by social scientists. This significance level means that, over the long run, there is a 5 percent probability of committing a Type I error when the null hypothesis is true; that is, getting a false positive by rejecting a null hypothesis that should not be rejected.

The common practice of selecting a 5 percent significance level is called the five-eighty convention.2 The five stands for the five percent tolerance of a Type I or α error, and the eighty stands for the desired level of statistical power. Statistical power is defined as one minus the probability of a Type II error. Type II or β errors, you will recall, result from the failure to reject a null hypothesis when there is an effect. A Type II error is a false negative. The five-eighty guideline balances the risk of committing a Type I or Type II error. We are more tolerant of Type II errors than Type I errors. This is because a false positive is usually judged to be a more serious error than a false negative. Remember: While the probabilities of committing a Type I or Type II error are inversely related, these errors are mutually exclusive. We cannot commit both errors on the same test. Similarly, we cannot commit a Type II error when the null hypothesis is rejected or a Type I error when the null hypothesis is not rejected.

Using Microsoft Excel, the critical value of z for a left-tail test with a 5 percent significance level is -1.645. See Figure 4 for the Excel formulas:

Figure 4: Critical Value of z = -1.645
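For readers who want to check the critical value outside Excel, here is a minimal sketch using Python’s standard library. It is an alternative to the Excel formulas in Figure 4, not a reproduction of them.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

z_left = std_normal.inv_cdf(alpha)       # left-tail critical value
z_right = std_normal.inv_cdf(1 - alpha)  # right-tail critical value

print(round(z_left, 3), round(z_right, 3))  # -1.645 1.645
```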

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1)

Given the research question—Do students in the 9 am classes have lower average grades on the first examination than their counterparts in the 11 am classes?—this is a left-tail test. Here are the null and alternate hypotheses:

H0: μ9 ≥ μ11; H1: μ9 < μ11

Please note: If we worded the research question as “do students in the 11 am classes have higher average grades on the first exam than their counterparts in the 9 am classes,” we would have a right-tail test. The order of the samples is important. By switching the order of the samples and the mathematical symbols, we would have a right-tail test:

H0: μ11 ≤ μ9; H1: μ11 > μ9.

The critical value for a right-tail test with a 5 percent significance level would be +1.645.

As discussed in Chapters 13 and 14, the null hypothesis means no significant difference. Any differences found are attributed to random sampling error.

The alternative hypothesis means that there is a statistically significant difference between the two samples. In this example: Students in the 9 am classes do not perform as well as students in the 11 am class on the first exam. The less than sign, <, in the alternate hypothesis, indicates that this is a left-tail test because it points to the left-tail or lower tail of the z-distribution.

It really does not matter whether we set up this test as a left-tail or right-tail test. What matters is that once we choose between a left-tail or right-tail test, the test design remains consistent with the chosen direction of the test.

Step 4. Compose the Decision Rule

Given that this is a left-tail test at a five percent significance level, the rejection region is the lowest 5 percent of the left-tail. Using the critical values table, the critical value for a left-tail z-test at a 5 percent significance level is -1.65. Using Microsoft Excel, the z-value is -1.644853626951, which we round off to -1.645.

The decision rule:

Reject the Null Hypothesis if z is less than -1.645 (or -1.65).

Figure 5 shows a chart of the normal curve with a 5 percent rejection region on the left-tail.

Figure 5: Left-tail z-test with a 5% significance level

Step 5. Calculate the Value of the Test Statistic, p-value, and Effect Size

Equation 2 shows the formula for a two-sample z-test for the means and the calculations for this problem:

Equation 2: Test Statistic for a Two-Sample z-test for the Mean

The value of the test statistic is -2.368. Using the Area Under the Curve tables, we can get a good estimate of the p-value for our test by rounding our z-value to -2.37. Using the Area Under the Curve Table for the area between the mean and z, the estimate of the p-value is 0.0089 or 0.89 percent, found by 0.500 – 0.4911. Using the Area Under the Curve Table that measures the area between z and the tail will return the same estimated p-value, 0.0089. See Figure 6:

Figure 6: Estimating p-value using the Area Under the Curve Tables
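The test statistic and p-value can also be verified with a few lines of code. This sketch assumes the standard form of the two-sample z statistic, the difference between the sample means divided by the square root of σ1²/n1 + σ2²/n2. It uses the chapter’s figures: a mean difference of -6.405 points (9 am minus 11 am), variances of 350 and 250, and 82 observations per sample. The variable names are ours.

```python
import math
from statistics import NormalDist

mean_diff = -6.405       # 9 am sample mean minus 11 am sample mean
var_9, var_11 = 350, 250 # known population variances
n_9 = n_11 = 82          # sample sizes

# Standard error of the difference between two independent means
std_error = math.sqrt(var_9 / n_9 + var_11 / n_11)
z = mean_diff / std_error

# Left-tail p-value: area under the standard normal curve to the left of z
p_left = NormalDist().cdf(z)

print(round(z, 3), round(p_left, 4))  # -2.368 0.0089
```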

A faster way to conduct this hypothesis test is to use the Microsoft Excel Data Analysis ToolPak. On the Excel ribbon, select Data and then Data Analysis. Depending on the version of Excel you are using, the icon for this tool looks like the ones shown in Figure 7. The icon on the left is from the Macintosh version of Excel 2016 and 2019. The icon on the right is from the Windows version of Excel 365.

Figure 7: Excel’s Data Analysis tool on the “ribbon”

Click on the Data Analysis icon. Doing so will bring up the Data Analysis window. Scroll down to the bottom of the list and select z-Test: Two Sample for Means. This is the last option on this list. See Figure 8:

Figure 8: Select z-Test: Two Sample for Means

Once you select this option, a new window will open as shown in Figure 9. This window allows you to enter the necessary information for Excel to conduct this test.

Enter the Variable 1 and Variable 2 Ranges. This is done by dragging the cursor through the cell range. The Hypothesized Mean Difference is 0. Enter the value of the variance for Variable 1 and 2. Check the Labels box because Cells B1 and C1 contain the variable labels. Under Alpha, enter the significance level, 0.05. And, select the Output option.

Figure 9: Data input window for a “z-Test for Two Sample for Means”

Under Variable 1 Range, enter the cell reference for the 9 am sample. Under Variable 2 Range enter the cell reference for the 11 am sample. This is done by dragging the cursor through the cell range. The Hypothesized Mean Difference is zero. Variable 1 Variance (known) equals 350, and Variable 2 Variance (known) is 250. The Labels box is checked because cells B1 and C1 contain the sample names. Alpha stands for the level of significance. Enter 0.05. Excel has three options for where to place the output: 1) the current worksheet, 2) a new worksheet, or 3) a new workbook. The selected output option is to place the results in the same worksheet as the data starting in cell J1. Then click OK.

In just a few seconds, Excel provides the results of our test. See Figure 10. Included in this report are the two means, the known variances, the number of observations for each sample, the value of z, -2.368, and the p-values and critical values for one-tail and two-tail tests. Remember: p-values are probabilities. They can never be negative. The p-value represents the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. When the p-value is greater than the significance level, we do not reject the null hypothesis. When the p-value is equal to or less than the level of significance, we reject the null hypothesis. Our confidence in our decision to reject the null hypothesis grows as the p-value shrinks.

Figure 10: Excel’s Solution for this two-sample z-test for the means

It should also be pointed out that our failure to reject the null hypothesis does not mean it is true. Similarly, when we reject the null hypothesis, we do not consider the alternate hypothesis true.

Unfortunately, Excel’s z-Test: Two Sample for Means tool has major shortcomings:

1. It fails to report effect size.

2. It does not calculate the population variances for you, so you have to know these parameters before you conduct this test.

3. Answers for the z-value, critical values, and the p-values are not rounded.

The good news is that we can use Excel to do all the calculations with the exception of finding a priori statistical power. Figure 11 shows the test calculations performed using Excel’s standard functions:

Figure 11: Test Run Using Excel’s Built-In Functions

Note on Cohen’s d

As previously stated, the most commonly used effect size for two-sample tests of means is Cohen’s d. We can find the effect size using the following equations:

Equation 3: Formula for Cohen’s d

Equation 4: Formula for Pooled Standard Deviation

Equation 5: Calculation for Cohen’s d
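In symbols, Cohen’s d and the pooled standard deviation are commonly written as shown below. This is a sketch of the usual textbook form; the notation (sample means, sample standard deviations s1 and s2, and sample sizes n1 and n2) is our assumption and may differ from the layout of Equations 3 and 4.

```latex
d = \frac{\left| \bar{x}_1 - \bar{x}_2 \right|}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

When the two samples are the same size, as they are here, the pooled variance is simply the average of the two sample variances. Given the 6.405-point difference between the means and the reported effect size of 0.5230, the pooled standard deviation works out to roughly 12.25, found by 6.405/0.5230.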

How do we interpret our calculated Cohen’s d effect size of 0.5230? Table 5 shows the Cohen’s d “threshold” values.3 Based on this table, the calculated Cohen’s d effect size is a medium effect.

Table 5: Professor Cohen’s Guidelines for Interpreting Cohen’s d Effect Size
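The thresholds described in this chapter (negligible below 0.20, small from 0.20, medium from 0.50, large from 0.80) can be expressed as a small lookup function. The function name interpret_cohens_d is hypothetical, and the sketch simply encodes those cut-offs.

```python
def interpret_cohens_d(d):
    """Label a Cohen's d value using the chapter's threshold values."""
    size = abs(d)  # the sign only shows direction; the label depends on magnitude
    if size < 0.20:
        return "negligible"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"

print(interpret_cohens_d(0.39))    # small  (the a priori estimate)
print(interpret_cohens_d(0.5230))  # medium (the calculated effect size)
```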

Given the fact that the value of Cohen’s d is greater than the estimate used in the a priori statistical power calculation, this test will have greater than 80 percent statistical power. We are not concerned that this test is over-powered because the effect is not negligible.

Step 6. Decide and Report

Because the z-value of -2.368 is less than the critical value of -1.645 and the p-value of 0.89 percent (the area in red in the chart shown in Figure 12) is well below the 5 percent significance level, there is sufficient evidence to reject the null hypothesis at either a 5 percent or 1 percent significance level. This indicates that it is unlikely that the 9 am and 11 am classes performed equally on the first test. Conclusion: On average, students in the 9 am classes have lower grades on the first exam than students in the 11 am classes.

Figure 12: Graphic representation of the left-tail test at a 5% significance level with a p-value of 0.89%

Practical Significance: The test results also have practical significance given the effect size, 0.5230. In addition, most college administrators, professors, and students would contend that the 6.405-point difference in test scores, the difference between a 76.9 (a C) and an 83.3 (a B), is quite important and has real meaning.

III. Tests of the Proportions of Two Independent Samples Using z-values

In Chapter 11 we constructed confidence intervals for proportions using the October 2019 Gallup poll data on the attitudes of Republicans, Independents, and Democrats on the legalization of marijuana. We will now conduct a two-sample test for proportions using the 2019 Gallup data. Given that we can only compare two samples with this z-test, we will compare the proportions of Republicans and Independents who favor the legalization of marijuana.

If we want to test Democrats, Republicans, and Independents, we would have to run three tests: Republicans vs. Independents, Republicans vs. Democrats, and Independents vs. Democrats. There is, however, a serious problem with doing this. Assuming the three tests are independent and each uses a 5 percent significance level, the probability of committing at least one Type I error would increase to 14.26 percent, found by 1-(0.95*0.95*0.95).
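The inflation of the Type I error rate is easy to verify. The sketch below assumes the three tests are independent and that each is run at a 5 percent significance level; the variable names are ours.

```python
alpha = 0.05  # significance level for each individual test
tests = 3     # Republicans vs. Independents, Republicans vs. Democrats, Independents vs. Democrats

# Probability of at least one Type I error across all tests
familywise_error = 1 - (1 - alpha) ** tests

print(round(familywise_error * 100, 2))  # 14.26
```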

Please note: Microsoft Excel’s Data Analysis ToolPak does not have a built-in two-sample test for proportions. We can, however, use Excel’s built-in functions to perform all the calculations. We will use G*Power to estimate statistical power and the probability of a Type II error.

Step 1: Test Set-up

Using the October 2019 data from Gallup, we will designate the count of the number of people who favor the legalization of marijuana a “success” because they match the characteristic we are seeking to measure. Here are Gallup’s findings:

Table 6: Results from the October 2019 Gallup Poll on the Legalization of Marijuana

To repeat, we cannot use a z-test to compare more than two proportions simultaneously. We could use a chi-square test, but we will not cover chi-square tests until Chapter 17. With this z-test, all we can do is compare the proportions for two independent samples, Republicans and Independents: 50.64 percent of Republicans compared to 67.48 percent of Independents favor the legalization of marijuana. These proportions are estimates of the unknown population proportions, πR and πI. When we constructed confidence intervals in Chapter 11, we estimated the margin of error, or MoE, at a 95 percent confidence level. The MoE was plus or minus 4.94 percent for Republicans and plus or minus 4.35 percent for Independents. The confidence intervals, which are considered inverse hypothesis tests, indicated that the attitudes of Republicans and Independents were not equal because the confidence intervals did not overlap.

The research question: Is the difference in support for the legalization of marijuana between Republicans and Independents statistically significant? The phrasing of this question suggests a two-tail test, but we are not interested in whether there is a difference. We want to know whether Republicans are less likely to favor the legalization of marijuana than Independents. This would be a left-tail test. If we placed Independents first, we would conduct a right-tail test. We must be certain to clearly phrase our question when we write the null and alternate hypotheses, the decision rule, calculate the test statistic, and report our findings. Not doing so risks muddling our results. Structuring this test as a left-tail test, we expect the z-value to be negative.

Despite the fact that we already have our data, we should still run an a priori statistical power analysis to determine whether the Gallup survey has enough data to obtain the desired level of statistical power. For this analysis, we will use G*Power.

Here are the three settings for this calculation:

1. Test family: z-tests.

2. Statistical test: Proportions: Difference between two independent proportions.

3. Type of Power Analysis: A priori: Compute required sample size….

Here are the input parameters:

1. Tail(s): There are two options: one or two. Select one because this is a left-tail test.

2. Proportion p2: This is the proportion for Independents. Enter 0.6748.

3. Proportion p1: This is the proportion for Republicans. Enter 0.5064.

4. α err prob: This is the significance level. Enter 0.05.

5. Power (1-β err prob): This is the desired level of statistical power. Enter 0.8.

6. Allocation ratio N2/N1: Enter 2.23, found by the sample size of Independents over the sample size for Republicans, 444/199.

As shown in Figure 13, the results of G*Power’s calculation indicate that to achieve 80 percent statistical power the sample should include 75 Republicans and 167 Independents.

Figure 13: G*Power A Priori Power Calculation

The Gallup poll survey has much larger sample sizes. We will, therefore, have ample statistical power. But, if the calculated effect size is negligible, this test may be over-powered. This means that the probability of not rejecting the null hypothesis is too low; which is to say, it is virtually impossible not to reject the null hypothesis.

Step 2. Select the Level of Significance, α

A 5 percent significance level is selected. The critical value of z is -1.645.

Figure 14: Critical Value of z = -1.645

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1)

Here are the null and alternate hypotheses for our left-tail test:

H0: πR ≥ πI; H1: πR < πI

As always, the null hypothesis means no difference. In this case, the difference between the proportion of Republicans and Independents who favor the legalization of marijuana is statistically insignificant; that is, it is merely due to sampling error. Our alternative hypothesis means that there is a statistically significant difference between the attitudes of Republicans and Independents on the legalization of marijuana; or, the difference between 50.64 percent and 67.48 percent is greater than what we would expect if the difference were merely sampling error. In short, we are investigating whether the results of the latest Gallup poll provide evidence that fewer Republicans favor the legalization of marijuana than Independents.

Step 4. Compose the Decision Rule

The Area Under the Curve table shows that the critical value for a left-tail test with a 5 percent significance level is 1.65. But, because the tables are for the right side of the symmetrical normal curve, and we are testing on the left side, we must include the negative sign, -1.65. Excel will provide a more precise answer, -1.645.

The decision rule: Reject the null hypothesis if z is less than -1.645 (or -1.65). Figure 15 shows a chart of the normal curve with a five percent rejection region on the left tail.

Figure 15: Left-tail test with a 5% significance level

Step 5. Calculate the Value of the Test Statistic and p-value

Calculating the test statistic for a two-sample z-test for proportions requires two steps:

# 1: Calculate the Pooled Proportion. Equation 6 shows the formula and calculation:

Equation 6: Equation for pooled proportions

# 2: Calculate the z-value. Equation 7 shows the formula and calculation:

Equation 7: Test Statistic for a Two-Sample z-test for Proportions
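In symbols, the two steps are commonly written as shown below. This is a sketch of the standard pooled form. The notation is our assumption: x1 and x2 are the numbers of “successes,” n1 and n2 are the sample sizes, p1 and p2 are the sample proportions, and pc is the pooled proportion.

```latex
p_c = \frac{x_1 + x_2}{n_1 + n_2},
\qquad
z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_c(1 - p_c)}{n_1} + \dfrac{p_c(1 - p_c)}{n_2}}}
```

The denominator of the z formula is the standard error of the difference between the two proportions. It uses the pooled proportion because, under the null hypothesis, both samples are treated as estimates of the same population proportion.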

A z-value of -5.520 is very extreme. In fact, it is so extreme that you cannot find it on the Area Under the Curve Tables. You will recall that the Normal or Empirical Rule states that nearly all values fall within plus or minus three standard deviations or standard errors of the mean. Our test statistic of -5.520 indicates that the observed difference is 5.52 standard errors of the proportion below a z-value of 0.00. The standard error of the proportion, SEP, is the substitute for standard deviation when dealing with proportions.

Microsoft Excel can calculate the p-value with the following formula:

Equation 8: Excel’s Formula For Calculating p-values for a z-test

= NORMSDIST(-5.520) = 0.00000002

This is an extremely tiny p-value. We report tiny p-values like this as <0.001.
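The same left-tail p-value can be checked outside Excel. This minimal sketch uses Python’s standard library and the chapter’s test statistic of -5.520. The reporting rule for tiny p-values follows the convention described above, and the variable names are ours.

```python
from statistics import NormalDist

z = -5.520

# Left-tail p-value: area under the standard normal curve to the left of z
p_left = NormalDist().cdf(z)

# Report tiny p-values as "<0.001" rather than printing a long string of zeros
reported = "<0.001" if p_left < 0.001 else f"{p_left:.3f}"

print(f"{p_left:.8f}", reported)  # 0.00000002 <0.001
```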

To calculate effect size, we will use Cohen’s h.4 Equation 9 shows the formula:

Equation 9: Formula for Cohen’s h

If you forgot your high school trigonometry, arcsin is the inverse of the sine function. You can calculate arcsin using Excel’s ASIN function:

Equation 10: Excel’s ASIN Function

=ASIN(Number)

Figure 16 shows all the calculations for this test including Cohen’s h effect size, which is 0.35. This figure indicates that we have uncovered a small effect relating to the impact of party identification and attitudes toward the legalization of marijuana. Yes, the difference of 17.15 percentage points between Republicans’ and Independents’ attitudes towards the legalization of marijuana is only a small effect. Given that the effect size is not negligible we need not be concerned about whether this test is over-powered.

Figure 16: Calculation of Cohen’s h along with the rest of the calculations for this test.
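Cohen’s h can also be computed directly. The sketch below assumes the usual definition, h = 2·arcsin(√p1) − 2·arcsin(√p2), and uses the rounded proportions quoted in the text (50.64 percent and 67.48 percent). Because these inputs are rounded, the last digit may differ slightly from the value shown in Figure 16. The function name cohens_h is ours.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Republicans first, Independents second, matching the left-tail set-up
h = cohens_h(0.5064, 0.6748)

# The sign reflects the order of the samples; the magnitude is the effect size
print(round(abs(h), 3))  # 0.344
```

A value of roughly 0.34 falls in the same “small effect” range as the 0.35 reported in Figure 16.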

Step 6. Decide and Report

Given the extreme z-value of -5.520 and the exceedingly small p-value of <0.001, we reject the null hypothesis. If the difference in attitudes on the legalization of marijuana between Republicans and Independents were merely sampling error, there would be far less than a one in a thousand chance of observing a difference this large. The difference, therefore, is statistically significant. Given the effect size, we also conclude that the results have practical significance for policymakers and entrepreneurs. Conclusion: Republicans are less likely to favor the legalization of marijuana than Independents.

Please note: The two-sample test of proportions is of limited use because we are restricted to comparing only two samples at a time. Comparing confidence intervals for Republicans, Independents, and Democrats allows us to compare the attitudes toward the legalization of marijuana for all three samples in one analysis. Remember: Confidence intervals are the inverse of hypothesis tests. Many statisticians argue that confidence intervals are more informative than hypothesis tests like this one.

IV. Tests of the Means of Two Independent Samples Using t-values

When we want to test the difference between two independent sample means, we usually do not meet the requirements for a z-test for two independent means. We may not have a good estimate of the population variance or population standard deviation for both samples. Or, one or both of the samples may have fewer than 30 observations. In such cases, we cannot use a z-test. We may be able to use a two-sample t-test for independent samples. The two-sample t-test, however, requires that the data be normally distributed or symmetrical around the mean. It is considered a robust test that remains useful even when this assumption is violated to a moderate extent. In other words, the two samples can be less than perfectly symmetrical. If the data for each of the two samples are heavily skewed, however, we would have to use a nonparametric test like a Mann-Whitney U test, which is not covered in Clear-Sighted Statistics.

There are actually two t-tests for means for independent samples. One is for samples with roughly equal sample variances. This test is often called a pooled-variance t-test. Excel calls this test “t-Test: Two-Sample Assuming Equal Variances.” The other two-sample t-test is for samples with unequal sample variances. This test is sometimes called Welch’s t-test. Excel calls this test “t-Test: Two-Sample Assuming Unequal Variances.”

The decision about which two-sample t-test to use is based on the equality of the sample variances. To determine whether the variances of the two samples are equal, we conduct an F-test for equality of variance. Excel’s Data Analysis ToolPak has a built-in equality of variance test called “F-Test Two-Sample for Variances.” In the following examples, we shall conduct this test two ways:

1. Using Excel’s Data Analysis ToolPak, and

2. Using Excel’s standard functions and a series of IF statements.

We shall see that the test created using Excel’s standard functions and IF statements has a major advantage over Excel’s Data Analysis ToolPak’s F-Test Two-Sample for Variances. This test can also be easily completed with paper and pencil or a handheld calculator, although calculating the sample variances by hand can be time-consuming.

A. F-Test for the Equality of Variance

F-tests are based on the F-distribution. As with z- and t-distributions, there is a “family” of F-distributions. F-distributions are based on the degrees of freedom in the numerator and denominator. For the equality of variance test, the degrees of freedom for each sample are found by the sample size, n, minus 1. The critical value of F depends on both the degrees of freedom in the numerator and the degrees of freedom in the denominator. See Equation 11.

Equation 11: Degrees of Freedom Equation for an F-Test for Equality of Variance

Like z- and t-distributions, F-distributions are continuous. The t-distribution and F-distribution are closely related: when the numerator has one degree of freedom, F-values are t-values squared; or, stated another way, t-values are the square root of F-values.

Equation 12: The Relationship Between F and t
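As a sketch of this relationship, assume an F-distribution with 1 degree of freedom in the numerator and df degrees of freedom in the denominator, which is the case where the identity holds exactly. The notation below is an assumption, not necessarily the chapter's own.

```latex
F_{(1,\,df)} = t_{(df)}^{2}
\qquad \Longleftrightarrow \qquad
\lvert t_{(df)} \rvert = \sqrt{F_{(1,\,df)}}
```

For example, the two-tail critical value of t with 10 degrees of freedom at a 5 percent significance level is 2.228, and 2.228 squared is approximately 4.96, the 5 percent critical value of F with 1 and 10 degrees of freedom.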

But unlike z- and t-distributions, F-distributions are always positively skewed. And, like z-distributions and t-distributions, F-distributions are asymptotic; that is, while the curve gets close to the X-axis, it never touches this axis. As the degrees of freedom increase, the F-distribution curve becomes less skewed. Here are four graphic representations of F-distributions with different degrees of freedom in the numerator and denominator:

Figure 17: Four Graphic Examples of F-Distributions

Appendix 2: Statistical Tables has the critical values tables for F-distributions, one for a 5 percent level of significance and the other for a 1 percent level. You can also find these tables in F-Distribution_.05_.01.xlsx and F-Distribution_.05_.01.pdf. These tables are big, and there are missing values for some combinations of degrees of freedom. This is one reason why finding the critical values for F using Excel is recommended.

Figures 18 and 19 show abbreviated critical value tables for the F-Distribution at a 5 percent and a 1 percent significance level.

Figure 18: Critical Value Table for F at a 5% significance level

Figure 19: Critical Value Table for F at a 1% significance level

You can also use Microsoft Excel to find critical values using this formula:

Equation 13: Excel’s Formula for Calculating the Critical Value of F

=FINV(significance level, df in numerator, df in denominator)

Finding the critical values of F using Excel has two major advantages over using the “paper” tables:

1. You are not restricted to using 5 percent or 1 percent significance levels, and

2. You can find the critical value for any combination of degrees of freedom.

Equation 14 shows the formula for the test statistic for the Equality of Variance test:

Equation 14: Equality of Variance Test Statistic
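In symbols, the test statistic is the ratio of the two sample variances. The notation is assumed here: the subscript 1 marks the sample with the larger variance.

```latex
F = \frac{s_{1}^{2}}{s_{2}^{2}}, \qquad s_{1}^{2} \ge s_{2}^{2}
```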

Please note: To keep the critical values tables manageable, we always place the larger sample variance in the numerator. This forces the test to be a right-tail test. Consequently, the smallest possible F-value is 1.00. If your F-value is less than 1.00, you made the mistake of placing the smaller variance in the numerator. Please Note: This mistake can easily happen when you use Excel’s Data Analysis ToolPak.

The research question is: Is variance for Sample A (the sample with the larger variance) greater than that for Sample B? The null and alternate hypotheses are written as:
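For a right-tail test with Sample A in the numerator, the hypotheses can be written as follows. This is a sketch based on the research question; the subscripts A and B are labels for the two populations.

```latex
H_{0}: \sigma_{A}^{2} \le \sigma_{B}^{2}
\qquad
H_{1}: \sigma_{A}^{2} > \sigma_{B}^{2}
```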

Remember: Null hypothesis tests always pertain to a population parameter. With the equality of variance test, we are interested in determining whether the two population variances are equal; but because F-distributions are always positively skewed (right-skewed), the F-test for equality of variance should be written as a right-tail test.

Here is how to find the critical value for F using the critical values table. We find the intersection of degrees of freedom in the numerator with degrees of freedom in the denominator. Degrees of freedom for this F-test are found by n minus 1 for the numerator and n minus 1 for the denominator. The sample with the larger variance always goes in the numerator. Let’s say Sample 1 has a variance of 25 and a sample size of 17 and Sample 2 has a variance of 16 and a sample size of 16. Sample 1 goes in the numerator and Sample 2 goes in the denominator. The critical value at a 5 percent significance level is 2.38 based on 16 degrees of freedom in the numerator, found by 17 minus 1, and 15 in the denominator, found by 16 minus 1. See Figure 20.

Figure 20: Critical value for F(16,15) at a 5% significance level is 2.38.

The decision rule: Reject the null hypothesis if F is greater than 2.38.

The test statistic is:

Equation 15: Equality of Variance F-Test
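The calculation for this example can be sketched in a few lines of Python. The function name `f_ratio` is hypothetical. It mirrors the convention described above by swapping the samples when the smaller variance is entered first. The sample size of 16 for the second sample is taken from the 15 denominator degrees of freedom stated in the text.

```python
def f_ratio(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("sample variances must be positive")
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    # Swap the samples so the larger variance is in the numerator.
    return var_b / var_a, n_b - 1, n_a - 1


# Worked example from the text, deliberately entered in the "wrong" order:
# variance 16 with n = 16, and variance 25 with n = 17.
f_value, df_num, df_den = f_ratio(16, 16, 25, 17)
print(f_value, df_num, df_den)  # 1.5625 16 15

# Decision rule from the text: reject H0 if F is greater than 2.38.
print(f_value > 2.38)  # False, so we fail to reject H0
```

Because the function orders the variances itself, the result can never fall below 1.00, whichever sample is entered first.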

Unfortunately, we cannot estimate the p-value using the critical values tables. We must use Microsoft Excel. Equation 16 shows the formula for calculating p-values:

Equation 16: Excel’s Formula for Calculated p-Values for F-Distributions

=FDIST(F-value, df numerator, df denominator)
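For readers working outside Excel, here is a minimal sketch of what FDIST and FINV compute, using only the Python standard library. All function names are hypothetical. The right-tail area of the F-distribution is obtained from the regularized incomplete beta function, evaluated here with a standard continued-fraction method, and the critical value is found by bisection. This is an illustration under those assumptions, not the algorithm Excel uses internally. A statistics library such as SciPy would normally be preferred.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for numerator in (
            m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1)),
        ):
            d = 1.0 + numerator * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + numerator / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use whichever form of the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_right_tail(f_value, df_num, df_den):
    """Right-tail p-value for an F statistic (the quantity FDIST returns)."""
    x = df_den / (df_den + df_num * f_value)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


def f_critical(alpha, df_num, df_den):
    """Right-tail critical value of F (the quantity FINV returns), by bisection."""
    lo, hi = 0.0, 1.0
    while f_right_tail(hi, df_num, df_den) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_right_tail(mid, df_num, df_den) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Example from the text: F = 25 / 16 = 1.5625 with 16 and 15 degrees of freedom.
print(round(f_right_tail(1.5625, 16, 15), 4))
# The critical value should land close to the 2.38 read from the table.
print(round(f_critical(0.05, 16, 15), 3))
```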

The p-value for F(16,15) = 1.5625 is approximately 0.20, or about 20 percent. With a p-value greater than the 5 percent significance level, we would fail to reject the null hypothesis. We conclude that there is insufficient evidence that the two samples have unequal variances. This is an important finding. It tells us to use a pooled variance (equal variances) t-test, not the t-test for unequal variances.

You can conduct this test using the F-Test for Equality of Variance template found in Chapter15_Examples.xlsx on the worksheet titled Chapter15_F-Test Equality of Var.xlsx. Just enter the significance level, the sample standard deviations, and the sample sizes, and the template will automatically place the larger variance in the numerator, calculate the F-value and p-value, and tell you whether the null hypothesis is rejected and which of the two t-tests to use. Open this template, and enter values in the “red” cells to see how it works. Look at the formulas in each cell. Figure 21 shows the template for this problem.

Figure 21: F-Test for Equality of Variance template found in Chapter15_2-Sample)_NHST.xltx.

When the summary statistics are not available, you can use Excel’s built-in functions to count the number of observations and calculate the sample variance and standard deviation.

Based on the data entered, we would fail to reject the null hypothesis. We conclude that there is no significant difference in the sample variances. This means that we should use a pooled variance t-test, or what Excel calls a “t-Test: Two-Sample Assuming Equal Variances.” If we were to reject the null hypothesis, we would treat the variances as unequal and use an unequal variance t-test, or what Excel calls a “t-Test: Two-Sample Assuming Unequal Variances.” Please note: The unequal variance t-test reduces the degrees of freedom, thereby lowering statistical power compared to the pooled variance t-test.

If you have the raw data, you may also conduct an F-test for equality of variance using Excel’s Data Analysis plug-in. We will examine Excel’s F-test in the context of the two two-sample t-tests for the means. We will also discover that this plug-in has a serious flaw: It does not automatically place the larger variance in the numerator. This error is apparent when the calculated value for F is less than 1.00.

B. t-test for Means of Two Independent Samples Using t-values (Pooled Variance Test)

Step 1: Test Set-up

The Accounting faculty at the Nunya School of Business is testing a new introductory accounting textbook. The Nunya Business faculty assign the new textbook to half the Accounting I classes and the current textbook to the other half. Students in both sets of classes took the same standardized test. The faculty randomly selected the exam scores for 115 students in classes that use the current textbook and 115 test scores of students in classes using the new textbook. These two samples are independent because a student cannot be registered in more than one Accounting I class. The question faculty want answered: Do students using the new textbook perform better than those using the current textbook? This test, therefore, will be a right-tail test.

Because there is no good estimate of the population variances, a two-sample z-test for the means cannot be used. But which of the two two-sample t-tests should be used? To answer this question, the faculty must conduct an F-test for equality of variance. The research question: Are the variances equal? The null and alternate hypotheses are:

Please Note: We did not perform an a priori statistical power analysis for this test. We will, however, conduct one for the t-test. For the t-test we will estimate the size of Cohen’s d at 0.33, which is a small effect.

Figures 22 and 23 show the exam scores from the classes using the new and the current textbooks:

Figure 22: Test scores from classes using the current textbook

Figure 23: Test scores from classes using the new textbook

We will use Excel’s F-test for equality of variance using Excel’s Data Analysis ToolPak plug-in. Do the following steps:

# 1: Click on the Data Analysis icon in the ribbon:

Figure 24: Click on Data Analysis icon

# 2: Select F-Test Two-Sample for Variance and Click OK.

Figure 25: F-Test Two Sample for Variances

# 3: Enter data in the Input window and Click OK.

Enter the Variable 1 and Variable 2 Ranges. This is done by dragging the cursor through the cell range. Check the Labels box because Cells B1 and C1 contain the variable labels. Under Alpha, enter the significance level, 0.05. And, select the Output option.

Figure 26: Input window for F-Test Two-Sample for Variances

# 4: Read the results and make a decision.

Figure 27: Results for F-Test Two-Sample for Variances. Note: Error

The nice thing about Excel’s F-Test Two-Sample for Variances tool is that it very quickly calculates the degrees of freedom, the critical value of F, the test statistic, and the p-value. But, as previously mentioned, it is prone to a rather foolish error that is a serious shortcoming. In this case, we see that F is less than 1.00, which is impossible when we follow the convention of placing the larger variance in the numerator. This error stems from the fact that we placed the sample with the larger variance in the Variable 2 Range, not the Variable 1 Range. This is an easy mistake to make because we did not calculate the variances for the two samples in advance. The t-test worksheets in Chapter15_Examples.xlsx fix this problem by using a series of IF statements so the larger variance is always in the numerator. They even tell us which of the two t-tests to use. See Figure 28:

Figure 28: Equality of Variance F-Test Performed Properly in Excel

Based on the failure to reject the null hypothesis, we will use a pooled variance t-test.

We should now run the a priori statistical power calculations using G*Power or Statistics Kingdom. Here are the settings for the G*Power calculation:

Test family: T-tests.

Statistical Test: Means: Difference between two independent means (two groups).

Type of power analysis: A priori: Compute required sample size….

Tails: One.

Effect size d: 0.33. (Note: This is an estimate, which can be made based on reviewing similar studies or experience.)

α err prob: 0.05

Power (1-β err prob): 0.8

Allocation ratio: N2/N1: 1

Figure 29 shows that to achieve 80 percent power, each of the two samples should have 115 students, assuming a Cohen’s d effect size of 0.33.
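As a rough cross-check on the G*Power settings above, the required sample size can be approximated with the normal distribution. This is a sketch, not G*Power’s method: G*Power works with the noncentral t-distribution, so its answer is typically slightly larger than this approximation. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80, tails=1):
    """Approximate observations needed in EACH of two independent samples.

    Uses the normal approximation n = 2 * ((z_alpha + z_beta) / d) ** 2.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Settings from the text: one tail, d = 0.33, alpha = 0.05, power = 0.80.
print(n_per_group(0.33, alpha=0.05, power=0.80, tails=1))  # 114
```

The approximation gives 114 per sample, one fewer than the 115 reported by G*Power, which is the kind of small gap to expect when a z-based shortcut stands in for a t-based calculation.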

Figure 29: G*Power A Priori Statistical Power Calculation

The Statistics Kingdom analysis also shows that each of the two samples needs 115 observations to achieve 80 percent statistical power. See Figure 30.

Figure 30: Statistics Kingdom A Priori Statistical Power Calculation

Step 2. Select the Level of Significance, α

This t-test will be conducted using a 5 percent significance level.

With this test we have 228 degrees of freedom found by adding the two sample sizes and subtracting the number of independent samples, 115 + 115 - 2 = 228 degrees of freedom. The critical value for t with 228 degrees of freedom, right-tail test with a 5 percent significance level is 1.652 found by using the Excel formulas shown in Figure 31:

Figure 31: Excel Formulas for Finding the Critical Value of t

You can also find this value using the student-t critical value table shown in Figure 32.

Figure 32: Critical Value for a One-Tail t Test with 228 DF at 5% = 1.652

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1)

The question we want to answer is: Do students using the new textbook achieve significantly higher scores on the standardized accounting examination?

The null and alternate hypotheses are:

H0: μn ≤ μo; H1: μn > μo.

Where: μn is the mean for the new textbook and μo is the mean for the old textbook.

Step 4. Compose the Decision Rule

The decision rule: Reject the null hypothesis if t is greater than 1.652. Here is what the t-distribution looks like with a 5 percent rejection region at 228 degrees of freedom.

Figure 33: t-Distribution 5% right-tail test with 228 degrees of freedom

Step 5. Calculate the Test Statistic and p-Value

Our next step is to calculate the value of the test statistic and p-value. We will do this three ways:

A. By hand and we will estimate the p-value using the critical values table.

B. Using Excel’s Data Analysis tool.

C. Using Excel’s standard functions

A. Calculating the test statistic by hand

There are two steps when we calculate the test statistic:

# 1: Calculate Pooled Variance.

Equation 17: Pooled Variance Formula
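In symbols, the pooled variance can be written as follows. The notation is assumed here: s_1^2 and s_2^2 are the two sample variances, and n_1 and n_2 are the two sample sizes.

```latex
s_{p}^{2} = \frac{(n_{1} - 1)\,s_{1}^{2} + (n_{2} - 1)\,s_{2}^{2}}{n_{1} + n_{2} - 2}
```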

Because the population variances are unknown, we create an estimate by pooling the sample variances. The pooled variance formula is a simple fraction. In the numerator, we multiply the degrees of freedom and sample variance from the first sample and add this figure to the product of the degrees of freedom and sample variance from the second sample. The denominator is the total degrees of freedom found by adding the two sample sizes and subtracting by 2. Here is the pooled variance for our problem:

Equation 18: Formula for Pooled-Variance

# 2: Calculate the test Statistic.

Equation 19 shows the formula for the test statistic and the calculation for this example:

Equation 19: Test Statistics for a Two-Sample t-Test with Equal Variance
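The two steps above can be sketched in Python using the standard pooled-variance form, t = (mean 1 minus mean 2) divided by the square root of the pooled variance times (1/n1 + 1/n2). The function name `pooled_t` and the tiny data set are hypothetical. The chapter’s 230 exam scores are not reproduced here, so the numbers below illustrate the arithmetic only.

```python
import math
from statistics import mean, variance


def pooled_t(sample_1, sample_2):
    """Return (t, degrees of freedom, pooled variance) for two independent samples."""
    n1, n2 = len(sample_1), len(sample_2)
    s1_sq = variance(sample_1)  # sample variance (n - 1 in the denominator)
    s2_sq = variance(sample_2)
    df = n1 + n2 - 2
    # Step 1: pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    # Step 2: divide the difference in means by its standard error.
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_value = (mean(sample_1) - mean(sample_2)) / std_error
    return t_value, df, pooled_var


# Hypothetical scores, for illustration only.
new_book = [2, 4, 6, 8, 10]
current_book = [1, 2, 3, 4, 5]
t_value, df, pooled_var = pooled_t(new_book, current_book)
print(round(t_value, 4), df, pooled_var)
```

Placing the new textbook first makes a positive t-value point in the direction of the right-tail alternate hypothesis.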

Using the student-t critical values table, we can estimate the p-value by comparing the value of our test statistic, 9.219, and the critical values for a one-tail test, 1.652. As shown in Figure 34, the p-value is less than 0.0005 or 0.05%. Tiny p-values like this are reported as “p < 0.001.”

Figure 34: Estimating the p-value using the student-t critical values table

B. Using Excel’s Data Analysis Tool

Using Excel’s Data Analysis tool is much faster, more accurate, and less prone to error than calculating the t-value by hand. Unfortunately, this tool does not calculate a standardized effect size like Cohen’s d. Here are the three steps you need to follow:

# 1: Select t-Test: Two-Sample Assuming Equal Variance and click OK.

Figure 35: Data Analysis, t-Test: Two-Sample Assuming Equal Variance

# 2: Enter the data in the data input window and click OK.

Figure 36: Data input window for a Two-Sample t-Test Assuming Equal Variance

# 3: Interpret the results and make a decision regarding the null hypothesis.

Figure 37: Results for our Equal Variance t-Test

Excel yields a solution to this problem far faster than calculating this test by hand. The test statistic is 9.219. Please Note: The p-values are not properly formatted because they are so small. Tiny p-values need to be reformatted in Excel. We report tiny p-values like these as <0.001. We will reject the null hypothesis. As previously pointed out, this tool fails to report effect size.

C. Using Excel’s Built-In Functions

To repeat, Excel’s Data Analysis function has shortcomings. The biggest failure is that it does not calculate Cohen’s d effect size. In the Excel file “Chapter15_Examples.xlsx” on the Example #3 worksheet, we calculated Cohen’s d. Effect size is 0.30, which is slightly less than our estimate of 0.33. This is not a problem, given the fact that the null hypothesis is rejected due to the extreme value of the test statistic. Figure 38 shows the calculation of the test statistic using Excel’s built-in functions along with effect size.

Figure 38: Cohen’s d Effect Size Calculation
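The worksheet’s formulas are not shown in the text, so the sketch below uses one common definition of Cohen’s d for two independent samples: the difference in means divided by the pooled standard deviation. The function name `cohens_d` and the input values are hypothetical and serve only to show the arithmetic.

```python
import math


def cohens_d(mean_1, mean_2, var_1, var_2, n_1, n_2):
    """Cohen's d using the pooled standard deviation as the standardizer."""
    pooled_var = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


# Hypothetical summary statistics, for illustration only.
d = cohens_d(mean_1=6.0, mean_2=3.0, var_1=10.0, var_2=2.5, n_1=5, n_2=5)
print(d)  # 1.2
```

Because d is expressed in standard deviation units, it does not grow with sample size the way a t-value does, which is why it is reported alongside the test statistic.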

Step 6. Decide and Report

Given the extreme t-value of 9.219 and the tiny p-value of <0.001, we reject the null hypothesis. Conclusion: The new textbook helps students achieve significantly higher test scores. The fact that average test scores went from 73.97, a C, with the current textbook to 85.68, a B, with the new textbook means that this result has practical significance in addition to statistical significance.

C. t-test for Means of Two Independent Samples Using t-values (Unequal Variance t-Test, which is sometimes called a Welch Test)

Step 1: Test Set-up

Joe and Carl are twins with a severe case of sibling rivalry. They hate each other as much as they love freshly brewed espresso. They own competing espresso stands at the state’s largest mall: Jittery Joe’s Espresso Emporium and Caffeine Carl’s Espresso Deluxe. Their stands are at opposite ends of the mall. At dinner on Thanksgiving, the boys argued violently about whose espresso stand was more successful. Their mother, Muriel, hired you to determine who sells more espresso. When she hands you 66 days of sales for each of her sons’ stands, she insists that you provide your analysis within a few days. The research question: Is there a significant difference in average daily sales for the two espresso stands? The word “difference” indicates that this is a two-tail test. Two-tail tests have less statistical power than one-tail tests.

You have no estimate of the effect size. Such an estimate is needed to calculate a priori statistical power. You decide to look at the data before running an a priori power analysis in order to estimate effect size. Your concern is that if the effect size is negligible, statistical power will be far too low. Using a non-mathematical analogy of a criminal trial, you recall that prosecutors who have a high conviction rate typically refuse to bring cases with flimsy evidence to trial. You fear that these data are insubstantial evidence, and that a test is not warranted.

You decide to conduct an a priori statistical power calculation to determine the sample size needed to achieve 80 percent statistical power. Using a medium Cohen’s d effect size of 0.50, you discover that you would need at least 64 observations in each sample. See Figure 39.

Table 7: Interpreting Cohen’s d Effect Size

Figure 39: A Priori Statistical Power Calculation Using G*Power

Table 8 shows the daily dollar sales for the two stands:

Table 8: Daily Sales (rounded off to the nearest dollar) for Jittery Joe and Caffeine Carl

Table 9 provides the summary data:

Table 9: Summary Data

You call your client to tell her that the difference in average daily sales is less than $1.26 a day. As we shall see, the effect size will be extremely small. You contend that there is little point in conducting this test given this tiny difference in average daily sales. There will be no evidence to support the notion that the sales for the two stands are unequal. She thanks you for your honesty, but begs you to complete this test so she might get her sons to stop bickering. You reluctantly agree to complete the test.

Given that you lack data on the population variance for both stands, you decide to conduct a two-sample t-test. A Two-Sample F-Test for Variances needs to be conducted to determine which of the two two-sample t-tests to run.

Figure 40 shows the results of Excel’s F-Test for the equality of variance test:

Figure 40: F-Test for Equality of Variance

You immediately notice that the F-value is less than 1.00, which is wrong. Caffeine Carl’s data, which is more variable, should have been considered Variable 1. Excel’s Data Analysis Tool’s serious shortcoming, as previously noted, is that it does not automatically place the larger variance in the numerator. It always places the data entered in Variable 1 range in the numerator regardless of the size of this sample’s variance. This is a major problem. The results shown in Figure 40 are wrong because Excel placed Jittery Joe’s variance in the numerator when it should be placed in the denominator. The actual F-value is 2.097, found by:

Equation 20: The F-Test Calculation

Figure 41 shows the correct output for this test using Excel’s F-Test Two-Sample for Variance. Please note: The positions of the two samples have been flipped:

Figure 41: Correct F-Test for Unequal Variance

With a p-value of 0.0016, we reject the null hypothesis that the variances are equal. As a consequence, we must use a two-sample t-test for unequal variances. It should be pointed out that with high variances the test’s statistical power is reduced.

Please Note: One commercial statistics plug-in for Excel, MegaStat, does not fall into this trap. The two-sample analysis of variance F-test is part of the two-sample hypothesis option. Based on the results of this two-sample analysis of variance F-test, MegaStat runs the appropriate two-sample t-test.

Step 2. Select the Level of Significance, α

The level of significance is 0.05. Because this is a two-sample t-test of means for unequal variance, degrees of freedom are reduced. There are 116 degrees of freedom, not 130 (132 - 2). Reducing the degrees of freedom will lower statistical power. Here, in Equation 21, is the formula we use to adjust the degrees of freedom. Equation 22 shows the calculation for this test:

Equation 21: Formula to Adjust Degrees of Freedom
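The adjustment is commonly written in the Welch-Satterthwaite form shown below, which is assumed here to be what Equation 21 shows. The notation is also assumed: s_1^2 and s_2^2 are the sample variances, and n_1 and n_2 are the sample sizes.

```latex
df = \frac{\left( \dfrac{s_{1}^{2}}{n_{1}} + \dfrac{s_{2}^{2}}{n_{2}} \right)^{2}}
          {\dfrac{\left( s_{1}^{2} / n_{1} \right)^{2}}{n_{1} - 1}
           + \dfrac{\left( s_{2}^{2} / n_{2} \right)^{2}}{n_{2} - 1}}
```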

Equation 22: Adjusted Degrees of Freedom

With 116 degrees of freedom for a two-tail test, the critical values are -1.981 and +1.981. See Figure 42.

Figure 42: Critical Values for a Two-Tail Test with 116 DF at a 5% Significance Level

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1)

This is a two-tail test given the research question: Is there a significant difference in average daily sales for the two espresso stands? Here are the null and alternate hypotheses:

H0: μj = μc; H1: μj ≠ μc.

Please Note: It does not matter which order we place the two stands.

Step 4. Compose the Decision Rule

The decision rule: Reject the null hypothesis if t is less than -1.981 or greater than +1.981. Figure 43 shows the t-distribution for this two-tail test with 116 degrees of freedom:

Figure 43: t-Distributions for a two-tail test with 116 degrees of freedom at a 5% significance level

Step 5. Calculate the Value of the Test Statistic and p-value

Figure 44 shows the output of Excel’s t-Test: Two-Sample Assuming Unequal Variances:

Figure 44: t-Test: Two-Sample Assuming Unequal Variance

The difference in average daily sales is only $1.26. The t-value is 0.047 with a two-tail p-value of 0.9626 or 96.26 percent.

Here is the calculation of the t-value performed by hand:

Equation 23: Test statistic for a t-test for Unequal Variance
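The unequal variance calculation can be sketched from summary statistics as follows. The function name `welch_t` is hypothetical, and the degrees of freedom use the standard Welch-Satterthwaite adjustment. The input values are illustrative only; the summary data in Table 9 are not reproduced here.

```python
import math


def welch_t(mean_1, var_1, n_1, mean_2, var_2, n_2):
    """Return (t, adjusted degrees of freedom) for an unequal variance t-test."""
    v1 = var_1 / n_1  # squared standard error contributed by sample 1
    v2 = var_2 / n_2  # squared standard error contributed by sample 2
    t_value = (mean_1 - mean_2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n_1 - 1) + v2 ** 2 / (n_2 - 1))
    return t_value, df


# Hypothetical summary statistics, for illustration only.
t_value, df = welch_t(mean_1=6.0, var_1=10.0, n_1=5, mean_2=3.0, var_2=2.5, n_2=5)
print(round(t_value, 4), round(df, 2))

# With equal variances and equal sample sizes, no degrees of freedom are lost:
print(round(welch_t(0.0, 4.0, 66, 0.0, 4.0, 66)[1], 6))  # 130.0
```

The more the two variances differ, the further the adjusted degrees of freedom fall below n1 + n2 - 2, which is the source of the reduction described in Step 2.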

The worksheet titled “Example #4 t-test Ind. Means,” like the worksheet titled “Example #3 t-test Ind. Means,” does some important things that the Excel Data Analysis plug-in does not:

1. It performs the F-Test Two-Sample for Variances properly by always placing the larger variance in the numerator

2. It runs the appropriate t-test and reports the t-value and p-value

3. It calculates the effect size using Cohen’s d, and reports how to interpret the effect size (Minimal, Small, Medium, and Large)

Figure 45 shows what this analysis looks like. The user enters the cells in red.

Figure 45: t-test using “Example #4 t-test Ind. Means” Worksheet

The value of Cohen’s d effect size is negligible, 0.0021.

How large a sample would we need to achieve 80 percent power using the negligible effect size of 0.0021?

Figure 46: Revised A Priori Power Analysis

We would need sales data for 3,559,579 days! That is 9,752 years. Needless to say, it is not possible to collect that much data. Just think about how long 9,750 years is and where humanity was nearly ten millennia ago. At that time, our ancestors were mostly hunter-gatherers. Agriculture was in its infancy. No one knows what humanity and our planet will look like ten millennia in the future. Clearly, this test should not have been run.

Step 6. Decide and Report

Based on our tiny effect size of 0.0021, conducting a null hypothesis significance test is not worth the time and effort because even if our sample sizes were so large as to find the $1.26 difference in daily sales statistically significant, no sane person would argue that this minuscule difference has any practical significance. Your client, however, thinks that the test does have practical significance because it gives her another argument to get her sons to stop quarreling and admit that their stands are performing equally well.

V. Test of the Means of Two Dependent Samples Using t-Values (Paired t-tests)

Step 1: Test Set-up

Dr. I. M. Cagey, the CEO of Ivy League Test Prep, is trying to find out whether clients who completed the company’s Scholastic Aptitude Test preparation program increased their SAT scores. This, then, will be a right-tail test. The test to be conducted is a matched pair or dependent sample t-test because it will compare SAT scores at two different times: Scores before completing the Ivy League Test Prep program and scores after completing the program.

The first thing to do is to run an a priori statistical power calculation. Dr. Cagey insists that there will be a sizeable effect. He estimates Cohen’s d effect size is 0.43. This is a small effect, but it is approaching the Medium effect size of 0.50 as shown in Table 10.

Table 10: Cohen’s d Effect Size

Here are the a priori statistical power inputs for G*Power:

1. Test family: t tests.

2. Statistical test: Means: Difference between two dependent samples (matched pairs).

3. Type of power analysis: A priori: Compute required sample size….

4. Tail(s): One.

5. Effect size dz: 0.43.

6. α err prob: 0.05.

7. Power (1-β err prob): 0.8.

The a priori power analysis shows that to achieve 80 percent statistical power the sample must include 35 matched pairs. This, of course, assumes that Dr. Cagey’s estimate of effect size is not too far off the mark.

Figure 47: A Priori Statistical Power

A random sample of 35 students who completed the program is collected. Table 11 shows their SAT scores before and after completing the Ivy League Test Prep program. This is a standard before and after paired t-test. The column marked “d” is the difference in SAT scores.

Table 11: Clients’ SAT Scores Before and After Completing the Ivy League Test Preparation Program

The summary statistics are:

Table 12: Summary Statistics for Clients’ SAT Scores

Step 2. Select the Level of Significance, α

A 5 percent significance level is selected. With a paired sample t-test for this example, degrees of freedom are calculated by the number of paired observations - 1. There are 34 degrees of freedom, found by 35 matched pairs minus 1. The critical value of t is 1.691.

Figure 48: Critical Values for a One-Tail Test with 34 df at a 5% Significance

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1)

This is a right-tail test given the research question: Have clients who completed their SAT test preparation program increased their SAT scores? Here are the null and alternate hypotheses:

H0: μd ≤ 0; H1: μd > 0.

The symbol μd stands for the “mean of the differences” between the two dependent samples.

Step 4. Compose the Decision Rule

The decision rule: Reject the null hypothesis if t is greater than 1.691. Figure 49 shows a chart for the t-distribution with the rejection region.

Figure 49: t-distribution with a 5% rejection region in the right-tail

Step 5. Calculate the Value of the Test Statistic and p-value

To calculate the test statistic, there are two new formulas we need to introduce:

1. The mean of the differences between the paired samples, d̅ (d-bar). This is the standard formula for the mean.

2. The standard deviation of the differences, sd. This is the standard formula for the sample standard deviation.

1. The Mean of the Difference Between the Paired Samples

Equation 24: Equation for the Mean of the Differences, d̅

2. The Standard Deviations of the Differences

Equation 25: Equation for the Standard Deviation for the Differences, sd

We are now ready to calculate the test statistic. The formula for the test statistic and its calculation is shown in Equation 26:

Equation 26: Test Statistics for Paired t-Test
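The paired calculation can be sketched in Python as follows: take the difference for each pair, compute the mean and sample standard deviation of those differences, and divide the mean by its standard error, t = d̅ / (sd / √n). The function name `paired_t` and the four pairs of scores are hypothetical. Table 11’s data are not reproduced here.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (d_bar, s_d, t, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Difference for each matched pair: after minus before.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = mean(diffs)   # mean of the differences
    s_d = stdev(diffs)    # sample standard deviation of the differences
    t_value = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t_value, n - 1


# Hypothetical scores, for illustration only.
before = [10, 12, 14, 16]
after = [12, 15, 15, 20]
d_bar, s_d, t_value, df = paired_t(before, after)
print(d_bar, round(s_d, 4), round(t_value, 4), df)
```

Subtracting before from after means a positive t-value corresponds to an increase in scores, which matches the right-tail alternate hypothesis.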

We can estimate the p-value using the student-t critical values table. See Figure 50. The p-value would be less than 0.01 and greater than 0.005.

Figure 50: p-value for a t-value of 2.610

Excel’s Data Analysis ToolPak has a paired t-test function that will complete this analysis in a few seconds. Here are the steps to complete this analysis:

# 1: Click on the Data Analysis icon:

Figure 51: Excel’s Data Analysis tool on the “ribbon”

# 2: Select t-Test: Paired Two Sample for Mean and click OK

Figure 52: Data Analysis, t-Test: Paired Two Sample for Means

#3: The data entry window will open. Enter the ranges for the After and Before variables.

Check the labels box because cells A1 and B1 have the names of the samples. Enter the variable ranges for the two samples. This is done by dragging the cursor through the cell range. Enter the significance level in the Alpha box and select the output option.

Figure 53: Data input window for a t-Test Paired Two Sample for Means

#4: Read the results.

Figure 54: Output of the Analysis

The t-value is 2.610 with a p-value of 0.0067 or 0.67 percent.

Unfortunately, Excel’s t-Test: Paired Two Sample for Means does not calculate Cohen’s d effect size. Figure 55 shows the output of the worksheet labelled Example #5 Dependent Means. After the user enters the significance level, direction of the test, the number of paired observations, along with the sample means, standard deviations, and variances, this worksheet calculates the t-value, p-value, and Cohen’s d effect size, which at 0.44 is a small effect. See Table 13.

Figure 55: Paired t-test performed using Excel with Cohen’s d

Table 13: Cohen’s d Effect Size

Cohen’s d for a paired t-test can be calculated two ways. Both methods yield the same result:

Equation 27: Equations for Calculating Cohen’s d for a Paired t-test
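Two equivalent forms that are commonly used for a paired design are shown below. They are assumed here to be the two methods the equation refers to. The symbols follow the chapter’s usage: the mean of the differences, the standard deviation of the differences s_d, and the number of matched pairs n.

```latex
d = \frac{\bar{d}}{s_{d}} = \frac{t}{\sqrt{n}}
```

As a check against the reported results, a t-value of 2.610 with 35 matched pairs gives 2.610 / √35 ≈ 0.44, which agrees with the effect size reported by the worksheet.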

Step 6. Decide and Report

The t-value for our test is 2.610 and the p-value is 0.0067. Given that the p-value is less than the level of significance, we have sufficient evidence to reject the null hypothesis. Conclusion: SAT scores increased after students completed the Ivy League Test Preparation program. I. M. Cagey is thrilled. He is already preparing ads stating that his program helps clients get statistically significant higher SAT scores. Now you have to break the bad news to Dr. Cagey. SAT scores increased on average by only 19.57 points. While the result is statistically significant and the effect size is not negligible, the findings have little practical significance. An increased SAT score of just under 20 points will most likely not open doors to better colleges for the clients of Ivy League Test Preparation.

VI. Summary

In this chapter you learned how to distinguish between independent and dependent samples. We covered six two-sample null hypothesis tests:

1. A two-sample z-test for means.

2. A two-sample z-test for proportions.

3. An F-test for equality of variance.

4. A two-sample t-test for means with equal variances.

5. A two-sample t-test for means with unequal variances.

6. A paired sample t-test.

We performed complete analyses that included the calculation of effect size and the probability of a Type II error. We also calculated a priori statistical power. We even discussed whether the results have practical significance.

In the next chapter, Chapter 16: ANOVA Test, you will learn how to construct a basic one-way ANOVA table to compare two or more independent sample means simultaneously.

VII. Exercises

Solve the following problems by hand and use Microsoft Excel. Data can be found in Chapter15_Exercises.xlsx.

Exercise 1

Step 1: Test Set-Up

Vladimir P. Kashknave, the founder of GlobeXXX Industries, is interested in comparing the output of his two robotic factories that produce Pseudoexxxsence, the world’s first product that makes older men think they are as virile as they were when they were 19-years-old.

Factory A is using the new method, Factory B is using the old method.

Conduct an a priori statistical power calculation using Statistics Kingdom. Estimate Cohen’s d effect size as 0.45. How large a sample size is required to achieve 80 percent statistical power?
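If you would like to cross-check the calculator, the sketch below uses the common normal-approximation formula for the per-group sample size of a two-sample z-test. The function name `n_per_group_z` and its parameters are illustrative assumptions, not part of Statistics Kingdom, and the calculator may round or compute slightly differently.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_z(d, alpha=0.05, power=0.80, tails=1):
    """Approximate sample size per group for a two-sample z-test.

    d     : estimated Cohen's d effect size
    alpha : level of significance
    power : desired statistical power (1 - beta)
    tails : 1 for a one-tail test, 2 for a two-tail test
    """
    z = NormalDist()                         # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)   # critical value for alpha
    z_beta = z.inv_cdf(power)                # z-value associated with power
    # n per group = 2 * ((z_alpha + z_beta) / d)^2, rounded up
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# The research question in this exercise is directional, so tails=1 is assumed.
n_required = n_per_group_z(d=0.45, alpha=0.05, power=0.80, tails=1)
print(f"Approximate sample size per factory: {n_required}")
```

Compare the printed value with the figure reported by Statistics Kingdom; small differences are expected because the sketch relies on the normal approximation.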

The research question: Does Factory A have higher hourly production rates than Factory B? Here are the sample data:

Table 14: Daily Units Produced

The presumed population standard deviation, σ, is 15.00.

Step 2. Select the Level of Significance, α.

A 5 percent significance level is selected. What is (are) the critical value(s)?

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1).

Step 4. Compose the Decision Rule.

Step 5. Calculate the value of the Test Statistic, p-value, and effect size. Consider whether there is practical significance.

Step 6. Decide and Report.
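After solving Exercise 1 by hand and in Excel, you may want a third check. The following is a minimal Python sketch of a right-tail two-sample z-test with a known, common population standard deviation. The function name and the summary statistics passed to it are hypothetical; they are not the values from Table 14, so substitute your own sample means and sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_a, mean_b, sigma, n_a, n_b):
    """Right-tail z-test for H1: mu_A > mu_B with a common known sigma."""
    standard_error = sigma * sqrt(1 / n_a + 1 / n_b)
    z = (mean_a - mean_b) / standard_error
    p_value = 1 - NormalDist().cdf(z)        # area in the right tail
    cohens_d = (mean_a - mean_b) / sigma     # effect size
    return z, p_value, cohens_d


# Hypothetical summary statistics (NOT the data from Table 14)
z, p_value, d = two_sample_z(mean_a=105.0, mean_b=100.0,
                             sigma=15.0, n_a=36, n_b=36)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, Cohen's d = {d:.3f}")
```

Reject the null hypothesis when the p-value is less than α or, equivalently, when z exceeds the right-tail critical value.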

Exercise 2

The October 2019 Gallup Poll reported the proportion of Republicans, Independents, and Democrats who favor the legalization of marijuana. Here are the results:

Table 15: Results of the October 2019 Gallup Poll on the Legalization of Marijuana

Source: Gallup https://news.gallup.com/poll/221018/record-high-support-legalizing-marijuana.aspx

Research Question: Are Democrats more likely than Independents to favor the legalization of marijuana? This would be a right-tail test assuming we place Democrats to the left of Independents in the null and alternate hypotheses.

Step 1: Test Set-Up.

Conduct an a priori power calculation using G*Power. Let the calculator determine the size of the Cohen’s h effect.
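For reference, Cohen's h is the difference between the arcsine-transformed proportions. The sketch below computes h and an approximate per-group sample size from it. The proportions used are hypothetical placeholders rather than the Gallup figures in Table 15, the function names are illustrative, and G*Power may arrive at a somewhat different sample size because it does not necessarily use this approximation.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def n_per_group_h(h, alpha=0.05, power=0.80, tails=1):
    """Approximate per-group sample size for a two-proportion z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / abs(h)) ** 2)


# Hypothetical proportions (NOT the values from Table 15)
h = cohens_h(0.60, 0.50)
n_required = n_per_group_h(h, tails=1)   # right-tail test assumed
print(f"Cohen's h = {h:.4f}, approximate n per group = {n_required}")
```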

Step 2. Select the Level of Significance, α.

A 5 percent significance level is selected. What is (are) the critical value(s)?

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1).

Step 4. Compose the Decision Rule.

Step 5. Calculate the value of the Test Statistic, p-value, and the value of Cohen’s h. Consider whether the findings have practical significance.

Step 6. Decide and Report.
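To verify your answer to Exercise 2, a two-sample z-test for proportions can be sketched in Python as follows. It assumes the pooled-proportion standard error and a right-tail test. The counts are hypothetical and are not taken from Table 15; replace them with the number of respondents in favor and the sample size for each group.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Right-tail z-test for H1: p1 > p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 1 - NormalDist().cdf(z)    # area in the right tail
    return z, p_value


# Hypothetical counts (NOT the Gallup data): 120 of 200 versus 100 of 200
z, p_value = two_proportion_z(x1=120, n1=200, x2=100, n2=200)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```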

Exercise 3

Step 1: Test Set-Up.

Sue, Grabitt, and Runne is a large international law firm that hires a lot of newly graduated attorneys. To help their new hires settle into their jobs and new city, the firm’s human resources department tracks the commuting times from two popular neighborhoods where their new hires tend to reside, NoBo and SoBo. The research question: Is there a difference in the average commute times for the two neighborhoods?

Using either G*Power or Statistics Kingdom calculate a priori statistical power to determine the necessary sample size needed to achieve 80 percent statistical power. Estimate Cohen’s d effect size at 0.50, which is a medium effect.

Table 16: Commute Times from NoBo and SoBo

Step 2. Select the Level of Significance, α.

A 5 percent significance level is selected. What is (are) the critical value(s)?

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1).

Step 4. Compose the Decision Rule.

Step 5. Calculate the value of the Test Statistic, p-value, and effect size. Consider whether there is practical significance.

Step 6. Decide and Report.
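As a check on Exercise 3, the sketch below computes the F-ratio for equality of variance, the pooled-variance t statistic, its degrees of freedom, and Cohen's d. It assumes the larger sample variance is placed in the numerator of the F-ratio. The commute times are hypothetical, not the data in Table 16, and the critical values and p-value should still be taken from the tables or Excel because the sketch uses only Python's standard library.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(sample_1, sample_2):
    """F-ratio, pooled-variance t statistic, df, and Cohen's d."""
    n1, n2 = len(sample_1), len(sample_2)
    var_1, var_2 = variance(sample_1), variance(sample_2)   # sample variances
    f_ratio = max(var_1, var_2) / min(var_1, var_2)         # larger over smaller
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var_1 + (n2 - 1) * var_2) / df
    mean_diff = mean(sample_1) - mean(sample_2)
    t = mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
    cohens_d = mean_diff / sqrt(pooled_var)
    return {"F": f_ratio, "t": t, "df": df, "d": cohens_d}


# Hypothetical commute times in minutes (NOT the data from Table 16)
nobo = [30, 34, 28, 36, 32]
sobo = [26, 30, 24, 32, 28]
result = pooled_t_test(nobo, sobo)
print(result)
```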

Exercise 4

Step 1: Test Set-Up.

The next test is a t-test for unequal variance. Here is the problem: The human resources department at Sue, Grabitt, and Runne also tracks the cost of one-bedroom apartments in two popular neighborhoods, NoBo and SoBo. The general perception of long-time residents is that rents are generally lower in SoBo. They hope to determine whether the average rent for a one-bedroom apartment is lower in SoBo than in NoBo.

The results of their survey are shown in Tables 17 and 18:

Table 17: SoBo

Table 18: NoBo

Step 2. Select the Level of Significance, α.

A 5 percent significance level is selected. What is (are) the critical value(s)?

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1).

Step 4. Compose the Decision Rule.

Step 5. Calculate the value of the Test Statistic, p-value, and estimate statistical power. Consider whether there is practical significance.

Step 6. Decide and Report.
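For Exercise 4, the unequal-variance t-test does not pool the variances, and its degrees of freedom are commonly found with the Welch-Satterthwaite approximation. The sketch below shows both calculations. The rents are hypothetical and are not the survey results in Tables 17 and 18. Note that some textbooks and Excel round the degrees of freedom to a whole number, which is an assumption to check against your own work.

```python
from math import sqrt
from statistics import mean, variance


def unequal_variance_t(sample_1, sample_2):
    """t statistic and Welch-Satterthwaite df for unequal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = variance(sample_1) / n1     # squared standard error, sample 1
    v2 = variance(sample_2) / n2     # squared standard error, sample 2
    t = (mean(sample_1) - mean(sample_2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical monthly rents in dollars (NOT the data from Tables 17 and 18)
sobo = [1400, 1500, 1600, 1500]
nobo = [1500, 1900, 1700, 2100, 1800]
t, df = unequal_variance_t(sobo, nobo)   # SoBo first: a left-tail test
print(f"t = {t:.3f}, df = {df:.2f}")
```

With SoBo listed first, a negative t-value supports the alternate hypothesis that rents are lower in SoBo.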

Exercise 5

Step 1: Test Set-Up.

Bayside University of North Kentucky is experimenting with a remedial program for students on academic probation. The university’s administrators randomly selected 49 freshmen who are on probation and have a Grade Point Average (GPA) ranging from 1.00, a D- average, to just under 2.00, or just below a C average. The students were required to attend weekly tutoring sessions and to take special remedial courses the following semester. The university compared the students’ GPAs for the first semester after they completed this program to their pre-program GPAs.

Use G*Power to conduct an a priori statistical power analysis. Estimate the effect size at 0.35. Will the sample of 49 students provide sufficient statistical power?
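A rough check on the G*Power result can be made with the normal approximation shown below. This is only a sketch: G*Power bases its answer on the noncentral t-distribution, so its reported power will differ slightly. The function name is illustrative, and a one-tail test is assumed because the program is expected to raise GPAs.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_paired(d, n, alpha=0.05, tails=1):
    """Normal-approximation power for a paired-sample test.

    d : estimated effect size for the paired differences
    n : number of pairs
    For tails=2, the negligible rejection area in the far tail is ignored.
    """
    z = NormalDist()
    z_critical = z.inv_cdf(1 - alpha / tails)
    return z.cdf(d * sqrt(n) - z_critical)


# Effect size and sample size as stated in the exercise; one-tail is assumed.
power = approx_power_paired(d=0.35, n=49, alpha=0.05, tails=1)
print(f"Approximate power: {power:.3f}")
```

Compare the result with the conventional 80 percent target when you answer the question about the 49 students.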

Table 19 shows the data from this test along with the summary statistics:

Table 19: Remedial Education Program Test Results

Step 2. Select the Level of Significance, α.

A 5 percent significance level is selected. What is (are) the critical value(s)?

Step 3. State the Null Hypothesis (H0) and Alternate Hypothesis (H1).

Step 4. Compose the Decision Rule.

Step 5. Calculate the value of the Test Statistic, p-value, and estimate statistical power. Consider whether the results have practical significance.

Step 6. Decide and Report. Consider statistical power in your report.
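To confirm your Exercise 5 calculations, a paired-sample t-test can be sketched as follows. It works from the difference between each student's post-program and pre-program GPA. The five pairs of GPAs are hypothetical and are not the data in Table 19; the p-value and critical value should still come from the Student-t table or Excel.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_test(before, after):
    """t statistic, df, and Cohen's d for paired (dependent) samples."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_diff = mean(differences)
    sd_diff = stdev(differences)            # sample standard deviation
    t = mean_diff / (sd_diff / sqrt(n))
    cohens_d = mean_diff / sd_diff          # effect size for the differences
    return t, n - 1, cohens_d


# Hypothetical GPAs (NOT the data from Table 19)
before = [1.2, 1.5, 1.8, 1.4, 1.6]
after = [1.5, 1.6, 2.1, 1.8, 1.7]
t, df, d = paired_t_test(before, after)
print(f"t = {t:.3f}, df = {df}, Cohen's d = {d:.3f}")
```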

Except where otherwise noted, Clear-Sighted Statistics is licensed under a
Creative Commons License. You are free to share derivatives of this work for
non-commercial purposes only. Please attribute this work to Edward Volchok.

Endnotes

1. Paul D. Ellis, The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results. (New York: Cambridge University Press, 2010), p. 77.

2. Paul D. Ellis, The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results. (New York: Cambridge University Press, 2010), p. 54.

3. Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, Second Edition. (New York: Psychology Press, 1988), pp. 14, 40.

4. Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, Second Edition. (New York: Psychology Press, 1988), pp. 181-182.
