
Clear-Sighted Statistics

Chapter 14: One-Sample Null Hypothesis Significance Tests

Figure 1: The Six-Step NHST Cycle


I. Introduction

In this chapter, we will use the NHST framework laid out in Chapter 13, Introduction to Null Hypothesis Significance Testing, to conduct the three basic one-sample hypothesis tests: 1) z-test for the mean, μ; 2) z-test for the proportion, π; and 3) t-test for the mean, μ.

After completing this chapter, you will be able to:

• Determine which of the three basic one-sample tests is appropriate.

• Calculate Cohen’s d Effect Size for tests for the mean and Cohen’s h Effect Size for tests for the proportion.

• Determine the necessary sample size by conducting an a priori power analysis using Statistics Kingdom’s online power calculator and G*Power.

• Conduct a one-sample z-test for the mean, μ.

• Conduct a one-sample z-test for the proportion, π.

• Conduct a one-sample t-test for the mean, μ.

Figure 2 shows the three one-sample hypothesis tests covered in this chapter. To conduct a test of the mean, μ, we use the normal distribution and z-values when the population standard deviation, σ, is known and the sample size, n, is equal to or greater than 30. When n is less than 30 or the population standard deviation is unknown, we use the Student-t distribution. When testing the population proportion, π, we use z-values.

Figure 2: The Three One-Sample Hypothesis Tests Using the z-Distribution and t-Distribution

The following files accompany this chapter. You should download these files:

• The Critical Values Tables for normal distributions using z-values:

z-Values_AreaBetweenMean&X.xlsx
z-Values_CriticalValues_z_p-Values.xlsx
z-Values_AreaBetweenMean&X.pdf
z-Values_CriticalValues_z_p-Values.pdf

• The Critical Values Tables for Student-t distributions:

Student-t_CriticalValues.xlsx
Student-t_CriticalValues.pdf

• finding-critical-values-and-p-values.xlsx

• Chapter14_Examples.xlsx (This workbook contains the data and shows the calculations for the examples shown in this chapter)

• Chapter14_Exercises.xlsx (This workbook contains the data for the end-of-chapter exercises.)

Using these files will help you work through the examples presented in this chapter and solve the end-of-chapter exercises.

II. One-Sample Tests for the Population Mean Using z-values

For our first example, we will conduct a one-sample null hypothesis test for the mean using z-values. Here is our problem: Based on past studies, the average man in the United States is assumed to be 69 inches tall (5’9”); μ, therefore, equals 69 inches. The population standard deviation, σ, is presumed to be 3 inches. Executives at Dubiety Insurance think these parameters underestimate the height of American men. Researchers have been tasked with conducting a study to answer the following question: Is the average American man taller than 69 inches (5’9”)?

Based on the research question, this is a right-tail test because the word “taller” focuses our attention on the right tail. Because the population standard deviation is known, we will conduct a z-test for the mean assuming the sample size is 30 or more. If the sample size is less than 30, a t-test for the mean would be conducted. We can rule out a z-test for the proportion because the research problem does not mention a proportion.

Step 1. Test Set-up

One of the first things the researchers need to do is to conduct an a priori statistical power calculation to determine the appropriate sample size. G*Power lacks a calculation for a one-sample z-test for the mean, so we will use the sample size calculator found on the Statistics Kingdom website.

A priori calculations are based on the estimated effect size along with other inputs, including the significance level, the tolerance of a Type II error, and whether the test is left-tail, two-tail, or right-tail. The estimated effect size measures the strength of the effect. If statistical significance is achieved but the effect size is negligible, the test is likely over-powered.

The effect size used by the Statistics Kingdom calculator is Cohen’s d. Table 1 shows the Cohen’s d effect size thresholds and how Cohen’s d is interpreted. Anything less than 0.2 is considered a negligible effect. Effect sizes between 0.2 and less than 0.5 are small effects. Effect sizes between 0.5 and less than 0.8 are medium effects. Effect sizes of 0.8 and higher are considered large effects.

Table 1: Interpreting Cohen’s d Effect Size
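The thresholds in Table 1 can be written as a small lookup function. The Python sketch below is illustrative only; the function name `interpret_cohens_d` is hypothetical and is not part of Statistics Kingdom, G*Power, or the chapter's workbooks.

```python
def interpret_cohens_d(d: float) -> str:
    """Label a Cohen's d value using the Table 1 thresholds."""
    size = abs(d)  # the sign (direction) of the effect does not change the label
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# The estimate used for the a priori power analysis and the value
# reported later in this example.
for d in (0.30, 0.3357):
    print(d, interpret_cohens_d(d))
```

Taking the absolute value first means a left-tail effect of, say, -0.35 receives the same "small" label as +0.35.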

Based on reading the literature on studies of men’s heights, the researchers at Dubiety Insurance think that the effect size will be a small effect around 0.30. Figure 3 shows the inputs for the Statistics Kingdom a priori power calculation:

Figure 3: Statistics Kingdom Inputs

Here are the inputs:

• Tails: Right (Based on the research question: Is the average American man taller than 69 inches).

• Digits: 4 (The calculator often seems to ignore this instruction.)

• Distribution: Normal. There are two options: Normal and T. This will be a normal distribution because the population standard deviation is presumed to be known and the sample size is 30 or more.

• Sample: One sample. There are two options: “One sample” and “Two samples.” One sample is the appropriate selection because there is only one sample.

• Significance level (α): Enter 0.05. The significance level is the researchers’ tolerance of a Type I error. The p-value shows the calculated probability of a Type I error.

• Power: 0.8. This is the desired level of statistical power. This means that the tolerance of a Type II error is 0.2 or 20 percent.

• Effect: Small. There are three options: Small, Medium, and Large.

• Effect type: Standardized effect size. There are two options: Standardized effect size and Unstandardized effect size. Unstandardized effect size is merely sampling error, the difference between the sample statistic and the population parameter.

• Effect Size: 0.30. This is the estimated Cohen’s d effect size. The estimated effect size is based on the researchers’ review of similar studies and judgment.

Once the inputs have been entered, click the Calculate button, and the calculator will display the sample size required to achieve 80 percent statistical power. The answer is that the sample should have 69 respondents.

Figure 4: A Priori Statistical Power – 80% Power Requires 69 respondents
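The calculator's answer of 69 can be cross-checked by hand. The sketch below assumes the common normal-approximation formula for a one-sample z-test: n equals the sum of the critical z-value for α and the z-value for the desired power, divided by d, squared, and then rounded up. Statistics Kingdom's internal method is not documented in this chapter, so treat this as a cross-check rather than a replacement. The function name `sample_size_one_sample_z` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_one_sample_z(effect_size: float, alpha: float = 0.05,
                             power: float = 0.80, tails: int = 1) -> int:
    """Approximate n needed for a one-sample z-test to reach the given power.

    Uses n = ((z_crit + z_power) / d) ** 2, where z_crit uses alpha for a
    one-tail test and alpha / 2 for a two-tail test. For two-tail tests the
    tiny chance of rejecting in the "wrong" tail is ignored.
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    z_crit = z.inv_cdf(1 - alpha / tails)
    z_power = z.inv_cdf(power)
    n = ((z_crit + z_power) / abs(effect_size)) ** 2
    return math.ceil(n)  # round up: we cannot survey a fraction of a person


# Right-tail test, alpha = 0.05, power = 0.80, Cohen's d = 0.30
print(sample_size_one_sample_z(0.30))
```

The unrounded result is about 68.7, which rounds up to 69, the same sample size the calculator reports.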

To address the research question, a random sample of 69 American men is collected. As part of this survey, respondents’ heights were measured using a stadiometer. Table 2 shows the results:

Table 2: Survey of the Height of 69 Men Rounded to the Nearest Quarter of an Inch

These are the summary statistics and presumed parameters:

• Sample Mean, X-Bar, = 70.01 inches.

• n = 69 men.

• Presumed Population Mean, μ, = 69.00 inches.

• Presumed population standard deviation, σ, = 3 inches.

Step 2. Select the Level of Significance, α

As previously stated, a 0.05 or 5 percent level of significance is selected.

The critical value for z can be determined using the Area Under the Curve table or Microsoft Excel. Using the Area Under the Curve table, the critical value for this right-tail test with a 5 percent significance level is usually set at 1.65, which places roughly 95 percent of the curve below the critical value. See Table 3. Please note: A few analysts will use 1.64 as the critical value for z for a right-tail test. Selecting 1.65 is a more cautious choice because it makes rejecting the null hypothesis slightly more difficult.

Table 3: Critical Value for z Using the Area Under the Curve Table

We can also determine the critical value of z using Excel. The Excel workbook finding-critical-values-and-p-values.xlsx has a calculator for left-tail, two-tail, and right-tail tests on the Critical Value Calculator worksheet for any significance level. See Table 4 for these calculations along with the Excel formulas for left-tail, two-tail, and right-tail tests.

Table 4: Critical Value for z Using Excel

Please note: The critical value found using Excel, 1.645, is more precise than using either 1.65 or 1.64. We will, therefore, use 1.645.
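Python's standard library can reproduce the Excel critical values in Table 4. In this sketch, `NormalDist().inv_cdf` plays the role of Excel's NORM.S.INV function; the helper name `z_critical_values` is hypothetical.

```python
from statistics import NormalDist


def z_critical_values(alpha: float = 0.05) -> dict:
    """Critical z-values for left-tail, two-tail, and right-tail tests."""
    z = NormalDist()
    return {
        "left": z.inv_cdf(alpha),                                   # all of alpha in the left tail
        "two": (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2)),    # alpha/2 in each tail
        "right": z.inv_cdf(1 - alpha),                              # all of alpha in the right tail
    }


cv = z_critical_values(0.05)
print(round(cv["right"], 3))
print([round(v, 3) for v in cv["two"]])
```

The right-tail value, 1.6448536..., rounds to the 1.645 used throughout this example; the two-tail values are about ±1.96, illustrating why two-tail critical values are more extreme.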

Step 3. State the null hypothesis (H0) and alternate hypothesis (H1)

As discussed in Chapter 13, the null hypothesis means no difference, no effect, or no statistical significance: any difference between the sample statistic and presumed population parameter can be attributed to sampling error. The null hypothesis always refers to the population parameter, never the sample statistic; therefore, it must use the appropriate symbol for the population parameter. In this case, that is the population mean, μ, not the sample mean, X-Bar. In addition, the null hypothesis always contains one of three signs that include equality: ≥, =, or ≤. In our example, the null hypothesis states that the difference between the presumed population mean of 69 inches and the sample mean, X-Bar, of 70.01 inches is not statistically significant. The alternate hypothesis is the opposite of the null hypothesis. It is, in essence, the research question. Its mathematical symbol is the opposite of the one in the null hypothesis; that is, <, ≠, or >. The alternate hypothesis means that any difference between the sample statistic and population parameter is too large to be explained by random sampling error.

The mathematical symbol in the alternate hypothesis points to one of the tails or both tails that make up the rejection region or regions, or the area of the curve that will require us to reject the null hypothesis if the test statistic is in that region.

To repeat, the difference between the parameter and statistic—sampling error—is sometimes called the unadjusted or unstandardized effect size.

As discussed in Chapter 13, tests using z-values and t-values are directional. The rejection region could be in the left tail, both tails, or the right tail. With left-tail tests, the entire significance level is placed in the left tail. The same holds for right-tail tests. With two-tail tests, the significance level, α, is divided into two equal parts, α/2, with half the significance level placed in each tail. The critical value for a left-tail test is always a negative value of z or t. The critical value for a right-tail test is always positive. Two-tail tests have two critical values, one negative in the left tail and one positive in the right tail. The critical values for two-tail tests are always more extreme than those for the corresponding one-tail tests. It is, therefore, a little harder to reject the null hypothesis when using a two-tail test. This also increases the probability of committing a Type II error; that is, failing to find a statistically significant effect when there is one. In other words, two-tail tests have less statistical power than one-tail tests.

To determine whether a test is left-tail, two-tail, or right-tail, we examine the research question. Figure 5 shows the difference in the descriptive phrases and adjectives used to distinguish among left-tail, two-tail, and right-tail tests. It also shows the difference in the mathematical symbols used to distinguish the direction of the test, and z or t distribution curves with the shaded rejection regions for these tests. For left-tail tests, the research question will have phrases or adjectives like “less than,” “faster,” “shorter,” “smaller,” “decreased,” and “below.” It should be pointed out that “faster” is a left-tail test when the measurement is time because something that is faster occurs in less time. For two-tail tests, the research question will have words like “not equal,” “different,” “not the same,” or “has changed.” Right-tail tests will have “greater than,” “slower,” “longer,” “bigger,” “increased,” and “above.” “Faster” could be a right-tail test if the measurement is speed, such as miles or kilometers per hour.

Figure 5: The Wording for Left-Tail, Two-Tail, and Right-Tail Tests


The research question for our test is: Are American men taller on average than the average height of 69 inches? The word “taller” indicates that our test is a right-tail test, and the alternate hypothesis must have a greater than sign, >, and the critical value will be a positive number. Because we are conducting a right-tail test, the null hypothesis takes a less than or equal sign, ≤.

Here are the null and alternate hypotheses for our tests:

H0: μ ≤ 69 inches; H1: μ > 69 inches

Step 4. Compose the Decision Rule

Because we are conducting a right-tail test at a 5 percent significance level, the rejection region will cover the upper 5 percent of the normal curve. The rejection region starts at a z-value of 1.645, as shown in Figure 6.

Figure 6: Normal Curve with a 5% Rejection Region in the Right-Tail


Using Microsoft Excel, the critical value for z is 1.644853626951, which we round off to 1.645. To repeat: A few researchers use 1.64 as the critical value for a one-tail test at a 0.05 significance level, which makes it slightly easier to reject the null hypothesis. When the Area Under the Curve table is used, a z-value of 1.65 is more commonly used. Statistical software uses 1.645. Choosing either 1.65 or 1.645 makes the decision to reject the null hypothesis a bit more cautious than using 1.64, but it also slightly reduces statistical power, the probability of finding a statistically significant effect when one exists.

The decision rule: Reject the null hypothesis if z is >1.645.

We could also write the decision rule in terms of the p-value: Reject the null hypothesis if the p-value is ≤ 0.05.

Step 5. Calculate the Value of the Test Statistic, p-Value, Effect Size, and Post Hoc Statistical Power

Equation 1 shows the test statistic for a one-sample z-test for the mean. We have seen this test statistic in Chapter 10. It is a complex fraction with sampling error in the numerator and the standard error for the mean, SEM or σX-Bar, in the denominator. Here is the formula:

Equation 1: Test Statistic for a One-Sample z-test for the Mean

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Equation 2 shows the calculation for this problem:

Equation 2: z = 2.789

We can calculate the test statistic, effect size, and p-value very quickly using Microsoft Excel. The worksheet labeled Example 1 z-test Mean in the file named Chapter14_Examples.xlsx shows these calculations along with the calculations for effect size and post hoc statistical power. Table 5 shows these calculations along with the Excel formulas:

Table 5: Excel Calculations

Table 5 includes a lot of critically important calculations and information:

• The sample mean, 70.01, found using the AVERAGE function.

• The population mean, μ, which was stated with the problem.

• The population standard deviation, σ, 3.00, which was stated with the problem.

• SEM or standard error of the mean: 0.361.

• The z-value or test statistic: 2.789.

• The p-value: 0.0026 or 0.26 percent.

• Cohen’s d effect size: 0.3357.

• The interpretation of this effect size: 0.3357 is a small effect.
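For readers working outside Excel, here is a minimal Python sketch of the Table 5 calculations using only the standard library. The function name `one_sample_z_test_mean` is hypothetical. Because the sketch starts from the rounded sample mean of 70.01, it gives z of about 2.80 and d of about 0.337 rather than the 2.789 and 0.3357 in Table 5, which Excel computes from the unrounded mean.

```python
import math
from statistics import NormalDist


def one_sample_z_test_mean(x_bar, mu, sigma, n, tail="right"):
    """One-sample z-test for the mean from summary statistics.

    Returns the standard error of the mean, the z-value, the p-value,
    and Cohen's d.
    """
    sem = sigma / math.sqrt(n)          # standard error of the mean
    z = (x_bar - mu) / sem              # sampling error divided by the SEM
    cdf = NormalDist().cdf(z)
    if tail == "right":
        p = 1 - cdf                     # area to the right of z
    elif tail == "left":
        p = cdf                         # area to the left of z
    else:                               # two-tail: double the smaller tail
        p = 2 * min(cdf, 1 - cdf)
    d = abs(x_bar - mu) / sigma         # Cohen's d
    return sem, z, p, d


sem, z, p, d = one_sample_z_test_mean(70.01, 69, 3, 69)
print(f"SEM={sem:.3f}  z={z:.3f}  p={p:.4f}  d={d:.4f}")
```

Either way, z is well above the 1.645 critical value, so the decision to reject the null hypothesis is unchanged.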

Notes on p-value

The level of significance is our tolerance of a Type I error, which occurs when we reject the null hypothesis when the difference between the sample statistic and population parameter is merely the result of sampling error. The p-value is the probability of getting a test statistic as extreme or more extreme than the one we found. It is also the calculated probability of a Type I error. We reject the null hypothesis whenever the p-value is equal to or less than the significance level.

We can calculate the p-value for a normal distribution in two ways:

1. Using Microsoft Excel.

2. Using the Critical Values Tables for z.

We can find the p-value for z-tests using the NORMSDIST function:

Equation 3: Excel Function for Finding the p-value for a z-Test

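One-Tail Test: =0.5-(NORMSDIST(z-value)-0.5)

Two-Tail Test: =2*(0.5-(NORMSDIST(z-value)-0.5))

These formulas assume a positive z-value, as in this right-tail example; for a negative z-value, replace z-value with ABS(z-value).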
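The logic of Equation 3, finding the area in the tail beyond z and doubling it for a two-tail test, can be sketched in Python as follows. The function name `z_p_value` is hypothetical, and the sketch assumes z lies in the tail that the alternate hypothesis points to.

```python
from statistics import NormalDist


def z_p_value(z: float, tails: int = 1) -> float:
    """p-value for a z test statistic.

    One-tail: the area beyond |z| in a single tail.
    Two-tail: twice that area.
    """
    one_tail = 1 - NormalDist().cdf(abs(z))  # same as 0.5 - (NORMSDIST(|z|) - 0.5)
    return tails * one_tail


print(round(z_p_value(2.789), 4))           # one-tail p-value
print(round(z_p_value(2.789, tails=2), 4))  # two-tail p-value
```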

Table 6 shows how the p-value is calculated using the Area Under the Curve table and a variant of this table, which shows the area in the tail. The p-value when z equals 2.789 is approximately 0.0026 or 0.26 percent.

Table 6: Two Abbreviated Critical Values Tables for z

The first table shows the area between the mean and z = 2.79, which is 0.4974, or 49.74 percent of the curve above the mean. The p-value is 0.0026, or 0.26 percent, found by 0.5000 − 0.4974. The second table shows the p-value for a one-tail test directly: the area that includes the z-value and the values that are more extreme. This table also shows that the p-value is 0.0026. Remember: For a two-tail test, we double the p-value found using this method.

Notes on Cohen’s d Effect Size

Effect size does not measure statistical significance. Statistical significance indicates the probability that the test statistic is the result of random sampling error. Effect size, on the other hand, measures the magnitude or strength of the effect under consideration.

Unstandardized effect size, as previously mentioned, is the difference between the sample statistic and the population parameter; or, quite simply, sampling error. Effect size is standardized by dividing the absolute value of the sampling error by the population standard deviation, or by the sample standard deviation when the population standard deviation is not available. We use a standardized measure of effect size so that we can compare the size of effects for data measured on different scales. Cohen’s d is the most common standardized measure of effect size for a one-sample z-test or t-test. Equation 4 shows the formula for Cohen’s d and the calculation for this example:

Equation 4: Formula for Cohen’s d

$$d = \frac{\lvert \bar{X} - \mu \rvert}{\sigma}$$

Table 7 shows how we interpret effect size. The effect size for a 1.01-inch difference in height is 0.3357, which is a small effect. This result was calculated using Excel, so it is more precise than using a handheld calculator because it does not use rounded numbers. As previously stated, in the social sciences and business most effect sizes tend to be small. The calculated effect size is slightly larger than the 0.30 effect size used to calculate a priori statistical power. As a result, this test will have slightly more statistical power than the 80.154 percent shown in the a priori power calculation.

Table 7: Interpreting Cohen’s d Effect Size

Notes on Statistical Power

In Chapter 13, we saw that Type II errors and statistical power are closely related. In fact, the probability of a Type II error is the complement of statistical power. The relationship between statistical power and the probability of a Type II Error is found using the following formula:

Equation 5: The Relationship of Statistical Power to Type II Errors

Statistical Power = 1 – P(Type II Error)
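Equation 5 can be made concrete for the right-tail z-test in this example. The sketch below assumes the standard normal-approximation power formula for a one-sample z-test: under the alternate hypothesis the test statistic is centered at d times the square root of n, so power is the area of that shifted curve beyond the critical value. The function name `power_right_tail_z` is hypothetical.

```python
import math
from statistics import NormalDist


def power_right_tail_z(d: float, n: int, alpha: float = 0.05) -> float:
    """Power of a right-tail one-sample z-test when the true effect is d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)
    # Under H1 the z statistic is centered at d * sqrt(n) instead of 0.
    return 1 - z.cdf(z_crit - d * math.sqrt(n))


power = power_right_tail_z(0.30, 69)
beta = 1 - power  # probability of a Type II error, per Equation 5
print(f"power={power:.4f}  beta={beta:.4f}")
```

With d = 0.30 and n = 69, this reproduces the 80.154 percent a priori power discussed above; substituting the observed d of 0.3357 gives higher power, consistent with the text.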

Jacob Cohen, the psychologist and statistician for whom the Cohen’s d effect size is named, wrote the following statement in his groundbreaking book, Statistical Power Analysis for the Behavioral Sciences:

The power of a statistical test is the probability that it will yield statistically significant results. Since statistical significance is so earnestly sought and devoutly wished for by behavioral scientists, one would think that the a priori probability of its accomplishment would be routinely determined as well as understood. Quite surprisingly, this is not the case.1

Professor Cohen and his colleagues also point out that low statistical power may cause an investigator to prematurely abandon a promising inquiry.2

Many peer-reviewed journals immediately reject articles that do not report effect size and statistical power, without sending them to peer reviewers. The American Psychological Association’s style manual directs authors to report effect size, statistical power, or both. Unfortunately, most introductory statistics textbooks rarely discuss statistical power and related concepts in any detail, even though these concepts are not especially difficult.

Step 6. Decide and Report

With a test statistic as extreme as 2.789 and a p-value as low as 0.26 percent, the null hypothesis should be rejected. Figure 7 shows a chart with the p-value in the barely visible black zone along with the rejection region in red.

Figure 7: A z-value of 2.789 is in the rejection region at a 5% significance level; the p-value is 0.26%.

Based on our high z-value of 2.789 and the small p-value of 0.26 percent, the null hypothesis is rejected. Remember: A p-value is a measure of how surprised we are to obtain a test statistic as extreme or more extreme than the one we found. When the p-value is greater than the level of significance, we fail to reject the null hypothesis. But when the p-value is less than or equal to the significance level, we reject the null hypothesis. In addition, a low p-value does not mean the alternate hypothesis is true or that the statistically significant difference between the statistic and parameter has any practical significance. In our example, there is only a 0.26 percent probability that the difference between 70.01 inches and 69 inches is due to sampling error. Conclusion: We have statistical significance. But do we have practical significance? It is questionable whether the difference between 69 inches and 70.01 inches has much practical significance for the men in the survey. It may have practical significance for the sponsor of this study. But we can conclude that the size of the effect found, 0.3357, is not the result of a sampling fluke.

III. One-Sample Tests for the Proportion Using z-values

The Bureau of Labor Statistics’ surveys show that union membership has fallen since 1950. In 2018, in their Janus v. AFSCME decision, the Supreme Court struck down the right of public sector unions to charge non-union members an “agency fee” to cover the costs of negotiating and administering labor contracts. Given these facts, many people believe labor unions are a relic of the twentieth century and will not survive much longer.

Step 1. Test Set-up

Imagine that you work as a researcher for a pro-union group. Your organization wants to determine whether the proportion of the American public that approves of labor unions has increased. The presumed population proportion, π, is 61 percent, based on the previous Gallup poll conducted in 2017.3 On August 28, 2019, Gallup published the results of its poll of 1,522 Americans, of whom 974 said they approved of labor unions.4 The research question: Has the proportion of Americans who approve of labor unions increased?

The sample proportion is calculated using the formula shown in Equation 6:

Equation 6: Formula to find the sample proportion

$$p = \frac{X}{n} = \frac{974}{1{,}522} \approx 0.640$$

Where: p is the sample proportion

X is the number of people who gave a particular answer

n is the number of people who responded

Here is a summary of the data:

• Sample proportion: p = 0.64 or 64 percent.

• n = 1,522.

• Presumed population proportion: π = 0.61 or 61 percent, based on the previous poll.

A Priori Statistical Power:

We have been arguing that an a priori statistical power calculation should be performed before data are collected. With this example, however, we already have good data from a highly respected public opinion pollster before calculating a priori statistical power. So, naturally, the question arises: Do we need to calculate a priori statistical power? The answer is yes, because we need to know whether, given the effect size, the sample provides sufficient statistical power.

Unfortunately, G*Power lacks a tool to calculate a priori statistical power for a one-sample test of proportion. Fortunately, Statistics Kingdom has an a priori statistical power calculator for this test. The tool will even calculate the Cohen’s h effect size used in this calculation. Here is the link to this tool: https://www.statskingdom.com/proportion-sample-size-calculator.html.

Here are the seven inputs we must enter:

1. Tails: There are three options: Left-Tail, Two-Tails, or Right-Tail. Based on the research question, select Right-Tail.

2. Sample: There are two options: “One sample” and “Two samples.” Select “One sample.”

3. Significance level (α): Enter the selected significance level, 0.05.

4. Statistical Power: Enter 0.8.

5. Effect: There are three options: a. Small, b. Medium, and c. Large. Here is how the Cohen’s h effect size is interpreted.

Table 8: Cohen’s h Interpretation

6. h effect size: Let this power calculator calculate the effect size and enter the appropriate effect size. Click on the calculate h button to calculate Cohen’s h.

Figure 8: Cohen’s h Calculator

Under P1, enter the presumed population proportion, 0.61. Under P2, enter the sample proportion, 0.64. Click on Calculate h. The calculator returns 0.06197963871153367, which is a tiny, if not negligible, effect. Under the Effect input, select the “Small” option. Cohen’s h is often smaller than 0.20. This does not necessarily mean that the effect is negligible, or that the test is over-powered if statistical significance is achieved.

7. Rounding: Set the option to 4.

Here are the inputs:

Figure 9: Statistics Kingdom Inputs

To achieve 80 percent power, a sample size of 1,610 is required.

Figure 10: A Priori Statistical Power
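The Cohen’s h value and the 1,610-respondent requirement can be cross-checked with a short sketch. It assumes the standard arcsine definition of Cohen’s h and a normal-approximation sample-size formula, n equals the sum of the critical z-value and the z-value for the desired power, divided by h, squared, rounded up. Both reproduce the calculator’s results here, but Statistics Kingdom’s internal method is not documented in this chapter. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))


def sample_size_one_proportion(h: float, alpha: float = 0.05,
                               power: float = 0.80, tails: int = 1) -> int:
    """Normal-approximation sample size for a one-sample proportion test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / tails)
    return math.ceil(((z_crit + z.inv_cdf(power)) / abs(h)) ** 2)


h = cohens_h(0.61, 0.64)  # P1 = presumed population proportion, P2 = sample proportion
print(round(h, 4))
print(sample_size_one_proportion(h))
```

Because h is so small, the required sample size is large; the 1,522 respondents in the Gallup poll fall just short of it.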

Given the fact that the sample size is only 1,522, achieving 80 percent statistical power is unlikely and we should be concerned about whether this test will have sufficient power.

Step 2. Select the Level of Significance, α

A decision has been made to use a 5 percent significance level. Based on the significance level and the fact that this is a right-tail z-test, the critical value of z is 1.645. See Table 9.

Table 9: Critical Value for z Using Excel

Step 3. State the null hypothesis (H0) and alternate hypothesis (H1)

The research question for our test is: Has the proportion of Americans who approve of labor unions increased? The word “increased” indicates that our test is a right-tailed test, and the alternate hypothesis must have a greater than sign, >. The null hypothesis, therefore, must have a less than or equal sign, ≤.

Please note: The null and alternate hypotheses always refer to the population proportion, 61 percent in this case, not the sample proportion, 64 percent. The null and alternate hypotheses, therefore, must have the population proportion symbol, π, not the sample proportion symbol, p.

Here are the null and alternate hypotheses:

H0: π ≤ 61%; H1: π > 61%

Step 4. Compose the Decision Rule

Because this is a right-tail test, the 5 percent level of significance goes on the right-tail. Figure 11 shows the normal curve for a significance level of 0.05 and with 5 percent rejection region, shown in red, in the right tail:

Figure 11: Critical value for a right-tail test at a 5% significance level, 1.645

The decision rule: Reject the null hypothesis if z is greater than 1.645.

Step 5. Calculate the Value of the Test Statistic, p-value, and Effect Size

Our next step is to calculate the test statistic, p-value, and effect size. The test statistic for the proportion follows the same structure as the test statistic for the mean: it is a complex fraction with sampling error in the numerator and the standard error of the proportion, SEP or σp, in the denominator. Here is the formula along with the calculations for our problem:

Equation 7: Test statistic for a proportion

Where: p = the sample proportion

X = number of people who gave a particular answer

n = number of people who responded

π = population proportion

This calculation can be performed using Excel. See the Excel worksheet “Example 2 z-test for π” in the workbook Chapter14_Examples.xlsx. Table 10 shows the input and the calculations.

The following inputs are required:

1. B3, the significance level, α, 0.05.

2. B4, Test Direction, enter Right-Tail.

3. B5, Number of Successes, enter 974, which is given in the problem.

4. B6, Sample Size, n, enter 1,522, which is given in the problem.

5. B13, the population proportion, π. Enter 0.6100, which is given in the problem.

This worksheet will calculate:

1. The null and alternate hypotheses in Cells B8:B9.
H0: π ≤ 0.6100
H1: π > 0.6100

2. The critical value for z in Cell B10, 1.645.

3. The decision rule in Cell B11. Reject the null hypothesis if z is greater than 1.645.

4. The sample proportion, p, in Cell B12.

5. Sampling Error in Cell B14: 0.030.

6. The sample size in Cell B15, 1,522.

7. The Standard Error of the Proportion (SEP) in Cell B16: 0.013.

8. The test statistic or z-value in Cell B17, 2.438.

9. The p-value in Cell B17, 0.0074 or 0.74%.

10. The decision regarding the null hypothesis in Cell B18: Reject the H0.

11. Cohen’s h Effect Size and interpretation in Cells B20:B21, 0.0620, which is a negligible effect.

Table 10: Excel Calculations and Formulas for Example 2

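As a cross-check on Table 10, the sketch below implements the common textbook form of the one-sample z-test for a proportion, which computes the standard error from the presumed π. This is an assumption: the exact formula behind the worksheet is not shown here. With this form, z is about 2.40 and the p-value about 0.008; the worksheet’s 2.438 is reproduced if the rounded sample proportion, 0.64, is used in both the numerator and the standard error. Either way, z exceeds 1.645 and the null hypothesis is rejected. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test_proportion(successes, n, pi, tail="right"):
    """One-sample z-test for a proportion, with the SE based on the presumed pi."""
    p = successes / n                      # sample proportion
    sep = math.sqrt(pi * (1 - pi) / n)     # standard error of the proportion
    z = (p - pi) / sep                     # sampling error divided by the SEP
    cdf = NormalDist().cdf(z)
    p_value = {"right": 1 - cdf, "left": cdf}.get(tail, 2 * min(cdf, 1 - cdf))
    return p, sep, z, p_value


p, sep, z, p_value = one_sample_z_test_proportion(974, 1522, 0.61)
print(f"p={p:.4f}  SEP={sep:.4f}  z={z:.3f}  p-value={p_value:.4f}")
```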

An important question arises: Is this test under-powered because statistical power is less than 80 percent? The answer is no. Why? This test provides sufficient evidence to reject the null hypothesis. But what about this somewhat under-powered test inflating the p-value? That is a relatively minor possibility. Still, the researchers should review additional polls to see whether their results are consistent.

Step 6. Decide and Report

The z-value for our test is 2.438. Because 2.438 is in the rejection region and the 0.74 percent p-value is less than the significance level of 5 percent, we have sufficient evidence to reject the null hypothesis. We, therefore, have statistical significance.

Figure 12 shows a clear graphic representation of our test results. The red area in the right-tail represents our test statistic of 2.438 with its 0.74 percent p-value.

Figure 12: Right-tail test at a 5% significance level, z = 2.438 and a p-value of 0.0074

Conclusion: The null hypothesis that π is equal to or less than 61 percent is rejected. The difference between the sample proportion of 64 percent and the presumed population proportion of 61 percent is statistically significant at both the 5 percent and 1 percent significance levels. The difference between this sample statistic and the presumed population parameter is beyond what we would expect from sampling error. This survey indicates that the approval of unions has increased.

Does this three-percentage point gain in the approval of unions have practical significance at a time when the U.S. Bureau of Labor Statistics reports that union membership has hit record lows?5 The issue of practical significance is often a question of judgment. It is likely that union leaders and some policy makers will view this result as having practical significance.

Given that the sample size is below the level needed to achieve 80 percent statistical power, the results of this test will need to be confirmed by testing additional poll results.

IV. One-Sample Tests for the Mean Using the t-Distribution

We use t-tests for the mean when at least one of two conditions is met:

1. The sample size, n, is less than 30, or

2. The population standard deviation, σ, is unknown.

Given the fact that the population standard deviation is usually unknown, t-tests for the mean are used far more often than z-tests for the mean. The critical values for t-tests are always more extreme than those for z-tests. One implication of this is that t-tests have slightly less statistical power than z-tests. The differences in the critical values for t and z, however, diminish as the sample size increases. With very large sample sizes, the difference in the critical values of z and t is little more than rounding error.

Table 11 shows the differences between z-values and t-values for two-tail tests at 10 percent, 5 percent, and 1 percent levels of significance when degrees of freedom are set at 2,000, which corresponds to a very large sample of 2,001 observations. Degrees of freedom, df, are found by subtracting the number of independent samples from the sample size: 2,001 minus 1 equals 2,000. With 2,000 degrees of freedom, the difference in the critical values for t and z is negligible.

Table 11: The difference between z-values and t-values for a two-tail test
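Table 11 appears as an image in the original. The comparison it makes can be reproduced with a short Python sketch that uses only the standard library. `statistics.NormalDist` supplies the z-values. The t-values come from a hand-rolled quantile function, which integrates the t density with Simpson's rule and inverts the result by bisection. The helper names `t_cdf`, `t_ppf`, and `compare_critical_values` are illustrative and not part of any library. In practice, `scipy.stats.t.ppf(q, df)` is the usual shortcut.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """P(T <= x) for a Student-t variable, via Simpson's rule on [0, |x|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        # Student-t density with df degrees of freedom
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_ppf(p, df):
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def compare_critical_values(df=2000, alphas=(0.10, 0.05, 0.01)):
    """Two-tail critical values of z and t for each significance level."""
    rows = []
    for alpha in alphas:
        q = 1 - alpha / 2  # two-tail test: alpha is split between both tails
        z = NormalDist().inv_cdf(q)
        t = t_ppf(q, df)
        rows.append((alpha, z, t, t - z))
    return rows


if __name__ == "__main__":
    for alpha, z, t, diff in compare_critical_values():
        print(f"alpha={alpha:.2f}  z={z:.4f}  t={t:.4f}  difference={diff:.4f}")
```

Each printed t-value exceeds its z-value by less than 0.01, which is the point Table 11 makes. Lowering `df` in the call shows the gap widening for small samples.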

Step 1. Test Set-up

Our example: You have been hired by Leben Instruments, a medical device company, to conduct a test regarding their 24/7 phone support. People use this company’s devices to manage their chronic diseases. Sometimes they need product support. The company promises its customers that they will not be kept waiting longer than four minutes when they call for product support. Here is the question you seek to answer: Do customers wait longer than four minutes? Given this research question, we will conduct a right-tail test.

As always, the first step when setting up a null hypothesis significance test is to estimate the sample size needed to achieve the desired level of statistical power. This is an a priori statistical power calculation. We seek 80 percent power. As we have discussed, a priori statistical power is calculated from the estimated effect size, the researchers’ tolerance of Type I and Type II errors, and whether the test is left-tail, two-tail, or right-tail. The effect size measure used for this test is Cohen’s d, the same effect size used for the one-sample z-test for the mean. Table 12 shows again how Cohen’s d is interpreted.

Table 12: Interpreting Cohen’s d Effect Size

Based on a review of similar studies, you expect a small effect between 0.30 and 0.35. You therefore estimate the effect size at 0.325, the midpoint of that range.

You can calculate a priori statistical power for a one-sample t-test for the mean using G*Power or Statistics Kingdom’s sample size calculator. We will use G*Power first and then Statistics Kingdom.

Here are the inputs for G*Power:

• Test Family: t tests

• Statistical Test: Means: Difference from constant (one sample case)

• Type of power analysis: A priori: compute required sample size – given α, power, and effect size.

• Tails: One.

• Effect size d: 0.325.

• α err prob: Enter the significance level, which is 0.05.

• Power (1- β err prob): Enter 0.8.

As shown in Figure 13, G*Power reports that the required sample size is 60.

Figure 13: G*Power’s A Priori Sample Size is 60
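G*Power and Statistics Kingdom compute the required sample size from the noncentral t-distribution. As a rough cross-check, the sketch below uses the normal-approximation formula n = ((z for 1 − α + z for 1 − β) / d)², then adds (z for 1 − α)² / 2. That addition is a commonly used correction for the heavier tails of the t-distribution. The function name `approx_sample_size` is hypothetical, and the formula is an approximation, not the method either tool uses.

```python
import math
from statistics import NormalDist


def approx_sample_size(d, alpha=0.05, power=0.80, tails=1, t_test=True):
    """Approximate sample size for a one-sample test of the mean.

    d: anticipated Cohen's d effect size
    alpha: significance level (tolerance of a Type I error)
    power: desired statistical power (1 - beta)
    tails: 1 for a left- or right-tail test, 2 for a two-tail test
    t_test: add the heavier-tail correction used for a t-test
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / tails)  # critical z for the rejection region
    z_beta = z(power)               # z-value for the desired power
    n = ((z_alpha + z_beta) / d) ** 2
    if t_test:
        n += z_alpha ** 2 / 2       # approximate allowance for the t-distribution
    return math.ceil(n)             # sample sizes are always rounded up


print(approx_sample_size(0.325))  # Leben Instruments example
```

For the Leben Instruments inputs (d = 0.325, α = 0.05, one tail, 80 percent power), the approximation rounds up to 60, the same figure G*Power reports. With other inputs it may differ slightly from the exact tools.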

Statistics Kingdom’s a priori statistical power calculator has nine inputs:

• Tails: Right. (This follows from the research question: Are customers waiting longer than four minutes for telephone support?)

• Digits: 6. (Six digits is the default. The calculator often seems to ignore this setting.)

• Distribution: T. There are two options: Normal and T. Select T because the population standard deviation is unknown.

• Sample: One sample. There are two options: One sample and Two samples. One sample is the appropriate selection because there is only one sample.

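• Significance level (α): Enter 0.05. The significance level is the researchers’ tolerance of a Type I error. (The p-value, by contrast, is calculated from the sample: it is the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis is true.)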

• Power: 0.8. This is the desired level of statistical power.

• Effect: There are three options: Small, Medium, and Large. Select Small.

• Effect type: Standardized effect size. There are two options: Standardized effect size and Unstandardized effect size. Unstandardized effect size is the raw difference between the sample statistic and the population parameter, expressed in the original units of measurement.

• Effect Size: 0.325. This is the estimated Cohen’s d effect size.

Figure 14: Statistics Kingdom’s A Priori Statistical Power Inputs

Statistics Kingdom’s sample size calculator also reports that a sample size of 60 is required.

Figure 15: Statistics Kingdom reports that the sample size should be 60

You selected a random sample of 60 phone calls to Technical Support and recorded the waiting times. Table 13 shows the results from your survey.

Table 13: Waiting Time in Minutes

The sample mean is 5.00 minutes and the sample standard deviation is 2.257 minutes. Even though the sample size is greater than 30, we must conduct a t-test because the population standard deviation is unknown.

Step 2. Select the Level of Significance, α

You decided to use a 5 percent significance level. The critical value of t for a right-tail test with 59 degrees of freedom is 1.671. The critical value of t can be found using the Student-t table or Excel. Table 14 shows the critical value of t using the Student-t table. Table 15 shows the calculation in Excel. The syntax for the Excel formula is shown in Equation 8.

Table 14: Critical value for a one-tail t-test with a 5% significance level and 59 degrees of freedom

Table 15: The Critical Value of t found using Excel

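Equation 8: Excel Formula for Finding the Critical Value of t, Right-Tail Test

=ABS(T.INV(alpha,df))

Here alpha is the significance level and df is the degrees of freedom. T.INV returns the left-tail critical value, −1.671, so ABS converts it to the right-tail value, 1.671.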

Step 3. State the null hypothesis (H0) and alternate hypothesis (H1)

Based on the research question, here are the null and alternate hypotheses:

H0: μ ≤ 4 minutes; H1: μ > 4 minutes

Step 4. Compose the Decision Rule

Decision rule: Reject the null hypothesis if t is greater than 1.671. Figure 16 shows a chart of the t-distribution with 59 degrees of freedom and a 5 percent rejection region on the right tail.

Figure 16: t-Distribution with 59 degrees of freedom and an α of 5%

Step 5. Calculate the Value of the Test Statistic and p-value

The formula for the test statistic for a t-test for the mean is just like the one used for the z-test for the mean: Sampling error is in the numerator and the standard error of the mean is in the denominator. But there are two important differences:

1. The population standard deviation, σ, is replaced with the sample standard deviation, s.

2. We use the critical values for t, not z, to interpret the result.

Equation 9 shows the formula and the calculation of the test statistic for this problem: t equals 3.432.

Equation 9: Test Statistic for t-values
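Equation 9 appears as an image in the original. Below is a reconstruction from the definitions in the text (x̄ is the sample mean, μ the presumed population mean, s the sample standard deviation, and n the sample size), with the Leben Instruments values substituted:

```latex
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
  = \frac{5.00 - 4}{2.257 / \sqrt{60}}
  = \frac{1.00}{0.2914}
  \approx 3.432
```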

Table 15 shows the calculations for the value of the test statistic and the p-value done in Microsoft Excel. Excel was also used to calculate the sample mean and sample standard deviation.

Table 15: One-sample t-test calculation done using Microsoft Excel


Excel gives a precise calculation of the p-value, 0.0006. See Cell C13. We report a tiny p-value like this as < 0.001.
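Table 15 performs these calculations in Excel. The sketch below mirrors them in Python, working from the summary statistics reported in the text (n = 60, x̄ = 5.00, s = 2.257, μ = 4) because the raw data are not reproduced here. It computes t-distribution probabilities with Simpson's rule rather than a statistics library. The helper names are hypothetical: `t_sf` plays the role of Excel's T.DIST.RT, and `t_crit_right` plays the role of =ABS(T.INV(alpha,df)).

```python
import math


def t_sf(x, df, steps=2000):
    """Right-tail probability P(T > x), like Excel's T.DIST.RT(x, df)."""
    if x < 0:
        return 1 - t_sf(-x, df, steps)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        # Student-t density with df degrees of freedom
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # Simpson's rule on [0, x]; steps must be even
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_crit_right(alpha, df):
    """Right-tail critical value: the t with P(T > t) = alpha (alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_sf(hi, df) > alpha:  # widen until the critical value is bracketed
        hi *= 2
    for _ in range(50):          # bisection
        mid = (lo + hi) / 2
        if t_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_test(mean, sd, n, mu0, alpha=0.05):
    """Right-tail one-sample t-test computed from summary statistics."""
    df = n - 1
    se = sd / math.sqrt(n)          # standard error of the mean
    t = (mean - mu0) / se           # sampling error divided by standard error
    crit = t_crit_right(alpha, df)
    return {
        "df": df,
        "t": t,
        "critical_t": crit,
        "p_value": t_sf(t, df),
        "cohens_d": (mean - mu0) / sd,
        "reject_h0": t > crit,      # the Step 4 decision rule
    }


result = one_sample_t_test(mean=5.00, sd=2.257, n=60, mu0=4)
for key, value in result.items():
    print(key, value)
```

The printed values should agree with the text: t ≈ 3.432, a critical value of about 1.671, a p-value close to Excel's 0.0006, and d ≈ 0.4431. Because t exceeds the critical value, the null hypothesis is rejected.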

We can also get a rough estimate of the p-value using the critical values table for Student-t distributions. Here is how to make this estimate:

1. Find the row for the appropriate degrees of freedom.

2. If the calculated test statistic is negative, drop the negative sign.

3. Look at the column headers that match the number of tails in the test.

4. In that row, find the two critical values that bracket the calculated t-value.

5. The p-value lies between the significance levels shown in the headers of those two columns.

Here is the estimate of the p-value for our example with a t-value of 3.432 and 59 degrees of freedom. Using the Student-t table, we estimate the p-value as less than 0.5 percent and greater than 0.05 percent. See Table 16.

Table 16: Estimating the p-value using the Student-t Critical Values Table

Effect Size: We adapt the formula for Cohen’s d effect size by substituting the sample standard deviation, s, for the population standard deviation, σ. Equation 10 shows the formula and the effect size for this problem, 0.4431, which is a small effect size. See Table 17 for how Cohen’s d effect sizes are interpreted.

Equation 10: Cohen’s d Effect Size
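Equation 10 also appears as an image in the original. With the sample standard deviation in place of σ, as the text describes, it reads:

```latex
d = \frac{\bar{x} - \mu}{s} = \frac{5.00 - 4}{2.257} \approx 0.4431
```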

Table 17: Cohen’s d Effect Size

Because the observed effect size, 0.4431, is larger than the 0.325 we anticipated, this test’s actual statistical power will be higher than 80 percent.
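One rough way to gauge how much the larger effect matters is the normal approximation to the power of a one-tail test: power ≈ Φ(d√n − z for 1 − α). The sketch below applies that approximation, which slightly overstates the power of a t-test. `approx_power` is a hypothetical helper, not a G*Power or Statistics Kingdom function. Those tools use the noncentral t-distribution.

```python
import math
from statistics import NormalDist


def approx_power(d, n, alpha=0.05):
    """Normal-approximation power for a one-tail, one-sample test of the mean."""
    nd = NormalDist()
    return nd.cdf(d * math.sqrt(n) - nd.inv_cdf(1 - alpha))


planned = approx_power(0.325, 60)    # effect size assumed in Step 1
observed = approx_power(0.4431, 60)  # effect size found in Step 5
print(f"planned: {planned:.3f}, observed: {observed:.3f}")
```

With n = 60, the approximation gives roughly 0.81 for the planned effect of 0.325 and roughly 0.96 for the observed effect of 0.4431.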

Step 6. Decide and Report

The t-value for our test is 3.432 and the p-value is < 0.001. We have sufficient evidence to reject the null hypothesis and conclude that the difference between the sample mean waiting time of 5 minutes and the presumed population mean of 4 minutes is statistically significant. This difference is greater than what we would expect from sampling error. Conclusion: Callers are waiting longer than four minutes to speak to Technical Support. Figure 17 charts this finding, with the test statistic of 3.432 clearly in the rejection region.

Figure 17: Graphic representation of the one-sample t-test with 59 degrees of freedom

Do these results have practical significance? Like most effects found in the social sciences, the extra minute of wait time is a small effect (0.4431). Even so, the extra minute has practical significance because the call center serves people who may be in distress. Management, therefore, should be concerned that callers are waiting longer for service.

V. Summary

We reviewed the six-step NHST process for the three basic one-sample tests:

1. z-test for the population mean, μ, when the population standard deviation, σ, is known and the sample size is 30 or more.

2. z-test for the population proportion, π.

3. t-test for the population mean, μ, when the population standard deviation, σ, is unknown or the sample size is 29 or less.

We have seen how to determine the “direction” of the tests, that is, how to decide whether the test is a left-tail, two-tail, or right-tail test. We have written the null and alternate hypotheses as well as decision rules. We have calculated the values of the test statistics for the three one-sample tests introduced in this chapter. In addition, we showed how to find the p-value using Microsoft Excel and the Critical Values Tables for z and t distributions. We demonstrated how to decide whether to reject the null hypothesis and how to explain what that decision means. We have calculated a priori statistical power using the Statistics Kingdom online power calculator and G*Power. We discussed how Type II errors and statistical power are related to the sample size, n, the level of significance, α, and the variability of the data. We have also reviewed the issue of practical significance.

VI. Exercises

Solve the following problems using the six-step NHST procedure:

Step 1. Test set-up. Include an a priori power calculation.

Step 2. Select the level of significance, α. Find the critical value or values.

Step 3. State the null hypothesis (H0) and alternate hypothesis (H1).

Step 4. Compose the decision rule.

Step 5. Calculate the test statistic, p-value, and effect size.

Step 6. Decide and report. Be certain to discuss the issue of practical significance.

The data for these exercises are available in the Chapter14_Exercises.xlsx worksheet. Use this worksheet to answer these questions.

Assume that these tests have been properly conducted. The researchers made a good effort to reduce the likelihood of systematic errors, and they did not engage in any nefarious shenanigans like “p-hacking”6 or “HARKing”7. In other words, the researchers were neither fools nor knaves. P-hacking (also called “inflation bias,” “selective reporting,” “data fishing,” “data butchery,” or “data dredging”) occurs when researchers conduct a huge number of statistical analyses and report only the statistically significant results. HARKing stands for “Hypothesizing After the Results are Known.” HARKing occurs when researchers modify their hypotheses once the results are known. A researcher may engage in HARKing to obtain statistically significant results, often because of publication bias. Publication bias is the reluctance of journals to publish articles that lack statistical significance. HARKing is considered a disreputable practice because it raises the probability of Type I errors and makes it more difficult for other researchers to reproduce the results.

Exercise 1:

Step 1. Test Set-up

The current Tesla Model S can travel an average of 300 miles on a single charge: μ equals 300 miles, with a presumed population standard deviation, σ, of 40 miles. You are a researcher for the team developing a prototype for the next Tesla Model S. Your objective is to produce a vehicle that can be driven significantly more than 300 miles on a single charge.

What test statistic should you use? What are the reasons for this decision?

Based on the research question, is this a left-tail, two-tail, or right-tail test? Justify your answer.

Using the Statistics Kingdom sample size calculator, conduct an a priori power analysis. Estimate Cohen’s d effect size as 0.25. What sample size do you need to achieve 80 percent power? Here is the link: https://www.statskingdom.com/sample_size_t_z.html.

You have conducted a test of your most promising prototype. Here are the results:

Table 18: Miles on a Single Charge for Prototype

Step 2. Select the Level of Significance, α

A 5 percent significance level has been selected. What is (are) the critical value(s)?

Step 3. State the null hypothesis (H0) and alternate hypothesis (H1)

Step 4. Compose the Decision Rule

Step 5. Calculate the Value of the Test Statistic, p-value, and Effect Size

Using Microsoft Excel, count the number of observations, find the sample mean, and complete the appropriate significance test with the p-value and Cohen’s d effect size.

Step 6. Decide and Report

Do you have statistical significance? What information supports your position? How does your decision relate to the research question? Be certain to consider whether the test results have practical significance.

Exercise 2

Step 1. Test Set-up

Tony V. owns a house painting business in Seattle. He estimates that his crew takes three and a half days (3.5 days) to paint the interior of a typical house. He has no estimate of the population standard deviation, σ. He is testing a new paint that is supposed to dry faster than the paint he has been using for over thirty years. The new paint is just as good as the old paint and costs the same. The question Tony wants to answer is: Will the new paint allow him to complete a house painting job faster than the paint he has been using? Speed will be measured by the number of days needed to complete a paint job.

What test statistic should you use? Why?

Based on the research question, is this a left-tail, two-tail, or right-tail test? Justify your answer.

Using the Statistics Kingdom sample size calculator or G*Power, conduct an a priori power analysis. Estimate Cohen’s d effect size as 0.50. What sample size do you need to achieve 80 percent power? Here is the link to the Statistics Kingdom power calculator: https://www.statskingdom.com/sample_size_t_z.html.

A survey was taken based on the results of the a priori power calculation. Here are the results:

Table 19: Days to Paint a House

Step 2. Select the Level of Significance, α

A 5 percent significance level has been selected. What is (are) the critical value(s)?

Step 3. State the null hypothesis (H0) and alternate hypothesis (H1)

Step 4. Compose the Decision Rule

Step 5. Calculate the Value of the Test Statistic, p-value, and Effect Size

Using Microsoft Excel, count the number of observations, find the sample mean and sample standard deviation, and complete the appropriate significance test with the p-value and Cohen’s d effect size.

Step 6. Decide and Report

Exercise 3

Step 1. Test Set-up

The Dean of Students at Nunya Business College claims that the college’s historical records show that by graduation, 95 percent of graduating seniors will have accepted offers for full-time salaried jobs in their area of study. The leaders of student government think this is an exaggeration. They sampled the college’s records for the last ten years and found that of the 1,028 students in the random sample, only 970 accepted a job before graduation. You have been asked to determine whether the student government’s sample provides sufficient evidence that fewer than 95 percent of graduating seniors have accepted jobs prior to graduating.

Conduct a null hypothesis significance test using the appropriate test statistic. Using the Statistics Kingdom sample size calculator, calculate a priori statistical power. Let Statistics Kingdom calculate Cohen’s h effect size. Is this test under-powered?

Step 2. Select the Level of Significance, α

A 5 percent significance level has been selected. What is (are) the critical value(s)?

Step 3. State the null hypothesis (H0) and alternate hypothesis (H1)

Step 4. Compose the Decision Rule

Step 5. Calculate the Value of the Test Statistic, p-value, and Effect Size

Using Microsoft Excel, find the sample proportion, and complete the appropriate significance test with the p-value, and Cohen’s h effect size.

Step 6. Decide and Report

Except where otherwise noted, Clear-Sighted Statistics is licensed under a
Creative Commons License. You are free to share derivatives of this work for
non-commercial purposes only. Please attribute this work to Edward Volchok.

Reference

1 Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, Second Edition (New York: Psychology Press, 1988), p. 1.

2 Joan Welkowitz, Robert B. Ewen, and Jacob Cohen, Introductory Statistics for the Behavioral Sciences, Fourth Edition (New York: Harcourt Brace Jovanovich, 1991), p. 25.

3 Art Swift, “Labor Union Approval Best Since 2003, at 61%,” Gallup, August 30, 2017. https://news.gallup.com/poll/265916/labor-day-turns-125-union-approval-near-year-high.aspx.

4 Jeffrey M. Jones, “As Labor Day Turns 125, Union Approval Near 50-Year High,” Gallup, August 28, 2019. https://news.gallup.com/poll/265916/labor-day-turns-125-union-approval-near-year-high.aspx.

5 “Union Members Summary,” U.S. Bureau of Labor Statistics, January 22, 2020. https://www.bls.gov/news.release/union2.nr0.htm.

6 Megan L. Head, Luke Holman, Rob Lanfear, Andrew T. Kahn, and Michael D. Jennions, “The Extent and Consequences of P-Hacking in Science,” PLOS Biology, March 13, 2015. https://doi.org/10.1371/journal.pbio.1002106.

7 Norman L. Kerr, “HARKing: Hypothesizing After the Results are Known,” Personality and Social Psychology Review, Vol. 2, No. 3 (1998), pp. 196-217. https://doi.org/10.1207/s15327957pspr0203_4.
