
Clear-Sighted Statistics

Chapter 9: Normal Probability Distributions

I. Introduction

In Chapter 5, we reviewed the Empirical or Normal Rule, which is based on the normal probability distribution or “normal curve” for short. You will recall that normal curves are symmetrical and continuous. These distributions are centered around the mean, median, and mode. See Figure 1.

Figure 1: Normal Probability Distribution

Symmetrical Distributions

Two parameters define the normal probability distribution: the population mean, μ, and the population standard deviation, σ. Symmetrical, bell-shaped distributions can still differ in the thickness of their tails. Kurtosis measures the thickness of the tails. A mesokurtic distribution, such as the normal distribution, has zero excess kurtosis. Platykurtic distributions have negative kurtosis and thin tails. Leptokurtic distributions have positive kurtosis and thick tails. See Figure 2.

Figure 2: Mesokurtic, Platykurtic, Leptokurtic Distribution

Mesokurtic, Platykurtic, and Leptokurtic distributions

The mean sets the curve on the horizontal or X-Axis. The standard deviation determines the height of the curve’s peak, shown on the vertical or Y-Axis.

The smaller the standard deviation, the higher and narrower the curve. A normal curve with a standard deviation of zero, which means all values are equal to the mean, is a straight vertical line. Figure 3 shows three normal probability distributions with the same mean, μ, but different standard deviations, σ.
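The link between the standard deviation and the height of the peak can be sketched in a few lines of Python. This is only an illustration: it assumes Python 3.8 or later (for `statistics.NormalDist`), and the mean of 100 and the three standard deviations are made-up values, not the ones plotted in Figure 3.

```python
from statistics import NormalDist

def peak_height(mu, sigma):
    """Height of the normal curve at its mean (the top of the bell)."""
    return NormalDist(mu, sigma).pdf(mu)

# Same (assumed) mean, three assumed standard deviations.
for sigma in (5, 10, 20):
    print(f"sigma = {sigma:>2}: peak height = {peak_height(100, sigma):.4f}")
```

Halving the standard deviation doubles the height of the peak, because the total area under the curve must always equal 1.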

Figure 3: Three Normal Distributions with Equal Means, Unequal Standard Deviations.

Three Normal Probability Distributions with equal means, but different standard deviations.

Figure 4 shows three normal probability distributions with equal standard deviations, σ, but unequal means, μ.

Figure 4: Three Normal Probability Distributions with Equal Standard Deviations, Unequal Means

Three Normal Probability Distributions with unequal means and equal standard deviations.

Figure 5 shows three normal probability distributions with unequal means and unequal standard deviations.

Figure 5: Three Normal Probability Distributions with Unequal Means and Standard Deviations

Three Normal Probability Distributions with unequal means and unequal standard deviations.

There is an infinite variety of normal curves because the mean and the standard deviation can each take an infinite number of values.

In Chapter 5, we also presented the Empirical or Normal Rule: 68.26 percent of the data are located within plus or minus one standard deviation of the mean, 95.44 percent of the data are within plus or minus two standard deviations of the mean, and 99.74 percent of the data are within plus or minus three standard deviations of the mean. Only 0.13 percent of the data lie more than three standard deviations above the mean, and 0.13 percent lie more than three standard deviations below the mean. How do we know this? How is it possible, given the fact that there are an unlimited number of normal probability distributions? These questions will be answered in this chapter.

Figure 6: Normal Curve and the Empirical Rule

After completing this chapter, you will:

• Know the characteristics of the normal probability distribution.

• Understand the reasoning behind the standard normal distribution.

• Be able to define standard normal deviates or z-values.

• Calculate z-values using paper and pencil, a hand-held calculator, or Microsoft Excel.

• Be able to find the probabilities represented by z-values using the Area Under the Curve.

• Determine the probability of an observation between two points using z-values.

• Determine the probability that an observation will be above or below a given z-value.

• Deepen your understanding of the Empirical or Normal Rule.

• Be able to find the value of an unknown random variable, X.

You should download the following files that accompany this chapter:

1. Chapter09_Examples.xlsx

2. Chapter09_Exercises.xlsx

3. Chapter09_NormalCurve_ShadedArea.xlsx

4. Chapter09_SolvingForX.xlsx

5. Chapter_z-Values_AreaBetweenMean&X.xlsx (Area Under the Curve Table)

II. What is a Standard Normal Probability Distribution?

When working with normal curves, we seek to find the probability for a value under the normal curve. This idea will be explained shortly. We cannot calculate the area under the curve for an unlimited number of normal curves. Therefore, we create a standardized normal curve to find the area under the curve. We will use a table called “The Area Under the Curve” to find the proportion of the curve between a random value, X, and the mean, μ. You can find this table in pdf and Excel formats in Appendix 2: Statistical Tables.

The standard normal distribution is very simple. Instead of an unlimited number of means and standard deviations, the standard normal distribution is set with a population mean, μ, of zero, and a population standard deviation, σ, of 1. Based on these parameters, a standardized measure for how much the data vary from the mean can be created.
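Why one standardized curve is enough can be illustrated with a short Python sketch. It assumes Python 3.8 or later for `statistics.NormalDist`. The helper name `share_within` is hypothetical, and the two non-standard curves simply borrow the SAT and GPA parameters that appear later in this chapter.

```python
from statistics import NormalDist

def share_within(mu, sigma, k=1.0):
    """Proportion of a normal curve lying within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

standard = share_within(0, 1)          # the standard normal curve
sat_like = share_within(1060, 195)     # mean and SD quoted later for SAT scores
gpa_like = share_within(2.78, 0.33)    # mean and SD used in the GPA problems

print(round(standard, 4), round(sat_like, 4), round(gpa_like, 4))
```

All three proportions agree, which is why a single table built for μ = 0 and σ = 1 can serve every normal curve. The unrounded figure is about 0.6827; the 68.26 percent quoted in this chapter comes from doubling a table value that has been rounded to four digits.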

III. What are Standard Normal Deviates or z-values?

When we work with standard normal distributions, we calculate standardized scores. These scores have many names: z-values, z-scores, z-statistics, standard normal deviates, normal deviates, or standard normal values. The name “normal deviate” is not an oxymoron. A normal deviate, or z-value, is the distance between a random value and the mean, measured in units of standard deviation. To keep things simple, we will call these standardized measures z-values. This measure requires quantitative data; which is to say, interval or ratio data. By having a standardized measure, we can compare data from two or more data sets.

The formula for z-values is very simple. It is the signed distance between a random value, X, and the mean, μ, divided by the standard deviation, σ. By signed distance, we mean that z-values can be either positive or negative. A positive z-value means that the random value is larger than the mean. When we graph the data, this value will be to the right of the mean. With a negative z-value, the random value is to the left of the mean. Equation 1 shows the formula for calculating z-values:

Equation 1: Formula for z-values

z = (X – μ) / σ

Where: X = The random variable;

μ = The population mean;

σ = The population standard deviation

Please note: This formula for z will be modified for samples and proportions in later chapters.
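The z-value formula translates directly into code. The sketch below is illustrative only: the function name `z_value` is hypothetical, the mean of 1,060 and standard deviation of 195 are the SAT figures quoted later in this chapter, and the score of 1,352 is an assumed example value.

```python
def z_value(x, mu, sigma):
    """Signed distance of x from the mean, in units of standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Assumed inputs: mean 1,060 and SD 195 are the SAT figures quoted later
# in this chapter; the score of 1,352 is a hypothetical example.
z = z_value(1352, 1060, 195)
print(round(z, 2))
```

A score above the mean gives a positive z-value, and a score below the mean gives a negative one.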

One benefit of using a standardized score is that you can compare z-values for data measured on different quantitative scales. Let’s compare the z-values for a student’s scores on two widely different tests: The 2019 SAT exam and the test results for a professor’s first statistics examination:

Table 1: A Student’s SAT Score and Results on the First Statistics Exam

*Source: National Center for Education Statistics, SAT mean scores, standard deviations, and score ranges for high school seniors, and percentage of the graduating class taking the SAT, by state: 2017. Table 226.40. https://nces.ed.cov/program/digest/d17/tables/dt17_226.40.asp

**Fictional Data

This student’s scores on the SAT exam and the first statistics exam have identical z-values, 1.50. What does this mean? Both scores are one-and-a-half standard deviations above their respective means. We can now convert these z-values to probabilities under the normal curve. This will tell us the percentage of scores above a z of 1.50 and the percentage below a z of 1.50.

IV. Area Under the Curve Table

We can convert z-values to probabilities two ways:

1. Using the Area Under the Curve Table;

2. Using Excel’s =NORMSDIST function.

We will deal with the Area Under the Curve Table first.

Figure 7 shows the Area Under the Curve Table. As previously stated, you can get a pdf or Excel version of this table by going to Appendix 2. You will find it very helpful to print this table as you read this chapter. The Excel version of this table is also available in the file titled: Chapter_z-Values_AreaBetweenMean&X.xlsx.

Figure 7: Area Under the Curve Table

It may take some time for you to get acquainted with the Area Under the Curve Table, but this time would be well spent. The first thing you must know is that this table shows the probabilities for areas under the curve for only the right side of the curve. The z-values are worked out to two digits past the decimal point, or to the hundredths column. To find the probability for the z-value, look in the first column, which is labeled z. The values go from 0.0 up to 3.4 even though z-values can be 3.50 or higher. This column gives the value of z for one digit to the left of the decimal point and one digit to the right. There are ten columns to the right of the “z” column. These columns are for the second digit to the right of the decimal point, going from 0.00 up to 0.09. When working with paper tables, the convention is to report z-values to two decimal places. It is 1.50, not 1.5 or 1.49744, which is the actual z-value for our student’s SAT score. Two decimal places give us sufficient precision. When using Excel, we will often report z-values to three digits past the decimal point.

To find the probability for a z-value of 1.50, first locate the row under the “z” column that is 1.5, then find the column that matches the second digit past the decimal point for our value, 0. This is the 0.00 column. The intersection of the row 1.5 and column 0.00 will be the probability, which is 0.4332. This means that 43.32 percent of the area under the curve lies between the mean and a z-value of 1.50, as shown in Figure 8.

Figure 8: Normal Curve with a z-value of 1.50
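The row-and-column lookup can be mimicked in Python. This is a sketch under stated assumptions: the helper `table_lookup` is hypothetical, it needs Python 3.8 or later, and it does not read the printed table. Instead, it recomputes each entry from the standard normal distribution and rounds it to four digits.

```python
from statistics import NormalDist

def table_lookup(z):
    """Mimic the printed table: return (row, column, area between the mean and z)."""
    hundredths = round(abs(z) * 100)      # the table ignores the sign of z
    row = (hundredths // 10) / 10         # first column: 0.0, 0.1, ..., 3.4
    column = (hundredths % 10) / 100      # column header: 0.00, 0.01, ..., 0.09
    z_rounded = hundredths / 100
    area = NormalDist().cdf(z_rounded) - 0.5
    return row, column, round(area, 4)

print(table_lookup(1.50))
```

For a z-value of 1.50, the sketch lands on row 1.5 and column 0.00 and reports the same 0.4332 found in the table.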

Once we know that the z-value of 1.50 represents 0.4332, or 43.32 percent, of the area to the right of the mean, we can draw some conclusions. Using the special rule of addition, we know that 0.9332, 93.32 percent, of the test scores on the SATs and statistics exam were lower than this student’s scores found by 0.5000 + 0.4332 = 0.9332. Using the complement rule, we know that 0.0668, 6.68 percent, of the scores on the SATs and statistics exam were higher than this student’s scores, found by 1 – 0.9332 = 0.0668 or 0.5000 – 0.4332 = 0.0668. Given that the Area Under the Curve Table only shows probabilities of the right half of the normal curve, we can execute the complement rule by subtracting the probabilities found using this table from 0.5000.

What do we do if our z-values are negative; which is to say, below the mean? Because the normal distribution is symmetrical, we can still use the Area Under the Curve Table, which only shows z-values equal to or greater than zero. A z-value of -1.50 would still have a probability of 0.4332, but it would represent 43.32 percent of the curve below the mean. Here is what a normal curve with a z-value of -1.50 looks like.

Figure 9: Normal Curve With a z-value of -1.50
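The special rule of addition, the complement rule, and the symmetry argument can be checked together with a short Python sketch. The helper name `areas` is hypothetical, and Python 3.8 or later is assumed.

```python
from statistics import NormalDist

def areas(z):
    """Return (area between mean and z, area below z, area above z)."""
    between = abs(NormalDist().cdf(z) - 0.5)   # what the table reports
    if z >= 0:
        below, above = 0.5 + between, 0.5 - between
    else:
        below, above = 0.5 - between, 0.5 + between
    return round(between, 4), round(below, 4), round(above, 4)

print(areas(1.50))
print(areas(-1.50))
```

For a z-value of 1.50, the three areas are 0.4332, 0.9332, and 0.0668. For -1.50, the area between the mean and z is unchanged, while the areas below and above trade places.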

Visualize the Standard Normal Curve: You will find it very helpful to visualize the area you are looking for. Draw a normal curve by hand. Include a center line for the mean. The mean always has a z-value of zero. Then place the z-value or values you seek in their appropriate places. In Appendix 2, you can find blank normal curves in z_Curves.pdf. You can also use the Excel file, Chapter09_SolvingForX.xlsx.

Let’s look at another student’s performance on the SATs and the statistics exam. SAT scores are interval level data. The combined scores for the SATs range from 400 to 1600. The statistics exam scores are ratio data with possible scores ranging from zero to 100.

Table 2: A Student’s SAT Score and Results on the First Statistics Exam

**Fictional Data

This student’s SAT scores have a z-value of -0.50. To find the probability of a z-value of -0.50, drop the negative sign, then find the row 0.5 and the column 0.00. The intersection of this row and column indicates that the probability is 0.1915 or 19.15 percent. Remember: while z-values can be negative, the probabilities are never negative. A z-value of -0.50 represents 19.15 percent of the curve below the mean. The student’s z-value for the statistics exam was 2.50, which represents 49.38 percent of the curve above the mean. Figure 10 shows the relationship between this student’s SAT and statistics exam scores on a normal curve. The area between -0.50 and 2.50 represents 68.53 percent of the curve, found by using the special rule of addition:

Equation 2: Using the Special Rule of Addition

P(-0.50) + P(2.50) = 0.1915 + 0.4938 = 0.6853.

Figure 10: Normal Curve - Area Between -0.50 and 2.50

Let’s look at a third student’s performance on the SATs and the statistics exam:

Table 3: A Student’s SAT Score and Results on the First Statistics Exam

**Fictional Data

This student’s SAT scores have a z-value of 1.00. To find the probability of a z-value of 1.00, find the row 1.0 and the column 0.00. The intersection of the row and column indicates that the probability is 0.3413 or 34.13 percent. The student’s z-value for the statistics exam was 2.00, which represents 47.72 percent of the curve above the mean. Because both z-values are on the same side of the mean, we subtract the smaller area from the larger one. The area between 1.00 and 2.00 represents 13.59 percent of the curve, found by 0.4772 - 0.3413 = 0.1359. Figure 11 shows a normal curve of the area between the z-values of 1.00 and 2.00.

Figure 11: Normal Curve With the Area Between z 1.00 and 2.00 Shaded
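Both cases, z-values on opposite sides of the mean and z-values on the same side, reduce to one subtraction of cumulative areas. The Python sketch below is an illustration under stated assumptions: the function name `area_between` is hypothetical, and Python 3.8 or later is assumed.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Proportion of the standard normal curve between two z-values."""
    std = NormalDist()  # mean 0, standard deviation 1
    return std.cdf(z_high) - std.cdf(z_low)

# z-values on opposite sides of the mean: the table areas are added.
print(round(area_between(-0.50, 2.50), 4))
# z-values on the same side of the mean: the table areas are subtracted.
print(round(area_between(1.00, 2.00), 4))
```

Rounded to four digits, the results match the table-based answers of 0.6853 and 0.1359.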

Figure 12 below shows how to use Excel to calculate the area under the curve using the z-values for the SAT and statistics exams scores. To calculate the z-values, the formula is =(X – Mean)/Standard Deviation. Once you have the z-values, use the ABS and NORMSDIST functions to find the area under the curve. The NORMSDIST function returns the probability that a value will be less than or equal to z. But there is a small hitch. Excel calculates this probability starting from the left-most extreme of the normal curve. The Area Under the Curve Table shows the probabilities from the mean. Equation 3 shows the workaround:

Equation 3: Excel Workaround for Finding the Area Under the Curve

=ABS(NORMSDIST(z-value)-0.5).

The argument, z-value can be the actual z-value or the cell reference for the cell with the z-value. Subtracting 0.5 gets Excel to return the same value as the Area Under the Curve Table. Please note: Sometimes the value Excel reports will be slightly different than the Area Under the Curve Table. This is because Excel is more accurate than the table.

The NORMSDIST function is embedded in the ABS function. ABS returns the absolute value of NORMSDIST(z-value) minus 0.5. When the z-value is negative, NORMSDIST returns a value below 0.5, so the subtraction produces a negative number. Yes, Excel can be as dumb as a hammer; it does exactly the arithmetic it is told to do. As you know, probabilities must be greater than or equal to zero. There are never any negative probabilities. The ABS function keeps Excel from reporting a negative area, which would not be valid. This workbook is attached under the name Chapter09_AreaUnderCurve.xlsx.

Figure 12: Using Excel to Calculate the Area Under the Curve

Using Excel to calculate the area under the curve
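For readers who want to see the Excel workaround outside a spreadsheet, here is a rough Python equivalent. It is a sketch, not a replacement for the workbook: `normsdist` is a hypothetical stand-in for Excel’s NORMSDIST, built on `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

def normsdist(z):
    """Python stand-in for Excel's NORMSDIST: area to the left of z."""
    return NormalDist().cdf(z)

for z in (-0.50, 1.00):
    raw = normsdist(z) - 0.5        # negative when z is below the mean
    table_style = abs(raw)          # mirrors =ABS(NORMSDIST(z-value)-0.5)
    print(f"z = {z:+.2f}  NORMSDIST = {normsdist(z):.4f}  "
          f"raw = {raw:+.4f}  table = {table_style:.4f}")
```

The cumulative probability itself is never negative. Only the difference from 0.5 changes sign, which is what the absolute value corrects.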

Remember: Negative z-values are to the left of the mean. You must also remember the following points when calculating the area under the curve: First find the probabilities for the z-values and then perform the required arithmetic on the probabilities, not on the z-values. It also helps to visualize the areas you are looking for by drawing the normal curve.

Let’s do a few problems:

1. To be accepted into an honors society, a student must have a GPA in the top 10 percent of all the students at the school. A student’s GPA is 3.20. Will this student be admitted to the honors society? The μ = 2.78 and the σ = 0.33.

Answer: The z-value for the top 10 percent must be at least 1.28. Please note: 1.28 is not exact. It represents 39.97 percent of the area above the mean, leaving 10.03 percent in the tail, not 10 percent. This z-value is the closest we can get to 10 percent in the tail using the Area Under the Curve Table. Figure 13 shows an abridged Area Under the Curve Table with probabilities for z-values close to 1.28 along with the normal curve.

Figure 13: Abridged Area Under the Curve Table for z-values close to 1.28 and a Normal Curve with z = 1.28

The z-value for a GPA of 3.20 is 1.27 found by:

Equation 4: z-value for a 3.20 GPA

z = (3.20 – 2.78) / 0.33 = 0.42 / 0.33 = 1.27

With a z-value of 1.27, you are in the top 10.2 percent, found by 0.5000 – 0.3980. This student just missed getting into the top 10 percent and the honors society.

2. If a student has a GPA in the bottom 1 percent, he or she will lose financial aid. The μ = 2.78 and the σ = 0.33. What is the z-value for the bottom 1 percent? Would a student with a 2.05 GPA lose financial aid?

Answer: Figure 14 shows an abridged Area Under the Curve Table with probabilities close to 2.33 along with a normal curve. The closest value we can get to the bottom 1 percent is -2.33, found by 0.5000 – 0.4901 = 0.0099. Remember: Our z-value must be negative because we are looking for the cut-off point for the bottom 1 percent and the top 99 percent. The probability of having a value this low, however, cannot be a negative number. See Figure 14.

Figure 14: Abridged Area Under the Curve Table for z-values Close to -2.33 along with a Normal Curve

The z-value for a GPA of 2.05 is -2.21 found by:

Equation 5: z-value for a 2.05 GPA

z = (2.05 – 2.78) / 0.33 = -0.73 / 0.33 = -2.21

A student with a z-value of -2.21 is in the bottom 1.36 percent, found by 1.0000 – 0.9864 or 0.5000 – 0.4864. This student would not lose financial aid.

3. If a student has a GPA in the top 0.5 percent, he or she will win a $5,000 scholarship. A student has a 3.65 GPA. Does this student win the scholarship? Assume the μ = 2.78 and the σ = 0.33. Figure 15 shows the Area Under the Curve Table and a normal curve.

Answer: The closest z-value to 49.5 percent above the mean is 2.58, which represents 49.51 percent of the curve above the mean with 0.0049 in the tail.

Figure 15: Abridged Area Under the Curve Table for z-values Close to 2.58 and a Normal Curve with z = 2.58

The z-value for a GPA of 3.65 is:

Equation 6: z-value for a 3.65 GPA

z = (3.65 – 2.78) / 0.33 = 0.87 / 0.33 = 2.64

With a z-value of 2.64, this student is among the top 0.50 percent. In fact, he or she is among the top 0.41 percent, found by 1.0000 – 0.9959 = 0.0041, or 0.5000 – 0.4959 = 0.0041. This student has earned the $5,000 scholarship.
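The three GPA problems can be double-checked with a short Python sketch. It assumes Python 3.8 or later, and the helper names `share_above` and `share_below` are hypothetical. Unlike the table method, it works with unrounded z-values, so the tail areas differ slightly from the table-based figures, but the conclusions are the same.

```python
from statistics import NormalDist

gpa = NormalDist(mu=2.78, sigma=0.33)   # parameters given in the problems

def share_above(x):
    """Proportion of students with a GPA higher than x."""
    return 1 - gpa.cdf(x)

def share_below(x):
    """Proportion of students with a GPA lower than x."""
    return gpa.cdf(x)

in_top_10_percent = share_above(3.20) <= 0.10       # Problem 1
in_bottom_1_percent = share_below(2.05) <= 0.01     # Problem 2
in_top_half_percent = share_above(3.65) <= 0.005    # Problem 3

print(in_top_10_percent, in_bottom_1_percent, in_top_half_percent)
```

The first two checks come out false and the third comes out true: the 3.20 student narrowly misses the honors society, the 2.05 student keeps financial aid, and the 3.65 student wins the scholarship.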

V. The Empirical or Normal Rule

In Chapter 5, we introduced the empirical or normal rule. This rule deals with the distribution of the data under a normal curve. Here is what the empirical rule states when we use z-values:

1. The area between the z-values of -1.00 and 1.00 represents approximately 68.26 percent of the data.

2. The area between the z-values of -2.00 and 2.00 represents approximately 95.44 percent of the data.

3. The area between the z-values of -3.00 and 3.00 represents approximately 99.74 percent of the data.

4. The area below -3.00 represents 0.13 percent of the data.

5. The area above 3.00 represents 0.13 percent of the data.

6. The z-value for the mean is always 0.00.

Figure 16 shows the normal rule using z-values.

Figure 16: The Empirical or Normal Rule Using z-Values

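How do we know that the center 68.26 percent of a normal curve is plus or minus one standard deviation from the mean, or between the z-values of -1.00 and 1.00? Remember: z-values are units of standard deviation. The Area Under the Curve Table shows that a z-value of 1.00 represents 0.3413 of the curve above the mean. Because the curve is symmetrical, a z-value of -1.00 represents 0.3413 of the curve below the mean, and 0.3413 + 0.3413 = 0.6826. We repeat this process to find the area between -2.00 and 2.00, 0.4772 + 0.4772 = 0.9544, and the area between -3.00 and 3.00, 0.4987 + 0.4987 = 0.9974. Figure 17 shows the probabilities for z-values of 1.00, 2.00, and 3.00.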

Figure 17: Abridged Area Under the Curve Table

Please note: The numbers in the Area Under the Curve Table are rounded off to four digits past the decimal point. Excel calculates the probabilities to 15 digits past the decimal point. For a z-value of 1.00, Excel reports 0.341344746068543. If your numbers are slightly different when you use Excel, it is because Excel is more precise than the Area Under the Curve Table.
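The effect of the table’s four-digit rounding on the empirical rule can be seen in a short Python sketch. It assumes Python 3.8 or later, and the helper names `table_area` and `exact_center` are hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(k):
    """Area between the mean and z = k, rounded to four digits like the printed table."""
    return round(std.cdf(k) - 0.5, 4)

def exact_center(k):
    """Unrounded area between z = -k and z = +k."""
    return std.cdf(k) - std.cdf(-k)

for k in (1.00, 2.00, 3.00):
    print(f"z = ±{k:.2f}: table 2 × {table_area(k)} = {2 * table_area(k):.4f}, "
          f"unrounded = {exact_center(k):.6f}")
```

Doubling the rounded table values reproduces 0.6826, 0.9544, and 0.9974. The unrounded areas are slightly larger, which is the kind of small discrepancy you may notice when you compare the table with Excel.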

VI. Solving for X

The formula for finding the z-value can be modified to find the value for X. See Table 4:

Table 4: Formula for Finding X

z = (X – μ) / σ, therefore X = μ + zσ

What SAT score does a person need to be in the top 0.5 percent? A national study of SAT scores showed that the population mean was 1,060 and the population standard deviation was 195. As shown in Figure 18, the z-value for the top 0.5 percent is 2.58.

Figure 18: z-value for the Top 0.5% = 2.58

Equation 7 shows that to be in the top 0.5 percent a student would need an SAT score of 1,563.1 or higher.

Equation 7: An SAT score of 1,563.1 or Higher Would Place a Student in the top 0.5%

X = μ + zσ = 1,060 + (2.58)(195) = 1,060 + 503.1 = 1,563.1

What SAT score does a person need to be in the top 5 percent? As shown in Figure 19, the z-value for the top 5 percent is 1.65.

Figure 19: z-value for the Top 5% = 1.65

Equation 8 shows that to be in the top 5 percent a student would need an SAT score of 1,381.75 or higher.

Equation 8: An SAT score of 1,381.75 or Higher Would Place a Student in the top 5%

X = μ + zσ = 1,060 + (1.65)(195) = 1,060 + 321.75 = 1,381.75

What SAT score does a person need to be in the bottom 10 percent? As shown in Figure 20, the z-value for the top 10 percent is 1.28, so the z-value for the bottom 10 percent is -1.28.

Figure 20: z-value for the Bottom 10% = -1.28

Equation 9 shows that to be in the bottom 10 percent a student would need an SAT score of 810.4 or lower.

Equation 9: An SAT score of 810.4 or Lower Would Place a Student in the bottom 10%

X = μ + zσ = 1,060 + (-1.28)(195) = 1,060 – 249.6 = 810.4

The Excel workbook titled Chapter09_SolvingForX.xlsx provides a more precise answer because Excel’s z-values are more precise than those in the Area Under the Curve Table. Excel reports the value for X as 810.097, as shown in Figure 21.

Figure 21: An SAT score of 810.097 or Lower Would Place a Student in the bottom 10%

Solving for X using Excel, Chapter09_SolvingForX.xlsx.
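Solving for X is the z-value formula rearranged, X = μ + zσ. The Python sketch below shows both the table-based route and an unrounded route. It is an illustration under stated assumptions: the function names `x_from_z` and `x_from_percentile` are hypothetical, and `statistics.NormalDist.inv_cdf` (Python 3.8 or later) stands in for the Excel workbook.

```python
from statistics import NormalDist

MU, SIGMA = 1060, 195   # SAT mean and standard deviation used in this section

def x_from_z(z, mu=MU, sigma=SIGMA):
    """Rearranged z formula: X = mu + z * sigma."""
    return mu + z * sigma

def x_from_percentile(p, mu=MU, sigma=SIGMA):
    """X below which a proportion p of the curve lies, using an unrounded z."""
    return NormalDist(mu, sigma).inv_cdf(p)

print(round(x_from_z(2.58), 2))             # top 0.5 percent, table z
print(round(x_from_z(1.65), 2))             # top 5 percent, table z
print(round(x_from_z(-1.28), 2))            # bottom 10 percent, table z
print(round(x_from_percentile(0.10), 3))    # bottom 10 percent, unrounded z
```

The first three results match Equations 7, 8, and 9. The last one reproduces the 810.097 that Excel reports, because it does not round the z-value to two decimal places.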

VII. Summary

In Chapter 9, we introduced z-values and the Area Under the Curve Table. You found out how to calculate z-values using paper and pencil, a handheld calculator, and Microsoft Excel. You were also shown how to find the areas under the curve using the Area Under the Curve Table and Microsoft Excel. Using these techniques, we verified the empirical rule. We also reviewed how to solve for X using paper and pencil and Microsoft Excel.

In the following chapters, we will:

1. Use z-values for samples, Chapter 10.

2. Use z-values to construct confidence intervals, Chapter 11.

3. Use z-values to estimate sample size, Chapter 12.

4. Use z-values to conduct Null Hypothesis Significance Tests, Chapters 14 and 15.

VIII. Exercises

Complete the following exercises:

Exercise 1: Solving for X

According to the College Entrance Examination Board, in 2017 the average SAT score in New York State was 1,052 with a standard deviation of 188.

Table 5: New York State SAT Scores, 2017

1. What SAT score do you need to be among the top 0.5 percent?

2. What SAT score do you need to be among the top 2.5 percent?

3. What SAT score do you need to be among the bottom 10 percent?

4. What SAT score do you need to be among the bottom 1 percent?

Solve these problems by using the Area Under the Curve Table and a handheld calculator. Then answer these questions using the Excel file, Chapter09_SolvingForX.xlsx.

Exercise 2: Finding the Area Under the Curve

In Major League Baseball, the average fastball pitch is 90 miles per hour, μ = 90, with a standard deviation of 3 miles per hour, σ = 3.

1. A pitcher’s fastball is 96 miles per hour. Is he among the fastest 1 percent of pitchers?

2. A pitcher’s fastball is 95 miles per hour. Is he among the fastest 5 percent of pitchers?

3. A pitcher’s fastball is 85 miles per hour. Is he among the slowest 2.5 percent of pitchers?

4. A pitcher’s fastball is 80 miles per hour. Is he among the slowest 1 percent of pitchers?

Solve these problems by using the Area Under the Curve Table and a handheld calculator. Then answer these questions using the Excel file, Chapter09_SolvingForX.xlsx.

Exercise 3: Empirical Rule

1. How much of the curve is between z-values of 1.00 and 2.00 or -2.00 and -1.00?

2. How much of the curve is between z-values of 2.00 and 3.00 or -3.00 and -2.00?

Use the Area Under the Curve table to answer these questions.

Except where otherwise noted, Clear-Sighted Statistics is licensed under a
Creative Commons License. You are free to share derivatives of this work for
non-commercial purposes only. Please attribute this work to Edward Volchok.
