11.9: The Empirical Rule - Mathematics

Objective

Here you will learn how to use the Empirical Rule to estimate the probability of an event.

If the price per pound of USDA Choice Beef is normally distributed with a mean of $4.85/lb and a standard deviation of $0.35/lb, what is the estimated probability that a randomly chosen sample (from a randomly chosen market) will be between $5.20 and $5.55 per pound?

Guidance

This reading on the Empirical Rule is an extension of the previous reading “Understanding the Normal Distribution.” In the prior reading, the goal was to develop an intuition of the interaction between decreased probability and increased distance from the mean. In this reading, we will practice applying the Empirical Rule to estimate the specific probability of occurrence of a sample based on the range of the sample, measured in standard deviations.

The graphic below is a representation of the Empirical Rule:

The graphic is a rather concise summary of the vital statistics of a normal distribution. Note how the graph resembles a bell? Now you know why the normal distribution is also called a “bell curve.”

  • 50% of the data is above, and 50% below, the mean of the data
  • Approximately 68% of the data occurs within 1 SD of the mean
  • Approximately 95% occurs within 2 SDs of the mean
  • Approximately 99.7% of the data occurs within 3 SDs of the mean

It is due to the probabilities associated with 1, 2, and 3 SDs that the Empirical Rule is also known as the 68−95−99.7 rule.
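The rounded percentages in the rule can be compared with the exact areas under a normal curve. The sketch below is an illustration only, assuming Python and its standard `math.erf` function; the names `within_k_sd`, `central`, and `bands` are made up for this example.

```python
import math


def within_k_sd(k):
    """Exact probability that a normal value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


# Exact central areas, which the rule rounds to 68%, 95%, and 99.7%.
central = {k: within_k_sd(k) for k in (1, 2, 3)}

# One-sided bands used in the examples that follow.
bands = {
    "0 to 1 SD": central[1] / 2,
    "1 to 2 SD": (central[2] - central[1]) / 2,
    "2 to 3 SD": (central[3] - central[2]) / 2,
    "beyond 3 SD": (1 - central[3]) / 2,
}

for name, p in bands.items():
    print(f"{name}: {100 * p:.2f}%")
```

The exact band areas come out near 34.1%, 13.6%, 2.1%, and 0.1%. The 34%, 13.5%, 2.35%, and 0.15% figures used in this lesson follow from splitting the rounded 68−95−99.7 values, so they differ slightly.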

Example 1

If the diameter of a basketball is normally distributed, with a mean (µ) of 9″, and a standard deviation (σ) of 0.5″, what is the probability that a randomly chosen basketball will have a diameter between 9.5″ and 10.5″?

Solution

Since the σ = 0.5″ and the µ = 9″, we are evaluating the probability that a randomly chosen ball will have a diameter between 1 and 3 standard deviations above the mean. The graphic below shows the portion of the normal distribution included between 1 and 3 SDs:

The percentage of the data spanning the 2nd and 3rd SDs is 13.5% + 2.35% = 15.85%

The probability that a randomly chosen basketball will have a diameter between 9.5 and 10.5 inches is 15.85%.
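The band-adding step in this solution can be written as a small lookup. This is a minimal sketch, assuming both bounds fall on a whole number of SDs within 3 SDs of the mean; `BANDS`, `whole_sds`, and `empirical_between` are hypothetical names chosen for the illustration.

```python
# One-sided band areas from the 68-95-99.7 rule, in percent.
# BANDS[i] is the area between i and i + 1 SDs on one side of the mean.
BANDS = [34.0, 13.5, 2.35]


def whole_sds(value, mu, sigma):
    """Return how many whole SDs `value` sits from the mean (negative = below)."""
    z = (value - mu) / sigma
    if abs(z - round(z)) > 1e-9:
        raise ValueError("bound is not a whole number of SDs from the mean")
    return round(z)


def empirical_between(low, high, mu, sigma):
    """Estimate P(low < X < high) in percent using the band areas above."""
    z_low, z_high = whole_sds(low, mu, sigma), whole_sds(high, mu, sigma)
    if not -3 <= z_low <= z_high <= 3:
        raise ValueError("bounds must be ordered and within 3 SDs of the mean")
    total = 0.0
    for z in range(z_low, z_high):
        # The band from z to z + 1 mirrors the band on the other side of the mean.
        index = z if z >= 0 else -z - 1
        total += BANDS[index]
    return total


# Example 1: basketball diameters between 9.5 and 10.5 inches.
print(empirical_between(9.5, 10.5, mu=9, sigma=0.5))
```

The same function handles ranges that straddle the mean, because each band below the mean reuses the area of its mirror image above the mean.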

Example 2

If the depth of the snow in my yard is normally distributed, with µ = 2.5″ and σ = .25″, what is the probability that a randomly chosen location will have a snow depth between 2.25 and 2.75 inches?

Solution

2.25 inches is µ − 1σ, and 2.75 inches is µ + 1σ, so the area encompassed approximately represents 34% + 34% = 68%.

The probability that a randomly chosen location will have a depth between 2.25 and 2.75 inches is 68%.

Example 3

If the height of women in the United States is normally distributed with µ = 5′ 8″ and σ = 1.5″, what is the probability that a randomly chosen woman in the United States is shorter than 5′ 5″?

Solution

This one is slightly different, since we aren’t looking for the probability of a limited range of values. We want to evaluate the probability of a value occurring anywhere below 5′ 5″. The normal distribution extends without limit in both directions, so there is no outer boundary on “that end” to measure from, and the Empirical Rule only gives us the areas out to 3 SDs. What we need to do is add up the probabilities that we do know and subtract them from 100% to get the remainder.

Here is that normal distribution graphic again, with the height data inserted:

Recall that a normal distribution always has 50% of the data on each side of the mean. That indicates that 50% of US females are taller than 5′ 8″, and gives us a solid starting point to calculate from. There is another 34% between 5′ 6.5″ and 5′ 8″ and a final 13.5% between 5′ 5″ and 5′ 6.5″. Ultimately that totals: 50% + 34% + 13.5% = 97.5%. Since 97.5% of US females are 5′ 5″ or taller, that leaves 2.5% that are less than 5′ 5″ tall.
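The "add what we know, subtract from 100%" reasoning can be sketched as follows. This is an illustration under stated assumptions: heights are converted to inches (5′ 8″ = 68″, 5′ 5″ = 65″), the bound lies a whole number of SDs below the mean, and `empirical_below` is a hypothetical helper name.

```python
def empirical_below(value, mu, sigma):
    """Estimate P(X < value) in percent for a bound 1, 2, or 3 SDs below the mean."""
    bands = [34.0, 13.5, 2.35]  # areas between 0-1, 1-2, and 2-3 SDs on one side
    z = (mu - value) / sigma
    sds_below = round(z)
    if abs(z - sds_below) > 1e-9 or sds_below not in (1, 2, 3):
        raise ValueError("bound must be 1, 2, or 3 whole SDs below the mean")
    # Everything above the mean (50%) plus each band between the bound and the mean.
    known = 50.0 + sum(bands[:sds_below])
    return 100.0 - known


# Example 3: mean 68 inches, SD 1.5 inches, bound 65 inches (2 SDs below the mean).
print(empirical_below(65, mu=68, sigma=1.5))
```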

Intro Problem Revisited

If the price per pound of USDA Choice Beef is normally distributed with a mean of $4.85/lb and a standard deviation of $0.35/lb, what is the estimated probability that a randomly chosen sample (from a randomly chosen market) will be between $5.20 and $5.55 per pound?

$5.20 is µ + 1σ, and $5.55 is µ + 2σ, so the probability of a value occurring in that range is approximately 13.5%.

Vocabulary

Normal distribution: a common, but specific, distribution of data with a set of characteristics detailed in the lesson above.

Empirical Rule: a name for the way in which the normal distribution divides data by standard deviations: 68% within 1 SD, 95% within 2 SDs, and 99.7% within 3 SDs of the mean

68-95-99.7 rule: another name for the Empirical Rule

Bell curve: the shape of a normal distribution

Guided Practice

  1. A normally distributed data set has µ = 10 and σ = 2.5. What is the probability of randomly selecting a value greater than 17.5 from the set?
  2. A normally distributed data set has µ = .05 and σ = .01. What is the probability of randomly choosing a value between .05 and .07 from the set?
  3. A normally distributed data set has µ = 514 and an unknown standard deviation. What is the probability that a randomly selected value will be less than 514?

Solutions

  1. If µ = 10 and σ = 2.5, then 17.5 = µ + 3σ. Since we are looking for all data above that point, we need to subtract the probability that a value will occur below that value from 100%: The probability that a value will be less than 10 is 50%, since 10 is the mean. There is another 34% between 10 and 12.5, another 13.5% between 12.5 and 15, and a final 2.35% between 15 and 17.5. 100% −50% −34% −13.5% −2.35% = 0.15% probability of a value greater than 17.5
  2. 0.05 is the mean, and 0.07 is 2 standard deviations above the mean, so the probability of a value in that range is 34% + 13.5% = 47.5%
  3. 514 is the mean, so the probability of a value less than that is 50%.

Practice Questions

Assume all distributions to be normal or approximately normal, and calculate percentages using the 68−95−99.7 rule.

  1. Given mean 63 and standard deviation of 168, find the approximate percentage of the distribution that lies between −105 and 567.
  2. Approximately what percent of a normal distribution is between 2 standard deviations and 3 standard deviations from the mean?
  3. Given standard deviation of 74 and mean of 124, approximately what percentage of the values are greater than 198?
  4. Given σ = 39 and µ = 101, approximately what percentage of the values are less than 23?
  5. Given mean 92 and standard deviation 189, find the approximate percentage of the distribution that lies between −286 and 470.
  6. Approximately what percent of a normal distribution lies between µ + 1σ and µ + 2σ?
  7. Given standard deviation of 113 and mean 81, approximately what percentage of the values are less than −145?
  8. Given mean 23 and standard deviation 157, find the approximate percentage of the distribution that lies between 23 and 337.
  9. Given σ = 3 and µ = 84, approximately what percentage of the values are greater than 90?
  10. Approximately what percent of a normal distribution is between µ and µ+1σ?
  11. Given mean 118 and standard deviation 145, find the approximate percentage of the distribution that lies between −27 and 118.
  12. Given standard deviation of 81 and mean 67, approximately what percentage of values are greater than 310?
  13. Approximately what percent of a normal distribution is less than 2 standard deviations from the mean?
  14. Given µ + 1σ = 247 and µ + 2σ = 428, find the approximate percentage of the distribution that lies between 66 and 428.
  15. Given µ − 1σ = −131 and µ + 1σ = 233, approximately what percentage of the values are greater than −495?

  • An estimated 68% of the data within the set is positioned within one standard deviation of the mean, i.e., 68% lies within the range [M - SD, M + SD].
  • An estimated 95% of the data within the set is positioned within two standard deviations of the mean, i.e., 95% lies within the range [M - 2SD, M + 2SD].
  • An estimated 99.7% of the data within the set is positioned within three standard deviations of the mean, i.e., 99.7% lies within the range [M - 3SD, M + 3SD].

Let's say the scores of an exam follow a bell-shaped distribution that has a mean of 100 and a standard deviation of 16. What percentage of the people who completed the exam achieved a score between 68 and 132?

Solution: 132 – 100 = 32, which is 2(16). As such, 132 is 2 standard deviations to the right of the mean. 100 – 68 = 32, which is 2(16). This means that a score of 68 is 2 standard deviations to the left of the mean. Since 68 to 132 is within 2 standard deviations of the mean, 95% of the exam participants achieved a score of between 68 and 132.



Empirical Rule

Probability theory and statistics are two important branches of mathematics. The former deals with the chance that an event will happen, while the latter is concerned with numerical data and the calculations performed on it.

Empirical Rule: If a data set is approximately normally distributed (bell-shaped), then
about 68% of the data will fall within 1 standard deviation of the mean
about 95% of the data will fall within 2 standard deviations of the mean
about 99.7% of the data will fall within 3 standard deviations of the mean.

Empirical Rule (p 76): For data with a symmetric bell-shaped distribution, about 68% of data lies within 1 standard deviation of the mean, about 95% lies within 2 standard deviations of the mean, and about 99.7% lies within 3 standard deviations of the mean.

If a distribution is roughly bell-shaped, then
Approximately 68% of the data will lie within 1 standard deviation of the mean.
Approximately 95% of the data will lie within 2 standard deviations of the mean.
Approximately 99.7% of the data will lie within 3 standard deviations of the mean.

Empirical Rule: Suppose the histogram of the data is symmetric around the vertical line x = x̄. In other words, the histogram should fit into a bell-shaped curve.

Collectively, these points are known as the empirical rule or the 68-95-99.7 rule. Clearly, given a normal distribution, most outcomes will be within 3 standard deviations of the mean.

An empirical rule of thumb based on Duane plots made from many reliability improvement tests across many industries is the following: Duane plot reliability growth slopes should lie between 0.3 and 0.6.


of thumb to use a minimum order of . The disadvantage of a large order is that many parameters have to be estimated under restrictions. The restrictions can be categorized as conditions for stationarity and the strictly positive parameters.

Standard deviation follows the empirical rule. In a nutshell, the rule states that one standard deviation away from the mean (in either direction) will encompass 68% of the data, two deviations in either direction will encompass 95% of the data, and three deviations away from the mean will encompass 99.7% of the data.

In practical usage, in contrast to the empirical rule, which applies to normal distributions, under Chebyshev's inequality a minimum of just 75% of values must lie within two standard deviations of the mean and 89% within three standard deviations.[1][2]


Empirical rule: check the percentage of data that falls within 1, 2, and 3 SDs of the mean (it should be approximately 68%, 95%, and 99.7%).
Or we can do a Quantile-Quantile probability plot comparing the quantiles of the data against their normal distribution counterparts.
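The percentage check can be sketched in a few lines of Python. This is a minimal illustration, assuming the data is a plain list of numbers; `share_within` is a hypothetical helper, and the sample is synthetic data generated only to exercise it.

```python
import random
import statistics


def share_within(data, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mean) <= k * sd)
    return inside / len(data)


random.seed(0)  # fixed seed so the illustration is repeatable
sample = [random.gauss(100, 16) for _ in range(10_000)]  # synthetic bell-shaped data

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * share_within(sample, k):.1f}%")
```

Shares far from 68%, 95%, and 99.7% suggest the data is not close to bell-shaped, in which case the Empirical Rule should not be relied on.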

The three sigma rule states that, in a normal distribution, almost all the values remain within three standard deviations of the mean. The three sigma rule is also known as the empirical rule.


Statistic

Heights of adult men have a mean of 69.0 inches and a standard deviation of 2.8 inches. Approximately what percentage of adult men have a height between 66.2 and 77.4 inches? Must show the number and the empirical rule

Scores of an IQ test have a bell shaped distribution with a mean of 100 and a standard deviation of 19. Use the empirical rule to determine the following. a.) What % of people has an IQ score between 81 and 119?

Statistics

Suppose that IQ scores have a bell-shaped distribution with a mean of 95 and a standard deviation of 18. Using the empirical rule, what percentage of IQ scores are at least 149? Please do not round your answer.

Statistics

"suppose that IQ scores have a bell-shaped distribution with a mean of 97 and standard deviation of 17.Using empirical rule, what percentage of IQ score are less than 46?"

Statistics

Working of a problem--I have found the mean, variance and the standard deviation from the following info: The probability that a cellular phone company Kiosk sells X number of new phone contracts per day is shown below---- X=4, 5,

Statistics

Suppose that IQ scores have a bell-shaped distribution with a mean of 99 and a standard deviation of 12. Using the empirical rule, what percentage of IQ scores are less than 87? Please do not round your answer. 87-99 = -12/12 = -1

AP Stats

The distribution of heights of adult American men is approximately normal with mean 69 inches and standard deviation 2.5 inches. Use the 68-95-99.7 rule to answer the following questions: (d) A height of 71.5 inches corresponds to

Algebra

Use a calculator to find the mean and standard deviation of the data. Round to the nearest tenth. 6,7,19,7,18,7 A. mean = 9 standard deviation = 26.4 B. mean = 11.9 standard deviation = 26.4 C. mean = 10.6 standard deviation =

Elementary Statistics

Heights of women have a bell-shaped distribution with a mean of 161 cm and a standard deviation of 7 cm. Using Chebyshev’s theorem, what do we know about the percentage of women with heights that are within 2 standard deviations

Algebra

The mean on an Advanced Algebra test was 78 with a standard deviation of 8. If the test scores are normally distributed, find the interval about the mean that contains 99.7% of the scores. Use the empirical rule.

Statistics

Heights of women have a bell-shaped distribution with a mean of 158 cm and a standard deviation of 8 cm. Using Chebyshev’s theorem, what do we know about the percentage of Women with heights that are within 3 standard deviations


Chebyshev’s Theorem

The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then is stated in terms of approximations. A result that applies to every data set is known as Chebyshev’s Theorem.

Chebyshev’s Theorem

For any numerical data set,

  1. at least 3/4 of the data lie within two standard deviations of the mean, that is, in the interval with endpoints x̄ ± 2s for samples and with endpoints μ ± 2σ for populations
  2. at least 8/9 of the data lie within three standard deviations of the mean, that is, in the interval with endpoints x̄ ± 3s for samples and with endpoints μ ± 3σ for populations
  3. at least 1 − 1/k² of the data lie within k standard deviations of the mean, that is, in the interval with endpoints x̄ ± ks for samples and with endpoints μ ± kσ for populations, where k is any positive whole number that is greater than 1.

Figure 2.19 "Chebyshev’s Theorem" gives a visual illustration of Chebyshev’s Theorem.

Figure 2.19 Chebyshev’s Theorem

It is important to pay careful attention to the words “at least” at the beginning of each of the three parts. The theorem gives the minimum proportion of the data which must lie within a given number of standard deviations of the mean; the true proportions found within the indicated regions could be greater than what the theorem guarantees.
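The general bound in part 3 of the theorem is a one-line computation. The sketch below is only an illustration; `chebyshev_min_fraction` is a hypothetical name. The theorem as stated above restricts k to whole numbers, but the same inequality holds for any real k greater than 1, so the function accepts non-integer values too.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of any data set lying within k SDs of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(f"k = {k}: at least {100 * chebyshev_min_fraction(k):.1f}%")
```

For k = 2 and k = 3 this reproduces the 3/4 and 8/9 figures from parts 1 and 2.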

Example 21

A sample of size n = 50 has mean x̄ = 28 and standard deviation s = 3. Without knowing anything else about the sample, what can be said about the number of observations that lie in the interval (22,34)? What can be said about the number of observations that lie outside that interval?

The interval (22,34) is the one that is formed by adding and subtracting two standard deviations from the mean. By Chebyshev’s Theorem, at least 3/4 of the data are within this interval. Since 3/4 of 50 is 37.5, this means that at least 37.5 observations are in the interval. But one cannot take a fractional observation, so we conclude that at least 38 observations must lie inside the interval (22,34).

If at least 3/4 of the observations are in the interval, then at most 1/4 of them are outside it. Since 1/4 of 50 is 12.5, at most 12.5 observations are outside the interval. Since again a fraction of an observation is impossible, at most 12 observations lie outside the interval (22,34).
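The rounding in this example (up for the minimum inside, down for the maximum outside) can be sketched as follows. This assumes a whole-number k, as in the statement of the theorem, and uses exact fractions to avoid floating-point rounding surprises; `chebyshev_counts` is a hypothetical helper name.

```python
import math
from fractions import Fraction


def chebyshev_counts(n, k):
    """Return (minimum count inside, maximum count outside) for k SDs of the mean."""
    if k <= 1:
        raise ValueError("k must be a whole number greater than 1")
    fraction_inside = 1 - Fraction(1, k * k)  # exact value of 1 - 1/k^2
    min_inside = math.ceil(n * fraction_inside)  # a fractional observation rounds up
    max_outside = n - min_inside
    return min_inside, max_outside


# Example 21: n = 50 observations, interval of two standard deviations.
print(chebyshev_counts(50, 2))
```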

Example 22

The number of vehicles passing through a busy intersection between 8:00 a.m. and 10:00 a.m. was observed and recorded on every weekday morning of the last year. The data set contains n = 251 numbers. The sample mean is x̄ = 725 and the sample standard deviation is s = 25. Identify which of the following statements must be true.

  1. On approximately 95% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was between 675 and 775.
  2. On at least 75% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was between 675 and 775.
  3. On at least 189 weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was between 675 and 775.
  4. On at most 25% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was either less than 675 or greater than 775.
  5. On at most 12.5% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was less than 675.
  6. On at most 25% of the weekday mornings last year the number of vehicles passing through the intersection from 8:00 a.m. to 10:00 a.m. was less than 675.

Solution

  1. Since it is not stated that the relative frequency histogram of the data is bell-shaped, the Empirical Rule does not apply. Statement (1) is based on the Empirical Rule and therefore it might not be correct.
  2. Statement (2) is a direct application of part (1) of Chebyshev’s Theorem because (x̄ − 2s, x̄ + 2s) = (675,775). It must be correct.
  3. Statement (3) says the same thing as statement (2) because 75% of 251 is 188.25, so the minimum whole number of observations in this interval is 189. Thus statement (3) is definitely correct.
  4. Statement (4) says the same thing as statement (2) but in different words, and therefore is definitely correct.
  5. Statement (4), which is definitely correct, states that at most 25% of the time either fewer than 675 or more than 775 vehicles passed through the intersection. Statement (5) says that half of that 25% corresponds to days of light traffic. This would be correct if the relative frequency histogram of the data were known to be symmetric. But this is not stated; perhaps all of the observations outside the interval (675,775) are less than 675. Thus statement (5) might not be correct.
  6. Statement (4) is definitely correct and statement (4) implies statement (6): even if every measurement that is outside the interval (675,775) is less than 675 (which is conceivable, since symmetry is not known to hold), even so at most 25% of all observations are less than 675. Thus statement (6) must definitely be correct.

Key Takeaways

  • The Empirical Rule is an approximation that applies only to data sets with a bell-shaped relative frequency histogram. It estimates the proportion of the measurements that lie within one, two, and three standard deviations of the mean.
  • Chebyshev’s Theorem is a fact that applies to all possible data sets. It describes the minimum proportion of the measurements that must lie within one, two, or more standard deviations of the mean.

Exercises

Basic

Describe the conditions under which the Empirical Rule may be applied.

Describe the conditions under which Chebyshev’s Theorem may be applied.

A sample data set with a bell-shaped distribution has mean x̄ = 6 and standard deviation s = 2. Find the approximate proportion of observations in the data set that lie:

A population data set with a bell-shaped distribution has mean μ = 6 and standard deviation σ = 2. Find the approximate proportion of observations in the data set that lie:

A population data set with a bell-shaped distribution has mean μ = 2 and standard deviation σ = 1.1. Find the approximate proportion of observations in the data set that lie:

A sample data set with a bell-shaped distribution has mean x̄ = 2 and standard deviation s = 1.1. Find the approximate proportion of observations in the data set that lie:

A population data set with a bell-shaped distribution and size N = 500 has mean μ = 2 and standard deviation σ = 1.1. Find the approximate number of observations in the data set that lie:

A sample data set with a bell-shaped distribution and size n = 128 has mean x̄ = 2 and standard deviation s = 1.1. Find the approximate number of observations in the data set that lie:

A sample data set has mean x̄ = 6 and standard deviation s = 2. Find the minimum proportion of observations in the data set that must lie:

A population data set has mean μ = 2 and standard deviation σ = 1.1. Find the minimum proportion of observations in the data set that must lie:

A population data set of size N = 500 has mean μ = 5.2 and standard deviation σ = 1.1. Find the minimum number of observations in the data set that must lie:

A sample data set of size n = 128 has mean x̄ = 2 and standard deviation s = 2. Find the minimum number of observations in the data set that must lie:

A sample data set of size n = 30 has mean x̄ = 6 and standard deviation s = 2.

  1. What is the maximum proportion of observations in the data set that can lie outside the interval (2,10)?
  2. What can be said about the proportion of observations in the data set that are below 2?
  3. What can be said about the proportion of observations in the data set that are above 10?
  4. What can be said about the number of observations in the data set that are above 10?

A population data set has mean μ = 2 and standard deviation σ = 1.1.

  1. What is the maximum proportion of observations in the data set that can lie outside the interval (−1.3, 5.3)?
  2. What can be said about the proportion of observations in the data set that are below −1.3?
  3. What can be said about the proportion of observations in the data set that are above 5.3?

Applications

Scores on a final exam taken by 1,200 students have a bell-shaped distribution with mean 72 and standard deviation 9.

  1. What is the median score on the exam?
  2. About how many students scored between 63 and 81?
  3. About how many students scored between 72 and 90?
  4. About how many students scored below 54?

Lengths of fish caught by a commercial fishing boat have a bell-shaped distribution with mean 23 inches and standard deviation 1.5 inches.

  1. About what proportion of all fish caught are between 20 inches and 26 inches long?
  2. About what proportion of all fish caught are between 20 inches and 23 inches long?
  3. About how long is the longest fish caught (only a small fraction of a percent are longer)?

Hockey pucks used in professional hockey games must weigh between 5.5 and 6 ounces. If the weight of pucks manufactured by a particular process is bell-shaped, has mean 5.75 ounces and standard deviation 0.125 ounce, what proportion of the pucks will be usable in professional games?

Hockey pucks used in professional hockey games must weigh between 5.5 and 6 ounces. If the weight of pucks manufactured by a particular process is bell-shaped and has mean 5.75 ounces, how large can the standard deviation be if 99.7% of the pucks are to be usable in professional games?

Speeds of vehicles on a section of highway have a bell-shaped distribution with mean 60 mph and standard deviation 2.5 mph.

  1. If the speed limit is 55 mph, about what proportion of vehicles are speeding?
  2. What is the median speed for vehicles on this highway?
  3. What is the percentile rank of the speed 65 mph?
  4. What speed corresponds to the 16th percentile?

Suppose that, as in the previous exercise, speeds of vehicles on a section of highway have mean 60 mph and standard deviation 2.5 mph, but now the distribution of speeds is unknown.

  1. If the speed limit is 55 mph, at least what proportion of vehicles must be speeding?
  2. What can be said about the proportion of vehicles going 65 mph or faster?

An instructor announces to the class that the scores on a recent exam had a bell-shaped distribution with mean 75 and standard deviation 5.

  1. What is the median score?
  2. Approximately what proportion of students in the class scored between 70 and 80?
  3. Approximately what proportion of students in the class scored above 85?
  4. What is the percentile rank of the score 85?

The GPAs of all currently registered students at a large university have a bell-shaped distribution with mean 2.7 and standard deviation 0.6. Students with a GPA below 1.5 are placed on academic probation. Approximately what percentage of currently registered students at the university are on academic probation?

Thirty-six students took an exam on which the average was 80 and the standard deviation was 6. A rumor says that five students had scores 61 or below. Can the rumor be true? Why or why not?

Additional Exercises

x   26  27  28  29  30  31  32
f    3   4  16  12   6   2   1

Σx = 1,256 and Σx² = 35,926.

  1. Compute the mean and the standard deviation.
  2. About how many of the measurements does the Empirical Rule predict will be in the interval (x̄ − s, x̄ + s), the interval (x̄ − 2s, x̄ + 2s), and the interval (x̄ − 3s, x̄ + 3s)?
  3. Compute the number of measurements that are actually in each of the intervals listed in part 2, and compare to the predicted numbers.

A sample of size n = 80 has mean 139 and standard deviation 13, but nothing else is known about it.


In the last section, we talked about a normal distribution, which is a bell-shaped, symmetric curve for normally distributed data, that looks something like this:

We’ll spend a lot of time working with distributions like this, so let’s talk about some of the most important properties of a normal distribution.

The empirical rule

Normal distributions follow the empirical rule, also called the 68-95-99.7 rule. The rule tells us that, for a normal distribution, there’s a

  • 68% chance a data point falls within 1 standard deviation of the mean
  • 95% chance a data point falls within 2 standard deviations of the mean
  • 99.7% chance a data point falls within 3 standard deviations of the mean

In other words, if we want to show this graphically,

we can show that 68% of the data will fall within 1 standard deviation of the mean, that within 2 full standard deviations of the mean we’ll have 95% of the data, and that within 3 full standard deviations from the mean we’ll have 99.7% of the data.

And we can draw all kinds of conclusions based on this information, and the fact that all the area under the graph represents 100% of the data. For example, since total area is 100% and the data within three standard deviations is 99.7%, that means that we’ll always have 0.3% of the data in a normal distribution that lies outside three standard deviations from the mean. Or if we wanted to know how much of our data will lie between one and two standard deviations from the mean, we can say that it’s 95% − 68% = 27%.

Percentile

We look a lot at percentiles within a normal distribution. The nth percentile is the value such that n percent of the values lie below it. In other words, a value in the 95th percentile is greater than 95% of the data. The 50th percentile in a normal distribution always gives the median, and the IQR is always found using the 75th percentile minus the 25th percentile.

Z-scores

A z-score tells you the number of standard deviations a point is from the mean. To calculate a z-score for normally distributed data (normal distributions) we use the formula

z = (x − µ) / σ

where x is the data point, µ is the mean, and σ is the standard deviation.

The z-score for a data point is how far it is from the mean, and you always want to give the z-score in terms of standard deviations. Therefore, to find the z-score at a certain point in the distribution, we use the formula above, taking the data point, subtracting the mean, and then dividing that result by the standard deviation. That gives us a value for z.
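The calculation described here (subtract the mean, then divide by the standard deviation) is short enough to write directly. The sketch below is illustrative; `z_score` is a hypothetical function name, and the sample values are the exam scores used earlier on this page (mean 100, standard deviation 16).

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


print(z_score(132, mu=100, sigma=16))  # two standard deviations above the mean
print(z_score(68, mu=100, sigma=16))   # two standard deviations below the mean
```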

We’ll look up the z-score in a z-table, which is a table that takes the number of standard deviations and tells you the percentage of the area under the curve up to that point.

Data points that are less than the mean will be to the left of the mean and will have a negative z-score. They should be looked up in the table of negative z-scores:

Data points that are greater than the mean will be to the right of the mean and will have a positive z-score. They should be looked up in the table of positive z-scores:

A z-score is unusual if it’s further than three standard deviations from the mean. Essentially, the z-table value for a z-score tells us the percentile rank of the data point that we started with. If the z-table value for our data point is 0.7123, it means that the data point is greater than 71.23% of the data, meaning that our data point is at the 71.23rd percentile.

Remember, the z-table always gives you the percentage of data that’s below your data point. Therefore, to find the percentage of data above your data point, you have to take 1 minus the value from the table.
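A z-table lookup can be approximated in code with the standard normal cumulative distribution function. This is a sketch, assuming Python’s `math.erf`; `normal_cdf` is a hypothetical name, and z = 0.56 is chosen only because its table value is the 0.7123 mentioned above.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


below = normal_cdf(0.56)  # share of data below the point (the z-table value)
above = 1 - below         # share of data above the point

print(f"below: {below:.4f}, above: {above:.4f}")
```

Because the function accepts negative z directly, separate tables for negative and positive z-scores are not needed.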

Thresholds

Sometimes we want to know the threshold, or cutoff, in our data set. In other words, we might want to know “What’s the minimum value needed in order to be in the top 10% of the data?”

In order to figure this out, we need to work backwards starting from the z-table. For example, if we want to find the top 30% of the data, we’d use the z-table to find the first z-score that’s just barely above 70%, or 0.7000. Then we’ll look at the row and column headers that correspond with a z-table value of 0.7000. The decimal number given by the row and column headers tells us how many standard deviations above the mean we need to be in order to be above 70%, or in the top 30%.

If we multiply that decimal number by the standard deviation, and then add the result to the mean, that will tell us the value that’s at the bottom of the top 30%. If instead we were looking up the “bottom 40%” in the z-table, we’d need to look for the z-table value that’s just under 0.4000.
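Working backwards from a percentage to a cutoff can be sketched with the inverse of the normal CDF. This illustration assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist.inv_cdf`; `threshold_for_top` is a hypothetical helper, and the mean of 100 and standard deviation of 16 are example values, not part of the discussion above.

```python
from statistics import NormalDist


def threshold_for_top(fraction, mu, sigma):
    """Smallest value that still falls in the top `fraction` of a normal distribution."""
    z = NormalDist().inv_cdf(1 - fraction)  # z-score with (1 - fraction) of the area below it
    return mu + z * sigma                   # multiply by the SD, then add the mean


cutoff = threshold_for_top(0.30, mu=100, sigma=16)
print(f"bottom of the top 30%: {cutoff:.2f}")
```

A printed z-table only lists z to two decimals, so a table lookup gives a slightly coarser z than the inverse CDF does; the two cutoffs agree to within that rounding.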


DEVRY MATH399 Week 1 Assignment Introduction to the Empirical Rule and Chebyshev’s Theorem Latest 2019 JULY

Question

At a carnival, contestants are asked to keep rolling a pair of dice until they roll snake eyes. The number of rolls needed has a mean of 36 rolls, with a standard deviation of 5.4 rolls. The distribution of the number of rolls needed is not assumed to be symmetric.

Between what two numbers of rolls does Chebyshev’s Theorem guarantee that we will find at least 75% of the contestants?

Round your answers to the nearest tenth.

Question: A random sample of SAT scores has a sample mean of x̄ = 1060 and sample standard deviation of s = 195. Use the Empirical Rule to estimate the approximate percentage of SAT scores that are less than 865.

Round your answer to the nearest whole number (percent).

Question: Toyotas manufactured in the 1990s have a mean lifetime of 22.6 years, with a standard deviation of 3.1 years. The distribution of their lifetimes is not assumed to be symmetric.

Between what two lifetimes does Chebyshev's Theorem guarantee that we will find at least 95% of the Toyotas?

Round your answers to the nearest hundredth.

Question: A random sample of small business stock prices has a sample mean of x̄ = $54.82 and sample standard deviation of s = $8.95. Use the Empirical Rule to estimate the percentage of small business stock prices that are more than $81.67.

Round your answer to the nearest hundredth.

Question: Patients coming to a medical clinic have a mean weight of 207.6 pounds, with a standard deviation of 22.6 pounds. The distribution of weights is not assumed to be symmetric.

Between what two weights does Chebyshev's Theorem guarantee that we will find at least 95% of the patients?

Round your answers to the nearest tenth.

Question: Suppose that the distribution of snake lengths in a certain park is not assumed to be symmetric.

According to Chebyshev's Theorem, at least what percentage of snake lengths are within k = 2.9 standard deviations of the mean?

Round your answer to the nearest whole number (percent).

Question: A random sample of hybrid vehicle fuel consumptions has a sample mean of x̄ = 53.2 mpg and sample standard deviation of s = 4.8 mpg. Use the Empirical Rule to estimate the percentage of hybrid vehicle fuel consumptions that are less than 43.6 mpg.

Round your answer to the nearest tenth.

Question: A random sample of lobster tail lengths has a sample mean of x̄ = 4.7 inches and sample standard deviation of s = 0.4 inches. Use the Empirical Rule to determine the approximate percentage of lobster tail lengths that lie between 4.3 and 5.1 inches.
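
The Chebyshev questions above rely on the bound that, for any distribution and any k > 1, at least 1 − 1/k² of the data lies within k standard deviations of the mean. The sketch below shows one way to apply that bound in both directions; the function names are assumptions made for this example, and the two calls use the carnival and snake-length figures from the questions.

```python
import math

def chebyshev_min_fraction(k: float) -> float:
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1.0 - 1.0 / k**2

def chebyshev_interval(mean: float, sd: float, min_fraction: float) -> tuple[float, float]:
    """Interval guaranteed to contain at least `min_fraction` of the data.

    Solves 1 - 1/k^2 = min_fraction for k, then returns mean -/+ k * sd.
    """
    k = 1.0 / math.sqrt(1.0 - min_fraction)
    return mean - k * sd, mean + k * sd

# Carnival question: mean 36 rolls, standard deviation 5.4 rolls, at least 75%.
low, high = chebyshev_interval(36, 5.4, 0.75)
print(round(low, 1), round(high, 1))  # 25.2 46.8

# Snake-length question: k = 2.9 standard deviations.
print(round(chebyshev_min_fraction(2.9) * 100))  # 88
```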


Unified Sampling Theory

Example 2.7.2

Consider the population U = (1, 2, 3, 4) of 4 units from which an ordered sample s = (1, 2, 2) is selected. Let the y-values of the units selected in the sample s be y₁ = 50 and y₂ = 100. In this case,

d = ⟨(1, 50), (2, 100), (2, 100)⟩,
Ω_y = (−∞ < y₁ < ∞, −∞ < y₂ < ∞, −∞ < y₃ < ∞, −∞ < y₄ < ∞) = ℝ⁴,
d̃ = ⟨(1, 50), (2, 100)⟩,
Ω_yd = Ω_yd̃ = (50, 100, −∞ < y₃ < ∞, −∞ < y₄ < ∞).

Here both d and d̃ are consistent with the parameter y = (50, 100, 500, 600) but inconsistent with y = (100, 100, 500, 600). The data D, a random variable, depends on the selection of the sample and the realization of the parametric vector y. Given data D = d, the likelihood function of the parameter y was obtained by Godambe (1966) as

L(y | d) = p(s) · I_d(y)    (2.7.2)

where I_d(y) is an indicator variable defined as

I_d(y) = 1 if y ∈ Ω_yd, and I_d(y) = 0 otherwise.

The likelihood function (2.7.2) is flat (constant), equal to p(s) for y ∈ Ω_yd, and zero outside Ω_yd. Hence no unique maximum likelihood estimate of y exists, and the likelihood function is noninformative.



The prediction interval for any standard score z corresponds numerically to (1 − (1 − Φ_{μ,σ²}(z)) · 2).

For example, Φ(2) ≈ 0.9772, or Pr(X ≤ μ + 2σ) ≈ 0.9772, corresponding to a prediction interval of (1 − (1 − 0.97725) · 2) = 0.9545 = 95.45%. Note that Φ(2) by itself does not describe a symmetrical interval; it is merely the probability that an observation is less than μ + 2σ. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):

Pr(μ − 2σ ≤ X ≤ μ + 2σ) = Φ(2) − Φ(−2) ≈ 0.9772 − (1 − 0.9772) ≈ 0.9545
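
The same calculation can be checked numerically. The sketch below assumes the usual identity Φ(z) = ½(1 + erf(z/√2)) for the standard normal CDF and reproduces the 68–95–99.7 figures for 1, 2, and 3 standard deviations; `phi` and `within` are names chosen for this example.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def within(z: float) -> float:
    """Probability of falling within z standard deviations of the mean.

    Same as the prediction-interval formula 1 - (1 - phi(z)) * 2,
    which equals phi(z) - phi(-z).
    """
    return 1.0 - (1.0 - phi(z)) * 2.0

for k in (1, 2, 3):
    print(k, round(within(k) * 100, 2))
# 1 68.27
# 2 95.45
# 3 99.73
```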

The "68–95–99.7 rule" is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test for outliers if the population is assumed normal, and as a normality test if the population is potentially not normal.

To pass from a sample to a number of standard deviations, one first computes the deviation, either the error or residual depending on whether one knows the population mean or only estimates it. The next step is standardizing (dividing by the population standard deviation), if the population parameters are known, or studentizing (dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated.

To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, one can compute the studentized residuals and compare these to the expected frequency: points that fall more than 3 standard deviations from the norm are likely outliers (unless the sample size is significantly large, by which point one expects a sample this extreme), and if there are many points more than 3 standard deviations from the norm, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.

One can compute more precisely, approximating the number of extreme moves of a given magnitude or greater by a Poisson distribution, but simply, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or question the assumed normality of the distribution.
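
A minimal sketch of that Poisson approximation follows, assuming independent observations from a normal population and counting moves of at least 4 standard deviations in either direction. The sample size of 1,000 comes from the text; the function names are invented for this example.

```python
import math

def two_sided_tail(k: float) -> float:
    """Probability that a normal observation is at least k standard
    deviations from the mean, in either direction."""
    return math.erfc(k / math.sqrt(2.0))

def poisson_at_least(m: int, lam: float) -> float:
    """P(N >= m) when N follows a Poisson distribution with mean lam."""
    cdf_below_m = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(m))
    return 1.0 - cdf_below_m

n = 1000                              # sample size from the text
lam = n * two_sided_tail(4.0)         # expected number of 4-sigma moves
p_multiple = poisson_at_least(2, lam) # chance of two or more such moves

print(f"expected 4-sigma moves: {lam:.3f}")
print(f"P(two or more):         {p_multiple:.4f}")
```

Under these assumptions the expected count is about 0.06 and the chance of seeing two or more is roughly 0.2%, which is why multiple 4σ moves in a sample of that size cast doubt on the normal model.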

For example, a 6σ event corresponds to a chance of about two parts per billion. For illustration, if events are taken to occur daily, this would correspond to an event expected every 1.4 million years. This gives a simple normality test: if one witnesses a 6σ event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.

In The Black Swan, Nassim Nicholas Taleb gives the example of risk models according to which the Black Monday crash would correspond to a 36-σ event: the occurrence of such an event should instantly suggest that the model is flawed, i.e. that the process under consideration is not satisfactorily modeled by a normal distribution. Refined models should then be considered, e.g. by the introduction of stochastic volatility. In such discussions it is important to be aware of the problem of the gambler's fallacy: a single observation of a rare event does not by itself contradict that the event is in fact rare. It is the observation of a plurality of purportedly rare events that increasingly undermines the hypothesis that they are rare, i.e. the validity of the assumed model. A proper modelling of this process of gradual loss of confidence in a hypothesis would involve the designation of prior probability not just to the hypothesis itself but to all possible alternative hypotheses. For this reason, statistical hypothesis testing works not so much by confirming a hypothesis considered to be likely, but by refuting hypotheses considered unlikely.

Because of the exponentially decaying tails of the normal distribution, the odds of larger deviations decrease very quickly, so for a daily event the expected waiting time between deviations of a given size grows rapidly with that size.
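
To make the rate of decay concrete, the sketch below estimates how often a daily event would be expected to fall outside μ ± kσ for k = 1 to 6, assuming a normal population and 365.25 events per year. The function names and the year length are assumptions made for this example.

```python
import math

DAYS_PER_YEAR = 365.25  # assumed length of a year for a daily event

def outside_probability(k: float) -> float:
    """Probability that an observation falls outside mu +/- k*sigma."""
    return math.erfc(k / math.sqrt(2.0))

def expected_days_between(k: float) -> float:
    """Expected number of days between events outside mu +/- k*sigma."""
    return 1.0 / outside_probability(k)

for k in range(1, 7):
    days = expected_days_between(k)
    print(f"{k} sigma: p = {outside_probability(k):.3e}, "
          f"about every {days:,.0f} days ({days / DAYS_PER_YEAR:,.2f} years)")
```

For k = 6 this gives a probability of about two parts per billion and a waiting time of roughly 1.4 million years, matching the figures quoted earlier.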


Suppose a teacher has collected all the final exam scores for all statistics classes she has ever taught. This dataset is normally distributed with a mean of 81 and a standard deviation of 3.5.

Using this information, estimate the percentage of students who will get the following scores using the Empirical Rule (also known as the 68–95–99.7 Rule, and referred to in some course materials as the 95–68–34 Rule or the 50–34–14 Rule):

a) Probability that a score is above 81?

In this example, the mean of the dataset (the average score) is 81. Therefore, 50% of students are expected to score above this value and 50% below. The answer here is 50%.

b) Probability that a score is below 81?

In this example, the mean of the dataset (the average score) is 81. Therefore, 50% of students are expected to score above this value and 50% below. The answer here is 50%.

c) Probability that a score is between 81 (the mean) and 84.5?

Here, 81 is the mean, so we know that 50% of the class is below this point. Next, the score of 84.5 is one standard deviation above the mean. Why? Because each deviation in this question is 3.5 points. So, a score of 84.5 is 81 + 3.5, or one deviation above the mean.

Using the Empirical Rule, we can see that about 34% of scores are BETWEEN the mean and the first deviation. So there is a 34% chance that a student will score between 81 and 84.5.

d) Probability that a score is between 81 (the mean) and 74?

Here, 81 is the mean, so we know that 50% of the class is below this point. Next, the score of 74 is two standard deviations BELOW the mean. Why? Because each deviation in this question is 3.5 points. So, a score of 74 is 81 – 3.5 – 3.5 = 74, or TWO deviations below the mean.

Using the Empirical Rule, we can see that about 34% + 14% of scores are BETWEEN the mean and the second deviation below it. So there is about a 34% + 14% = 48% chance that a student will score between 81 and 74 (more precisely 47.5%, since the 14% figure is 13.5% rounded up).

e) Probability that a score is between 74 and 88?

Here, 74 is two deviations below the mean and 88 is two deviations above the mean. Using the Empirical Rule, we can see that about 14% + 34% + 34% + 14% of scores are BETWEEN 74 and 88. Because each 14% is 13.5% rounded up, the total is about 95%, so there is a 95% chance that a score will be between 74 and 88.

f) Probability that a score is above 88?

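Here, 88 is two deviations above the mean. About 95% of scores fall within two deviations of the mean, and the remaining 5% is split evenly between the two tails. So there is only a 2.5% chance of scoring ABOVE 88.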

NOTICE: These examples use the Empirical Rule to estimate the probability. However, the z value (also called the z-score) and the z-table can be used to get a more precise probability for any score.
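
As a rough check on the estimates above, the sketch below computes the corresponding probabilities from the normal CDF, using Python's standard-library `statistics.NormalDist` with the mean of 81 and standard deviation of 3.5 from this example. The variable and function names are assumptions made for illustration.

```python
from statistics import NormalDist

# Exam scores from the example: mean 81, standard deviation 3.5.
scores = NormalDist(mu=81, sigma=3.5)

def prob_between(low: float, high: float) -> float:
    """Probability that a score falls between low and high."""
    return scores.cdf(high) - scores.cdf(low)

results = {
    "a) above 81": 1.0 - scores.cdf(81),
    "b) below 81": scores.cdf(81),
    "c) 81 to 84.5": prob_between(81, 84.5),
    "d) 74 to 81": prob_between(74, 81),
    "e) 74 to 88": prob_between(74, 88),
    "f) above 88": 1.0 - scores.cdf(88),
}

for label, p in results.items():
    print(f"{label}: {p * 100:.2f}%")
```

The computed values (50%, 50%, about 34.13%, 47.72%, 95.45%, and 2.28%) sit close to the Empirical Rule estimates of 50%, 50%, 34%, 48%, 95%, and 2.5%.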

