Sample Module

Statistics I

Introduction

As was noted in the prerequisite module, analysis is the process of organizing, rearranging, sorting, and summarizing raw data in a manner that yields insights we call information. That information improves decision making. The part of statistical analysis that most affects decision making is hypothesis testing. Hypothesis testing is the comparison of the analyst's belief or claim about a population parameter to the corresponding sample statistic, followed by a decision on whether the sample supports that belief or claim. A common example is the filling of soft drink bottles in a bottling plant. Managers have the belief that the machine is filling 16-ounce bottles with 16 ounces of the soft drink. This belief is tested by drawing a sample of filled bottles, measuring the amount of soft drink in each, finding the mean and variance of the amount of soft drink in the sample of bottles, comparing the sample mean with the 16-ounce belief, and finally deciding whether the information from the sample supports or refutes the 16-ounce belief. This process is also known as drawing a sample from the population in order to make inferences about the population. Hypothesis testing is also known as inferential statistics.
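The sampling step in the bottling example can be sketched in a few lines of Python. The fill measurements below are invented for illustration and are not data from this module; only the 16-ounce claim comes from the scenario.

```python
from statistics import mean, variance

# Hypothetical fill amounts (ounces) from a sample of 8 bottles.
# These values are made up for illustration only.
sample_fills = [16.1, 15.9, 16.0, 15.8, 16.2, 15.9, 16.0, 15.7]

claimed_fill = 16.0  # the managers' belief about the population mean

sample_mean = mean(sample_fills)          # sample statistic compared with the claim
sample_variance = variance(sample_fills)  # sample variance (n - 1 in the denominator)
difference = sample_mean - claimed_fill   # how far the sample mean is from the claim

print(f"sample mean     = {sample_mean:.3f} oz")
print(f"sample variance = {sample_variance:.4f} oz^2")
print(f"mean - claim    = {difference:+.3f} oz")
```

The sketch only computes the quantities being compared. Whether a difference of that size is enough to refute the 16-ounce belief is the question the hypothesis test itself answers.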

Hypothesis testing improves decision making by providing evidentiary support for a decision. Managers do not need to guess about the performance of the bottle-filling machine. The sample and the test are evidence that the machine is filling the bottles correctly or incorrectly. Though the quality of the evidence may sometimes be less than ideal, evidence-based decision making is the ideal for which managers should strive.

The Hypothesis Statement

In the experience of the author of this module and of the authors of introductory statistics textbooks, formulating the hypothesis statement is the area of undergraduate statistics in which students have the least intuitive understanding, the most confusion, and the greatest difficulty. We try to make this topic clearer for you.

A complete hypothesis statement has two parts: the null hypothesis and the research or alternative hypothesis. The two parts are complements (from Introduction to Probability) in that together they exhaust all possible outcomes for the population parameter.

Null Hypothesis

The null hypothesis is the claim or belief that is the status quo. It is that which is believed to be, presumed to be, or accepted as true and correct without any evidentiary support. In the bottling scenario above, managers believe the machine is filling the bottles correctly even though they have no evidence to support that belief. The null hypothesis in that case is that the bottling machine is functioning correctly. In the judicial systems of the United States and many other nations, an accused is presumed to be innocent of the charges without any evidence that innocence is in fact true. The null hypothesis in a court case in these jurisdictions is that an accused is not guilty.

Alternative Hypothesis

The research or alternative hypothesis is the claim or belief that is to be proven. It is that which is not believed to be, presumed to be, or accepted as true and correct until there is evidence to support its truth and correctness. In the bottling scenario above, managers will not believe the machine is filling the bottles incorrectly unless and until they have evidence to support that belief. It is the sample which provides this evidence. The alternative hypothesis in that case is that the bottling machine is not functioning correctly. In the judicial systems scenario, an accused is found guilty of the charges only when there is sufficient evidence to support the claim that guilt is in fact true. It is the trial which provides this evidence. The alternative hypothesis in a court case in these jurisdictions is that an accused is guilty.

Choosing between the Null Hypothesis and the Alternative Hypothesis

This is where most students start having difficulty. That difficulty is understandable because it is not always clear what should be the null hypothesis and what should be the alternative hypothesis. Most of the time, it is clear. The bottling scenario is a clear example, as is the court system scenario. However, consider the scenario in which a consumer group is complaining that the on-time rate of an airline is too low whereas the airline claims that its on-time rate is fine. What is the null hypothesis?

As stated above, the alternative hypothesis needs to be proven. In general, the null hypothesis will not require any action by the manager as a consequence of the decision. If the bottling machine is functioning correctly, there is no need for any intervention by the manager. If an accused is not guilty, the case is over and the accused is free to go. In general, the alternative hypothesis will require some action, and the sample evidence is the justification for that action. If the bottling machine is not filling the bottles correctly, it needs to be adjusted and the badly filled bottles need to be recalled. The manager needs justification for these actions and the evidence from the sample provides it. If an accused is found guilty, then some action regarding punishment is needed. The evidence from the trial provides the justification for this action.

In the on-time scenario, what is the status quo? What is accepted as true without evidence? What is to be proven? What action will be taken? None of this is clear because the answers depend on the perspective of the statistical analyst. If the analyst works for the consumer group, then that is the perspective of the hypothesis statement. In that case, the status quo, that which is accepted as true without evidence, is that the on-time rate is adequate, above some minimum threshold. The consumer group needs evidence to support its claim that the on-time rate is too low. That claim is the alternative hypothesis because it needs to be proven. So for the consumer group, the null hypothesis is that the on-time rate is high enough while the alternative hypothesis is that the on-time rate is too low. The airline has the opposite perspective from the consumer group. If the analyst works for the airline, then the alternative hypothesis is that the on-time rate is high enough. That is the airline's claim, which it needs to prove. The null hypothesis, that which the airline accepts as true without evidence, is that the on-time rate is too low.

The null hypothesis has the symbol H0. The alternative hypothesis has the symbol H1 or HA. The null and alternative hypotheses together make up the hypothesis statement.
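The role of perspective in the on-time scenario can be made concrete with a small Python sketch. The helper name hypotheses_for and the wording of the claims are hypothetical, chosen only to illustrate the rule that the analyst's own side must prove its claim.

```python
# Illustrative only: each side's claim in the on-time scenario.
CLAIMS = {
    "consumer group": "the on-time rate is too low",
    "airline": "the on-time rate is high enough",
}

def hypotheses_for(analyst_side):
    """Return (null, alternative) from the analyst's perspective.

    The analyst's own side carries the burden of proof, so its claim
    becomes the alternative hypothesis; the opposing claim is the null.
    """
    if analyst_side not in CLAIMS:
        raise ValueError(f"unknown side: {analyst_side!r}")
    alternative = CLAIMS[analyst_side]
    # Exactly one other side exists, so unpack its claim as the null.
    (null,) = [claim for side, claim in CLAIMS.items() if side != analyst_side]
    return null, alternative

for side in CLAIMS:
    null, alternative = hypotheses_for(side)
    print(f"Analyst works for the {side}:")
    print(f"  H0: {null}")
    print(f"  HA: {alternative}")
```

Swapping the analyst's side swaps the two hypotheses, which is why the same scenario can produce opposite hypothesis statements.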

Example 1.

A cheese manufacturer ships its product in 8-ounce packages. The quality control manager wants to verify that the machine is producing packages that, in fact, weigh 8 ounces. What is the hypothesis statement the manager should use?

The null hypothesis is that which is accepted as true without evidence. This is that the machine is working correctly. The null hypothesis is that the machine is producing cheese packages that weigh 8 ounces. The alternative hypothesis is that which is accepted as true only if there is evidence to support its truthfulness. This is that the machine is not working correctly. The alternative hypothesis is that the machine is producing cheese packages that do not weigh 8 ounces. This will be evidence that the machine needs to be re-calibrated.

The entire hypothesis statement is:
H0: The machine is producing cheese packages that weigh 8 ounces.
HA: The machine is producing cheese packages that do not weigh 8 ounces.

Example 2.

A weight loss company claims its product, when used as directed, will lead to at least 10 kilograms of weight loss within 45 days. A consumer protection agency wants to verify the company's claim in order to protect the public from fraud. What is the hypothesis statement the agency director should use?

The null hypothesis is that which is accepted as true without evidence. This is that the claim is true. The null hypothesis is that the product does lead to at least 10 kilograms of weight loss within 45 days. The alternative hypothesis is that which is accepted as true only if there is evidence to support its truthfulness. This is that the product does not perform as claimed, that is, it does not lead to 10 kilograms of weight loss within 45 days. The alternative hypothesis is that the product leads to less than 10 kilograms of weight loss within 45 days. This will be the evidence the agency needs to support some punishment of the company.

The entire hypothesis statement is:
H0: The product leads to at least 10 kilograms of weight loss within 45 days.
HA: The product leads to less than 10 kilograms of weight loss within 45 days.

Example 3.

A government agency has the responsibility of enforcing anti-pollution laws. It monitors the emissions of factories to ensure that a pollutant has an emission rate of less than 10 ppm per week. What is the hypothesis statement the agency should use?

The null hypothesis is that which is accepted as true without evidence. This is the state that requires no action by the agency. The null hypothesis is that the factory is emitting less than 10 ppm per week of the pollutant. The alternative hypothesis is that which is accepted as true only if there is evidence to support its truthfulness. This is the state that requires the agency to take some type of action. The alternative hypothesis is that the factory is not emitting less than 10 ppm per week of the pollutant. This will be the evidence that the agency needs to support some punishment of the company.

The entire hypothesis statement is:
H0: The factory is emitting less than 10 ppm per week of the pollutant.
HA: The factory is not emitting less than 10 ppm per week of the pollutant.

More of the Hypothesis Statement

Though we introduced hypothesis statements written in prose, they are actually written in mathematical symbols. We use the symbol for the population parameter being tested rather than the name of the population parameter. We use the numerical value of any limit, threshold, or amount rather than the words which describe that limit, threshold, or amount. We use symbols rather than words to express whether the population parameter is presumed to exactly equal, be greater than, or be less than some value. This leads to three types of hypothesis statements: two-tailed, one-tailed greater than, and one-tailed less than.

Null Hypothesis

The null hypothesis, regardless of the type, always contains a statement of equality. The population parameter equals a value, is greater than or equal to a value, or is less than or equal to a value. If the parameter involved is the mean, and the value is given as k, then the null hypothesis of a two-tailed statement is written in symbols as H0: μ = k. The null hypothesis of a one-tailed greater than statement is written in symbols as H0: μ ≤ k. The 'greater than' part refers to the alternative hypothesis. The null hypothesis of a one-tailed less than statement is written in symbols as H0: μ ≥ k. Again, the 'less than' part refers to the alternative hypothesis.

Alternative Hypothesis

The alternative hypothesis, regardless of the type, always contains a statement of strict inequality. The population parameter does not equal a value, is greater than a value, or is less than a value. Again using the mean, the alternative hypothesis of a two-tailed statement is written in symbols as HA: μ ≠ k. The alternative hypothesis of a one-tailed greater than statement is written in symbols as HA: μ > k. The alternative hypothesis of a one-tailed less than statement is written in symbols as HA: μ < k.
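Because the null and alternative hypotheses are complements, exactly one of them holds for any possible value of the mean. The Python sketch below checks this for each of the three types. The helper name are_complements, the threshold k = 8, and the candidate values of μ are assumptions made for illustration.

```python
import operator

# (null relation, alternative relation) for each type of hypothesis statement
RELATIONS = {
    "two-tailed": (operator.eq, operator.ne),               # H0: mu = k,  HA: mu != k
    "one-tailed greater than": (operator.le, operator.gt),  # H0: mu <= k, HA: mu > k
    "one-tailed less than": (operator.ge, operator.lt),     # H0: mu >= k, HA: mu < k
}

def are_complements(kind, k, candidate_values):
    """True if exactly one of H0 and HA holds for every candidate value of mu."""
    null_rel, alt_rel = RELATIONS[kind]
    # For two booleans, "!=" is true when exactly one of them is true.
    return all(null_rel(mu, k) != alt_rel(mu, k) for mu in candidate_values)

k = 8                             # illustrative threshold
candidates = [7, 7.5, 8, 8.5, 9]  # illustrative values below, at, and above k
for kind in RELATIONS:
    print(kind, are_complements(kind, k, candidates))
```

The check succeeds for all three types because the equality always sits in the null hypothesis and the strict inequality in the alternative, so no value of μ is covered twice or left out.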

Choosing between Two-tailed and One-tailed Hypotheses

This is another area where most students have difficulty. The difficulty is in determining whether the analyst should be concerned with whether the population parameter is greater than some value, less than some value, or exactly equal to some value. It is the context of the situation that guides whether a two-tailed or one-tailed hypothesis is appropriate. Unfortunately, many textbooks don't give enough context to their problems to guide students, or students don't have enough general knowledge to provide missing context to the situation. For business situations, the context is almost always guided by cost reduction, revenue growth, or regulation compliance.

An example used above was that of a machine producing products of a specific weight or volume. Say a machine is calibrated to fill cans with 12 ounces of soft drink. Should the producer of canned soft drinks be concerned about filling cans with more than 12 ounces of beverage? Yes! The product is being priced and sold as if the cans contain 12 ounces of beverage. If the machine is miscalibrated and puts 12.4 ounces of beverage in each can, then the cost of the beverage in each can and its price will no longer match and the company's profits will decline. So the manager will certainly want evidence that a machine is over-filling the cans. But what about filling the cans with less than 12 ounces of beverage? In that case, there is no lost profit because the mismatch between cost and price now favors the company. However, when customers buy a can that is supposed to contain 12 ounces of beverage and it contains less, those customers become angry. They will express their anger by complaining, returning the can, or, most frequently, ceasing to purchase that company's products. The loss of profit associated with a sales decline is something the manager would like to avoid. So, in almost all of the scenarios involving weights, dimensions, volumes, etc., the analyst is concerned with the population parameter being above or below the threshold. That corresponds to a two-tailed hypothesis.
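The cost of over-filling is easy to quantify. The short Python sketch below uses the 12-ounce target and the 12.4-ounce miscalibrated fill from the example; the run of 1000 cans is an assumed figure chosen only to make the arithmetic concrete.

```python
target_fill = 12.0    # ounces per can, from the scenario
actual_fill = 12.4    # ounces per can if the machine is miscalibrated
cans_produced = 1000  # hypothetical production run, chosen for illustration

extra_per_can = actual_fill - target_fill    # beverage given away in each can
extra_fraction = extra_per_can / target_fill # share of a can's priced contents
extra_total = extra_per_can * cans_produced  # beverage given away over the run
cans_given_away = extra_total / target_fill  # the same amount expressed in full cans

print(f"extra beverage per can: {extra_per_can:.1f} oz ({extra_fraction:.1%})")
print(f"extra beverage over {cans_produced} cans: {extra_total:.0f} oz")
print(f"equivalent to about {cans_given_away:.1f} cans of unbilled product")
```

Under these assumptions the over-fill amounts to roughly one unbilled can for every thirty sold, which is why the manager wants evidence of over-filling as well as under-filling.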

Another example used above was that of a factory emitting excessive amounts of a pollutant. In these regulation compliance scenarios, the manager is very concerned about pollutant levels exceeding the regulatory limit. But should the manager be concerned about pollutant emissions below the regulatory limit? Reducing pollutant emissions more than is required will cost more than necessary and reduce profits. So, yes? If the analysis is primarily focused on evidence of regulatory non-compliance and only secondarily on profit maximization, then maybe the manager should not be concerned about the cost of reducing pollutant emissions beyond the compliance threshold. There is room for debate on this issue. We are of the opinion that if the test is about regulatory compliance, then profit consequences should be left out. So, in almost all of the scenarios like regulation compliance, the analyst is concerned about exceeding the regulatory limit. That corresponds to a one-tailed hypothesis.

Whether a one-tailed hypothesis is 'greater than' or 'less than' depends on the situation. Usually, the context will be clear. However, perspective plays an important role. A regulatory agency has the perspective of finding evidence of non-compliance. The regulated entity has the perspective of providing evidence of compliance. One of these two perspectives will require the 'less than' hypothesis while the other will require the 'greater than' hypothesis.

The complete hypothesis statement in symbols will look as follows:
Two-tailed: H0: μ = k, HA: μ ≠ k.
One-tailed less than: H0: μ ≥ k, HA: μ < k.
One-tailed greater than: H0: μ ≤ k, HA: μ > k.
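As a minimal sketch, the three forms can be generated from the type of statement and the value k. The function name write_hypotheses and its parameters are hypothetical and are not part of any statistics library.

```python
def write_hypotheses(kind, k, parameter="μ", unit=""):
    """Return (H0, HA) as strings for the given type of hypothesis statement."""
    # (symbol used in the null, symbol used in the alternative)
    symbols = {
        "two-tailed": ("=", "≠"),
        "one-tailed less than": ("≥", "<"),
        "one-tailed greater than": ("≤", ">"),
    }
    if kind not in symbols:
        raise ValueError(f"unknown type: {kind!r}")
    null_symbol, alt_symbol = symbols[kind]
    value = f"{k} {unit}".strip()  # drop the trailing space when no unit is given
    return (f"H0: {parameter} {null_symbol} {value}",
            f"HA: {parameter} {alt_symbol} {value}")

# The cheese-package scenario: the mean weight is claimed to be 8 ounces.
for line in write_hypotheses("two-tailed", 8, unit="ounces"):
    print(line)
```

The name of the type always describes the alternative hypothesis, so "one-tailed less than" places < in HA and ≥ in H0.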

Example 1.

A cheese manufacturer ships its product in 8-ounce packages. The quality control manager wants to verify that the machine is producing packages that, in fact, weigh 8 ounces. Write the best hypothesis statement.

This is a two-tailed hypothesis because the manager wants evidence of the machine producing packages that weigh more than 8 ounces or weigh less than 8 ounces.

The entire hypothesis statement is:
H0: μ = 8 ounces.
HA: μ ≠ 8 ounces.

Example 2.

A weight loss company claims its product, when used as directed, will lead to at least 10 kilograms of weight loss within 45 days. A consumer protection agency wants to verify the company's claim in order to protect the public from fraud. What is the hypothesis statement the agency director should use?

This is a one-tailed less than hypothesis because the consumer protection agency wants evidence that the weight loss product does not perform as claimed, that is, it yields less than 10 kg of weight loss.

The entire hypothesis statement is:
H0: μ ≥ 10 kg.
HA: μ < 10 kg.

Example 3.

A government agency has the responsibility of enforcing anti-pollution laws. It monitors the emissions of factories to ensure that a pollutant has an emission rate of less than 10 ppm per week. What is the hypothesis statement the agency should use?

This is a one-tailed greater than hypothesis because the government agency wants evidence that the factory emissions are too high, more than 10 ppm per week.

The entire hypothesis statement is:
H0: μ ≤ 10 ppm per week.
HA: μ > 10 ppm per week.