Confidence interval

Skill level: Intermediate

Description

A confidence interval is a range of values, calculated from sample data, that is expected to contain the true population value (such as a mean) with a degree of certainty given by the confidence level. The confidence level also sets the risk of an incorrect conclusion: at a 95 percent confidence level, about 5 percent of intervals built this way will miss the true value.

The interval bounds the plausible values for the true, sought-after value of the population; it does not indicate which value inside the range is the most likely one. A higher confidence level will tend to widen the confidence interval given a fixed sample size. The confidence interval will shrink as the sample size increases, all else being equal.

Typically, an expression would be: Estimate ± Margin of error

The confidence interval would then be: (Estimate - Margin of error) to (Estimate + Margin of error)
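The effect of confidence level and sample size on the margin of error can be sketched in a few lines. This is a minimal illustration, assuming a z-based interval for a mean with margin of error z * s / sqrt(n); the function name `margin_of_error` and the standard deviation of 10 are hypothetical choices for the example, not values from a real study.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(s, n, confidence):
    """Half-width of a z-based interval for a mean: z * s / sqrt(n)."""
    # Two-sided critical value: the central area under the standard
    # normal curve equals the confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * s / sqrt(n)


s = 10.0  # illustrative sample standard deviation (assumed)

# Fixed sample size: a higher confidence level gives a wider interval.
for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(margin_of_error(s, 100, confidence), 2))

# Fixed confidence level: a larger sample gives a narrower interval.
for n in (25, 100, 400):
    print(n, round(margin_of_error(s, n, 0.95), 2))
```

Because the margin of error depends on the square root of n, quadrupling the sample size only halves the interval width, which is one reason the tradeoff with study cost matters.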

It is critical to adjust the confidence level based on a logical understanding of the problem and the degree of accuracy required. In addition, there is a fundamental tradeoff between sample size and cost or time required for the study. This will determine the size of the confidence interval at a given confidence level.

Confidence intervals are reliable only when the data come from true random sampling.

Consider an example: A study of correctly completed insurance applications was conducted and it was determined the mean is 77 percent with a confidence level of 95 percent, and the confidence interval is 70 percent to 84 percent. This does not mean that 95 percent of repeated samples would have a mean between 70 and 84 percent. Rather, if the study were repeated 1,000 times and an interval calculated each time, about 95 percent of those intervals would contain the true population mean.
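The repeated-sampling meaning of the confidence level can be checked with a small simulation. The sketch below draws many samples from a population whose true mean is known, builds a z-based interval from each sample, and counts how often the interval contains the true mean. The population values (mean 77, standard deviation 25, normally distributed) and the function name `coverage` are assumptions made for illustration; they are not data from the insurance study.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(true_mean, true_sd, n, confidence, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        estimate = mean(sample)
        margin = z * stdev(sample) / sqrt(n)
        if estimate - margin <= true_mean <= estimate + margin:
            hits += 1
    return hits / trials


# Hypothetical population: mean 77, standard deviation 25, samples of 50.
print(coverage(77.0, 25.0, 50, 0.95, 1000))
```

The printed fraction should land close to 0.95. It is usually slightly below, because the z critical value is a mild approximation when the standard deviation is estimated from the sample.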

Benefits

  • Clearly defines the range of probable values
  • Can be visually communicated with error bars around a value
  • Can be adjusted based on the required degree of accuracy
  • Shows the degree of uncertainty

How to Use

  • Step 1.  Assuming a given problem and approach, select the confidence level required.
  • Step 2.  Sample the data.
  • Step 3.  Analyze the data for the desired value(s).
  • Step 4.  Calculate the confidence interval.
    • Most practitioners use statistical software programs for this, but the calculation is as follows for a population with a normal distribution, where x̄ is the sample mean, z is the standard normal critical value, n is the sample size and s is the sample standard deviation:
      • $\bar{x} \pm z \cdot \frac{s}{\sqrt{n}}$
    • *Note: Using the sample standard deviation with the z critical value is appropriate only for large sample sizes, typically n > 30, where the central limit theorem applies; for smaller samples, substitute the t distribution critical value for z.
  • Step 5.  Check that the sample size is adequate to yield the required level of precision.
  • Step 6.  Present per this format: Estimate ± Margin of error, or graphically as in the example below.
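Steps 3, 4 and 6 above can be sketched as follows. This is a minimal example, assuming a large sample (n > 30) so that the z critical value applies; the function name `confidence_interval` is hypothetical, and the data are simulated purely for illustration. For small samples, a t critical value from a statistics package would replace z.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(data, confidence=0.95):
    """Return (estimate, margin_of_error) for the mean, using z * s / sqrt(n)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two data points are required")
    estimate = mean(data)  # Step 3: the value of interest
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(data) / sqrt(n)  # Step 4: margin of error
    return estimate, margin


# Hypothetical data for illustration only: 40 simulated measurements.
rng = random.Random(42)
data = [rng.gauss(50.0, 5.0) for _ in range(40)]

estimate, margin = confidence_interval(data, 0.95)
# Step 6: present as Estimate ± Margin of error.
print(f"{estimate:.2f} ± {margin:.2f}")
```

Returning the estimate and the margin separately keeps the result in the Estimate ± Margin of error form; the interval bounds are simply `estimate - margin` and `estimate + margin`.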

Relevant Definitions

Confidence level: The desired degree of certainty for the result.

Sample size: The quantity of individual data points collected from the population.

Random sampling: When data are collected from a population with an equal probability of selecting any point in the population.

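Critical value: The value on the standard normal curve (the z value) that bounds the central area equal to the selected confidence level; for example, z is approximately 1.96 for a 95 percent confidence level.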
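The link between the confidence level and the critical value can be shown with Python's standard library. In this sketch, `z_critical` is a hypothetical helper name; it assumes a two-sided interval, so the area between -z and +z under the standard normal curve equals the confidence level.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_critical(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return std_normal.inv_cdf(0.5 + confidence / 2)


for confidence in (0.90, 0.95, 0.99):
    z = z_critical(confidence)
    # Area between -z and +z recovers the confidence level.
    central_area = std_normal.cdf(z) - std_normal.cdf(-z)
    print(confidence, round(z, 3), round(central_area, 2))
# 0.9 1.645 0.9
# 0.95 1.96 0.95
# 0.99 2.576 0.99
```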

Example

Joe’s Trucking provides shipping service from the outlying areas of New York City to a known distribution point within Manhattan. To understand the number of trucks required for a given service level agreement defined by a guaranteed delivery interval, the operations manager conducts a study of drive times. The manager would like to estimate the mean drive time to within a 30-minute window, that is, plus or minus 15 minutes (0.25 hours).

Randomly sampling the drive times over a 10-day period, including weekends and all times of day, produces 50 data points. A 95 percent confidence interval is required to ensure customer satisfaction. The mean is found to be 3.01 hours, with a standard deviation of 0.64 hours.

The general sample size equation helps to check that the sample size is adequate for the accuracy requirements:

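$n = \left(\frac{z \cdot s}{E}\right)^2 = \left(\frac{1.96 \times 0.64}{0.25}\right)^2 \approx 25.2$

Here E is the required margin of error (0.25 hours), so at least 26 data points are needed and the 50 collected are adequate. The confidence interval is then $3.01 \pm 1.96 \times \frac{0.64}{\sqrt{50}} \approx 3.01 \pm 0.18$ hours.

Given this result, the operations manager knows that the mean delivery time with a relatively high degree of certainty (95 percent) is somewhere between 2.83 hours and 3.19 hours. The data can now be used to build a model for the capital required to deliver the agreed-upon service level.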
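The arithmetic in this example can be reproduced with a short script. It is a sketch that assumes a z-based interval and reads the accuracy requirement as a margin of error of 0.25 hours (plus or minus 15 minutes); the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

mean_hours = 3.01           # sample mean drive time
sd_hours = 0.64             # sample standard deviation
n = 50                      # data points collected
confidence = 0.95
target_margin_hours = 0.25  # plus or minus 15 minutes (assumed reading)

z = NormalDist().inv_cdf(0.5 + confidence / 2)

# Sample size needed to reach the target margin of error.
required_n = ceil((z * sd_hours / target_margin_hours) ** 2)

# Confidence interval achieved with the 50 points actually collected.
margin = z * sd_hours / sqrt(n)
lower, upper = mean_hours - margin, mean_hours + margin

print(required_n)                        # 26
print(round(lower, 2), round(upper, 2))  # 2.83 3.19
```

Since the 50 data points exceed the 26 required, the achieved margin of about 0.18 hours (roughly 11 minutes) is tighter than the 15 minutes requested.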

 

 
