Box Plot

Skill level: Intermediate

Description

A box plot is an analytical tool that graphically summarizes the shape of a data set, making its distribution easy to read at a glance. It is also useful for comparing several data sets, which are displayed side by side on an identical scale.

The graphical representation contains three major elements or sections:

  • Upper and lower whiskers
  • Interquartile range box, which represents the middle 50 percent of the data
  • Outliers: data points that fall outside the range covered by the whiskers
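
As a rough illustration of how these three elements can be derived from raw numbers, the following Python sketch uses only the standard library. It assumes the common convention that a point is an outlier when it lies more than 1.5 box lengths beyond the box; this rule is an assumption, not something prescribed here, and statistical packages differ in the details. The function name box_plot_elements and the sample values are made up for the example.

```python
from statistics import quantiles


def box_plot_elements(values):
    """Return the pieces a box plot draws for one data set."""
    data = sorted(values)
    # Quartiles: the three cut points that split the sorted data into four parts.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # length of the box, spanning the middle 50 percent

    # Assumed convention: fences sit 1.5 box lengths beyond each end of the box.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": inside[0],   # smallest value that is not an outlier
        "upper_whisker": inside[-1],  # largest value that is not an outlier
        "outliers": outliers,
    }


# Made-up sample, for illustration only.
sample = [12, 15, 17, 18, 20, 21, 23, 24, 26, 60]
elements = box_plot_elements(sample)
print(elements)
```

In this sample the value 60 lies beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 26.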

Benefits

  • Simple visual illustration of the distribution of the data
  • Broadly similar format across statistical software
  • Can be generated by simple and inexpensive statistical applications

How to Use

  • Step 1.  Collect data (the output data must be variable data).
  • Step 2.  Analyze the data using statistical software to generate the graphical representation of the box plot. This can be done manually, but it is time-consuming.
  • Step 3.  Analyze the results and draw your conclusion. This is not a hypothesis test, but a visual illustration of the data set and its distribution.
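
To make the steps concrete, here is a minimal Python sketch that draws two data sets as lines of text on an identical scale. It is a simplification under stated assumptions: the whiskers run to the minimum and maximum (outliers are not separated), the helper names five_numbers and text_box_plot are hypothetical, and the "before" and "after" values are invented for the example. Real statistical software would produce a proper graphic.

```python
from statistics import quantiles


def five_numbers(values):
    """Minimum, Q1, median, Q3 and maximum of one data set."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]


def text_box_plot(values, lo, hi, width=50):
    """Draw one box plot as a line of text on the shared scale lo..hi."""
    def col(x):
        # Map a data value to a character column on the shared scale.
        return round((x - lo) / (hi - lo) * (width - 1))

    mn, q1, med, q3, mx = (col(v) for v in five_numbers(values))
    row = [" "] * width
    for i in range(mn, mx + 1):
        row[i] = "-"         # whiskers
    for i in range(q1, q3 + 1):
        row[i] = "="         # interquartile range box
    row[mn] = row[mx] = "|"  # whisker ends
    row[med] = "M"           # median
    return "".join(row)


# Step 1: collect variable data (made-up numbers, for illustration only).
groups = {
    "before": [40, 42, 45, 47, 50, 52, 55, 58, 60],
    "after": [41, 44, 46, 48, 51, 53, 56, 59, 63],
}

# Step 2: put every data set on the same scale and draw it.
lo = min(min(v) for v in groups.values())
hi = max(max(v) for v in groups.values())
lines = {name: text_box_plot(v, lo, hi) for name, v in groups.items()}

# Step 3: read the picture; this is a visual aid, not a hypothesis test.
for name, line in lines.items():
    print(f"{name:>7} {line}")
```

Because both lines share the same scale, the position of each M and the length of each run of = signs can be compared directly by eye.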

Relevant Definitions

Median: The value that splits the data in half, with 50 percent of the values falling below it and 50 percent above it.

Outlier: Data or value that is outside of the normal range of the current distribution, usually represented by an asterisk (*) in the graph.

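Interquartile range: The data are divided into four parts (quartiles), with each part representing 25 percent of all the values. The interquartile range spans the two middle parts, or the central 50 percent of the data.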

Variable data: Data that can be measured and divided into fractions of a unit (monetary amounts, distance, weight, etc.).

Example

A consulting firm wants to know how the quotes it submitted for the requests for quotations (RFQs) it won compare with the actual cost of completing those contracts, and whether there is any significant difference. The firm seeks to compare the data from the quotes with the data from the actual accounting reports (final results).

The firm retrieves the data from the last three years and uses the box plot graphical analysis as its first look at the distribution of these two sets of data (before and after). That analysis will help determine if it is necessary to dive deeper into the data or not. If the box plot does not show noticeable differences between the quoted values of the contracts and the final costs, the firm’s estimating methods are fairly accurate and do not warrant any changes, assuming that it is fairly successful in winning the contracts.

If the box plot clearly indicates that the final results are very different from the quotes, the firm will want to revise its methods, either placing more competitive bids or attempting to make more money, depending on the direction of the difference.

Below is a table showing the data from 45 contracts executed in the last 36 months.

[Image: Box_Plot_Table]

The distribution of the data, as shown by the box plot graphic, indicates that the RFQ values and the final results are very similar: the medians are about the same and the boxes are about the same length.

This is a good indication that the methods for evaluating the different contracts are fairly precise.

It also shows that there are no outliers in the data.
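
The visual reading described above (similar medians, similar box lengths) can be backed up with a quick numeric check. The Python sketch below is only an illustration: the amounts are hypothetical and are not the firm's 45 contracts, the helper names are invented, and the 10 percent tolerance is an arbitrary threshold chosen for the example. Like the box plot itself, it is not a hypothesis test.

```python
from statistics import quantiles


def box_summary(values):
    """Median and box length (interquartile range) of one data set."""
    q1, median, q3 = quantiles(sorted(values), n=4, method="inclusive")
    return {"median": median, "box_length": q3 - q1}


def relative_gap(a, b):
    """Gap between two numbers as a fraction of the larger magnitude."""
    scale = max(abs(a), abs(b))
    return 0.0 if scale == 0 else abs(a - b) / scale


# Hypothetical amounts in thousands of dollars, NOT the firm's actual data.
quoted = [100, 120, 135, 150, 160, 175, 190, 210, 240]
final = [98, 123, 133, 152, 163, 171, 191, 207, 243]

q = box_summary(quoted)
f = box_summary(final)
median_gap = relative_gap(q["median"], f["median"])
box_gap = relative_gap(q["box_length"], f["box_length"])

TOLERANCE = 0.10  # arbitrary illustrative threshold, not a statistical rule
looks_similar = median_gap <= TOLERANCE and box_gap <= TOLERANCE

print(f"median gap {median_gap:.1%}, box length gap {box_gap:.1%}, "
      f"similar: {looks_similar}")
```

With these invented numbers both gaps fall under the tolerance, which mirrors the situation where the two boxes look alike and no deeper analysis is triggered.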

 
