Skill level: Basic/advanced
The histogram is a common graphical tool for visualizing the distribution of a set of data. It shows the shape of the distribution by displaying how frequently the data fall within each of a series of ranges.
To construct a histogram, take the range of the data (the span from the minimum to the maximum observation) and divide it into evenly spaced intervals. Then count the number of observations in each interval and plot that count, the frequency, as the height of a bar on the graph.
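The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `histogram_counts` is hypothetical, the number of intervals (`num_bins`) is assumed to be chosen by the user because the text does not specify how many to use, and each interval is assumed to include its lower edge and exclude its upper edge, except the last, which also includes the maximum.

```python
def histogram_counts(data, num_bins):
    """Split [min, max] into num_bins equal-width intervals and count observations in each.

    Hypothetical helper for illustration. Returns (edges, counts), where edges has
    num_bins + 1 entries and counts has num_bins entries.
    """
    if not data:
        raise ValueError("data must not be empty")
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")

    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins  # evenly spaced intervals across the range

    # Interval boundaries: lo, lo + width, ..., hi
    edges = [lo + i * width for i in range(num_bins + 1)]
    edges[-1] = hi  # pin the last edge to the maximum to avoid floating-point drift

    counts = [0] * num_bins
    for x in data:
        if width == 0:
            idx = 0  # every observation is identical, so they all share one interval
        else:
            idx = int((x - lo) / width)
            # The maximum lands exactly on the top edge; count it in the last interval
            idx = min(idx, num_bins - 1)
        counts[idx] += 1
    return edges, counts
```

For example, `histogram_counts([1, 2, 2, 3, 3, 3, 4], 3)` splits the range 1 to 4 into three intervals of width 1 and counts 1, 2, and 4 observations in them; each count is the bar height for that interval.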
The histogram is, in essence, a simplified view of the distribution that generated the plotted data.
- Simple to use, visualize, and interpret
- Applicable to all variable data
- Works regardless of the shape of the distribution
- Can be drawn by hand, although many software applications can generate histograms automatically
How to Use
- Step 1. Collect variable data (length, distance, weight, time, etc.).
- Step 2. Generate the graph as a histogram, either with an application or manually.
- Step 3. Draw your conclusions.
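If no charting application is at hand for Step 2, a histogram can also be drawn as plain text, with one row per interval and one symbol per observation. The sketch below is a hypothetical helper (`render_text_histogram` is not from any library); it assumes the interval edges and counts have already been tallied, by hand or otherwise, with one more edge than there are counts.

```python
def render_text_histogram(edges, counts, symbol="*"):
    """Return a text histogram: each row shows an interval and one symbol per observation.

    Hypothetical helper for illustration. edges must have len(counts) + 1 entries.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    rows = []
    for i, count in enumerate(counts):
        label = f"{edges[i]:g}-{edges[i + 1]:g}"  # e.g. "62-65"
        rows.append(f"{label:>12} | {symbol * count} ({count})")
    return "\n".join(rows)
```

Rotated a quarter turn, the rows read like the bars of an ordinary histogram: longer rows mark intervals where more observations fell.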
Variable data: Measurements on a continuous scale that can take any value, including fractional values (such as a length of 12.22 inches).
A manager at a luxury hotel has received many complaints regarding the temperature in one of the large meeting rooms that corporate customers use on a regular basis. Many group leaders have complained about the room being too cold in the morning and too hot in the afternoon.
Customers have said they are comfortable if the room stays between 67 and 72 degrees Fahrenheit. The manager asks the maintenance supervisor to record the temperature in the room every 15 minutes, from the time a group enters the room until it leaves in the afternoon.
Data are collected over several days. The onsite mechanical engineer who supervises the data collection finds that the average room temperature is 70.1 degrees Fahrenheit, which falls within the customers' comfort range. His next task is to determine whether there are times during the day when the temperature rises above or falls below that range.
The histogram shows that a fairly large portion of the readings lies close to, or outside, the customers' comfort limits. On average, the system is doing fine; the problem is that the temperature varies too much. The readings range from a low of 62 degrees to a high of 77 degrees, a 15-degree spread compared with the 5-degree window (67 to 72 degrees) that customers find comfortable.
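The point of the example, that a good average can hide too much variation, can be checked numerically alongside the histogram. The sketch below is a hypothetical helper (`summarize_temperatures` is not from the original); it assumes the readings are a list of Fahrenheit values and uses the 67 to 72 degree comfort window from the text as default limits.

```python
def summarize_temperatures(readings, low=67.0, high=72.0):
    """Contrast the average reading with its spread and the share outside [low, high].

    Hypothetical helper for illustration; defaults are the customers' comfort limits.
    """
    if not readings:
        raise ValueError("readings must not be empty")
    mean = sum(readings) / len(readings)
    outside = [t for t in readings if t < low or t > high]  # too cold or too hot
    return {
        "mean": mean,
        "min": min(readings),
        "max": max(readings),
        "range": max(readings) - min(readings),
        "comfort_window": high - low,
        "fraction_outside": len(outside) / len(readings),
    }
```

With the supervisor's data, `mean` would come out near the ideal 70.1 degrees, while `range` would be 77 - 62 = 15 degrees against a `comfort_window` of 5 degrees; `fraction_outside` then shows how often the room actually left the comfortable band.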