Data Mining

Skill level: Intermediate-advanced

Description

Data mining is a technique used to make sense of collected data. Several statistical tools can be used, including Pareto charts and ANOG (analysis of good). Ordinary spreadsheet applications are capable enough to support more advanced data mining with pivot tables and charts.

The objective is to find patterns or common factors in the data that explain how a process behaves. This information can be used to predict consumer purchasing and marketplace trends, or to explain why there are long wait times at service counters. As the name “mining” suggests, this technique helps uncover “hidden” information in data sets.

Benefits

  • Easily uncovers hidden data and information
  • Can be performed with standard applications, such as spreadsheets
  • Does not require you to be a statistician

How to Use

  • Step 1.  Collect the data of interest.
  • Step 2.  Format the data for the application you will be using (preferably in columns).
  • Step 3.  Use the tool that is appropriate for your needs and slice the data.
  • Step 4.  Analyze the results and draw your conclusion.
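
The four steps above can be sketched in a few lines of Python, as a rough stand-in for a spreadsheet pivot table. The records, the column names, and the `pivot_mean` helper are all invented for illustration; they do not come from a real data set.

```python
from collections import defaultdict

# Steps 1-2: collected data, formatted as rows that share the same columns.
# These records are hypothetical, made up for illustration only.
records = [
    {"counter": "A", "hour": 9, "wait_min": 4},
    {"counter": "A", "hour": 9, "wait_min": 6},
    {"counter": "A", "hour": 12, "wait_min": 15},
    {"counter": "B", "hour": 9, "wait_min": 3},
    {"counter": "B", "hour": 12, "wait_min": 21},
    {"counter": "B", "hour": 12, "wait_min": 19},
]

def pivot_mean(rows, row_key, col_key, value_key):
    """Average value_key for every (row_key, col_key) pair, like a pivot table."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[row_key], row[col_key])].append(row[value_key])
    return {cell: sum(values) / len(values) for cell, values in cells.items()}

# Step 3: slice the data by counter and hour.
table = pivot_mean(records, "counter", "hour", "wait_min")

# Step 4: inspect the result, longest average wait first.
for (counter, hour), avg in sorted(table.items(), key=lambda item: -item[1]):
    print(f"counter {counter}, hour {hour}: {avg:.1f} min average wait")
```

With this invented data, the noon slice stands out at both counters, which is the kind of common factor the technique is meant to surface.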

Relevant Definitions

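Pareto chart: A bar chart in which the categories are sorted from most to least frequent, usually with a line showing the cumulative percentage, so that the few categories that account for most of the total stand out.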
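
The ordering behind a Pareto chart can be computed before any chart is drawn. The following is a minimal sketch, assuming a plain list of category labels; the complaint categories and the `pareto_table` helper are hypothetical.

```python
from collections import Counter

# Hypothetical complaint categories, invented for illustration only.
complaints = ["long wait", "wrong size", "long wait", "rude staff",
              "long wait", "wrong size", "long wait", "out of stock"]

def pareto_table(items):
    """Return (category, count, cumulative percent) rows, largest count first."""
    counts = Counter(items).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for category, count in counts:
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows

for category, count, cum_pct in pareto_table(complaints):
    # A text bar stands in for the chart's bar; the last column is the cumulative line.
    print(f"{category:<14}{'#' * count:<6}{cum_pct:5.1f}%")
```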

ANOG (analysis of good): A visual representation of the data and correlation (if there is one) between factors.

Example

A retail store tracks the purchases of a customer and notices that the customer buys a lot of silk shirts. The data mining system associates that customer with silk shirts. The sales department reviews the information and begins to market silk shirts to that customer. In this case, the data mining system used by the retail store discovered information that was previously unknown to the company. By tracking sales against key factors such as ZIP code, buyer information (if available), and date and time, the retail store can improve its merchandising and marketing approach.

Retail sales data mining using the ANOG method (the simplest one):

  • Step 1.  Organize your data in columns (as shown below).
  • Step 2.  Code the data using “conditional formatting” (in this case, each type of shirt has a different color).
  • Step 3.  Sort the data by color.

[Figure: data mining table]

Conclusions:
All customers in the 39963 ZIP code prefer silk shirts, except one who chose cotton; residents in the 40061 area definitely prefer jersey material over silk and cotton. With this information at hand, it is possible to conduct targeted marketing.
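
The sort-and-compare steps of this example can also be approximated in code. The sketch below uses made-up purchase records (the original table is not reproduced here) and a hypothetical helper, `preferred_material`; grouping by ZIP code plays the role of color-coding and sorting the sheet.

```python
from collections import Counter, defaultdict

# Made-up purchase records for illustration; not the data behind the original table.
purchases = [
    ("39963", "silk"), ("39963", "silk"), ("39963", "cotton"), ("39963", "silk"),
    ("40061", "jersey"), ("40061", "jersey"), ("40061", "silk"), ("40061", "jersey"),
]

def preferred_material(rows):
    """Map each ZIP code to its most purchased material and that material's share."""
    by_zip = defaultdict(Counter)
    for zip_code, material in rows:
        by_zip[zip_code][material] += 1
    result = {}
    for zip_code, counts in by_zip.items():
        material, count = counts.most_common(1)[0]
        result[zip_code] = (material, count / sum(counts.values()))
    return result

# Sorting by ZIP code, then material, mirrors sorting the sheet by color.
for zip_code, material in sorted(purchases):
    print(zip_code, material)

for zip_code, (material, share) in sorted(preferred_material(purchases).items()):
    print(f"{zip_code}: mostly {material} ({share:.0%} of purchases)")
```

Reporting the share alongside the top material is a deliberate choice: it shows how strong the preference is before any marketing decision is based on it.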

 
