Attribute agreement analysis

Skill level: Advanced


Every time someone makes a decision – such as, “Is this the right candidate?” – it is critical that the decision-maker would make the same choice again and that others would reach the same conclusion. Attribute agreement analysis measures how closely several people agree when they judge or assess the same item, both with their own earlier judgments and with one another.


  • Helps to characterize the quality of the data
  • Determines the area of non-agreement
  • Helps in calibrating appraisers, judges, or assessors for a higher level of agreement
  • Easy to analyze with statistical software or a specialized worksheet

How to Use

  • Step 1.  Set up a structured study in which a number of items are assessed more than once by more than one assessor. Have the items judged by an expert, whose ratings will be referred to as the “standard” (the expert can be one person or a panel – see table below).
  • Step 2.  Conduct the assessment with the assessors in a blind environment: they should not know when they are evaluating the same item again, and they should not know how the other assessors have rated it.
  • Step 3.  Enter the data in a statistical software package or an Excel spreadsheet already set up to analyze this type of data (built-in formula).
  • Step 4.  Analyze the results: Is there good agreement between appraisers? Each appraiser vs. the standard? All appraisers vs. the standard?
  • Step 5.  Draw your conclusions and decide on the course of action needed if the level of agreement is below a set threshold. Agreement above 80 percent is usually considered good.
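
The agreement questions in Step 4 can be sketched in a few lines of Python. This is only a minimal illustration of simple percent agreement, not the method used by any particular statistical package. The data layout, the function names, and the five-item ratings are all hypothetical and made up for the illustration; the 80 percent threshold comes from Step 5.

```python
from typing import Dict, List

# Hypothetical layout: ratings[appraiser] is a list of trials,
# and each trial holds one rating per item, in the same item order.
Ratings = Dict[str, List[List[str]]]


def within_appraiser(ratings: Ratings, appraiser: str) -> float:
    """Share of items the appraiser rated the same way on every trial."""
    trials = ratings[appraiser]
    n_items = len(trials[0])
    hits = sum(len({t[i] for t in trials}) == 1 for i in range(n_items))
    return hits / n_items


def appraiser_vs_standard(ratings: Ratings, standard: List[str], appraiser: str) -> float:
    """Share of items where every trial by the appraiser matches the standard."""
    trials = ratings[appraiser]
    hits = sum(all(t[i] == s for t in trials) for i, s in enumerate(standard))
    return hits / len(standard)


def between_appraisers(ratings: Ratings) -> float:
    """Share of items where all appraisers agree on all trials."""
    all_trials = [t for trials in ratings.values() for t in trials]
    n_items = len(all_trials[0])
    hits = sum(len({t[i] for t in all_trials}) == 1 for i in range(n_items))
    return hits / n_items


def all_vs_standard(ratings: Ratings, standard: List[str]) -> float:
    """Share of items where every trial by every appraiser matches the standard."""
    all_trials = [t for trials in ratings.values() for t in trials]
    hits = sum(all(t[i] == s for t in all_trials) for i, s in enumerate(standard))
    return hits / len(standard)


# Made-up data for illustration only: 5 items, 3 appraisers, 2 trials each.
standard = ["pass", "fail", "pass", "fail", "pass"]
ratings = {
    "A": [["pass", "fail", "pass", "fail", "pass"],
          ["pass", "fail", "pass", "fail", "pass"]],
    "B": [["pass", "fail", "fail", "fail", "pass"],
          ["pass", "fail", "pass", "fail", "pass"]],
    "C": [["pass", "pass", "pass", "fail", "pass"],
          ["pass", "pass", "pass", "fail", "pass"]],
}

THRESHOLD = 0.80  # "good" agreement is above 80 percent (Step 5)

for name in ratings:
    print(name,
          "within:", within_appraiser(ratings, name),
          "vs. standard:", appraiser_vs_standard(ratings, standard, name))
print("between appraisers:", between_appraisers(ratings))
print("all vs. standard:", all_vs_standard(ratings, standard))
print("meets threshold:", all_vs_standard(ratings, standard) > THRESHOLD)
```

In this made-up data, appraiser C is perfectly consistent across the two trials yet disagrees with the standard on one item, which is why agreement within an appraiser and agreement against the standard are checked separately.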

Relevant Definitions

Standard: A value given to an item by an expert or a panel of experts that is considered the “true value” or “true answer.”

Attribute: Non-numerical data, also known as discrete data. Examples include good or bad, and pass or fail.


A large corporation is conducting a series of interviews for technical support positions and has received a large number of applications. A team of interviewers will carry out the initial screening interviews by telephone. Their task is to categorize the applicants as “pass” or “fail.”

Before moving forward, the human resources (HR) manager wants to ensure that the team conducting the interviews is capable of a high level of agreement. Otherwise, the team might select unsuitable candidates.

The HR manager creates a study based on previous interviews that have been recorded. The interviewers listen to the interviews and decide whether the candidates should be categorized as “pass” or “fail.”

The study is structured in the following manner:

  1. Twenty recordings are selected. About half of the applicants have been ranked “pass” and the other half have been ranked “fail” based on the assessment of a panel of HR experts. Some are obviously “fail,” but not all of them are clearly “pass” or “fail.” Some are borderline judgments.
  2. The panel of experts provides the “right answer” for each candidate.
  3. Three interviewers are selected to participate in the study. Each interviewer listens to each recording once and assigns a “pass” or “fail” score. Once all three interviewers complete the evaluation of the 20 candidates, they are asked to listen to 20 more recordings. The interviewers are not told that they have been presented with the same recordings, just repeated in random order.
  4. The interviewers classify the same recordings a second time as “pass” or “fail.”
  5. After the study is completed, the data are analyzed.
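
One way to prepare the repeated, randomly ordered presentation described in step 3 is sketched below. This is an assumption about how such a study could be organized, not a description of how the HR manager did it; the recording identifiers, the function name, and the seeds are hypothetical.

```python
import random
from typing import List


def presentation_order(recording_ids: List[str], seed: int) -> List[str]:
    """Return a shuffled copy of the recording list, leaving the input unchanged."""
    rng = random.Random(seed)  # fixed seed so the order can be reproduced later
    order = list(recording_ids)
    rng.shuffle(order)
    return order


# Hypothetical identifiers for the 20 recordings in the study.
recordings = [f"rec-{i:02d}" for i in range(1, 21)]

round_one = presentation_order(recordings, seed=1)
round_two = presentation_order(recordings, seed=2)  # same recordings, new order

print(round_one[:5])
print(round_two[:5])
```

Keeping the seed (or the generated order) on file lets the analyst map each second-round rating back to the same candidate when the two assessments are compared.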

The table below shows the results for each candidate and the three interviewers’ assessments. The numbers “1” and “2” after the interviewers’ names refer to the first and second assessments of the candidate by the same person.


This example has been analyzed using Minitab® statistical software:


The remaining analysis of these data appears in the session window of Minitab. Below is an extract of this analysis (note: not all of the data analysis is shown):


Note: The details about conducting the analysis and performing the complete analysis are beyond the intent of this document. Please refer to statistical analysis references for more details.

