Gage/Gauge Repeatability and Reproducibility (Gage R&R)

Skill level: Intermediate/advanced


Gage (or gauge) repeatability and reproducibility (gage R&R) helps ensure that measurements are consistent and reliable, no matter who performs the measuring, monitoring, scoring, grading, or counting activity. The methods of scoring the tests are beyond the scope of this description. (Please refer to Kappa study methods and statistical software help files and training materials for additional information regarding generating the scores.)


  • Ensures that data are collected and assessed consistently and accurately
  • Provides confidence that decisions can be based on data
  • Removes personal and subjective bias from the measuring system
  • Helps with understanding and quantifying variation in the measuring system

How to Use

  • Step 1.  Select a sample of at least 20 items for testing. Ensure the sample includes both passing and failing items (for accuracy evaluation) or both simple and difficult items (for count validation).
  • Step 2.  Have an expert or team of experts, but not front-line assessors, evaluate these items according to documented standards. This evaluation is the “standard.”
  • Step 3.  Have the assessors evaluate the sample without knowledge of how the experts rated the sample.
  • Step 4.  Re-order the sample and have the assessors re-evaluate the sample and record their findings.
  • Step 5.  For each assessor, compare the results of the first pass with the results of the second pass to determine individual repeatability. Scores in the range of 90 to 95 percent are typical in the service industry. If an assessor’s results are not within acceptable limits, the assessor should be retrained on the evaluation process.
  • Step 6.  Score the assessors’ results against each other and compute the percentage of the time they are in agreement. This is the reproducibility of the scoring process, that is, how consistently different assessors rate the same items. If there is significant deviation in the results, an analysis will need to be undertaken to determine the root cause of the variation. Typical root causes are poor standards documentation, insufficient training, or insufficient feedback on performance.
  • Step 7.  Compute how well each assessor did against the standard to determine the accuracy of the assessments. A poor score is an indication that a root cause analysis must be conducted and appropriate corrective action taken, as indicated in step 6.
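
The agreement calculations in steps 5 through 7 can be sketched in a few lines of Python. This is a minimal illustration using simple percent agreement, not a Kappa statistic; the ten pass/fail ratings, the assessor names, and the function name percent_agreement are all hypothetical and chosen only for the example. It also assumes that second-pass ratings have been mapped back to the original item order after re-ordering, and it uses first-pass ratings for the comparisons in steps 6 and 7.

```python
from itertools import combinations


def percent_agreement(a, b):
    """Return the percentage of items on which two rating lists agree."""
    if len(a) != len(b) or not a:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Hypothetical data: 10 items rated pass (1) or fail (0).
# A real study would use at least 20 items (step 1).
standard = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # expert ratings (step 2)

# Assumption: pass2 ratings are already re-aligned to the original item order.
ratings = {
    "A": {"pass1": [1, 1, 0, 1, 0, 1, 1, 0, 1, 1],
          "pass2": [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]},
    "B": {"pass1": [1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
          "pass2": [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]},
    "C": {"pass1": [1, 1, 1, 1, 0, 1, 1, 0, 1, 1],
          "pass2": [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]},
}

# Step 5: each assessor's first pass versus their own second pass.
within = {name: percent_agreement(r["pass1"], r["pass2"])
          for name, r in ratings.items()}

# Step 6: every pair of assessors compared on their first-pass ratings.
between = {(m, n): percent_agreement(ratings[m]["pass1"], ratings[n]["pass1"])
           for m, n in combinations(ratings, 2)}

# Step 7: each assessor's first pass versus the expert standard.
vs_standard = {name: percent_agreement(r["pass1"], standard)
               for name, r in ratings.items()}

print("within assessor:", within)
print("between assessors:", between)
print("versus standard:", vs_standard)
```

In this made-up data set, assessor B changes one rating between passes and assessor C consistently disagrees with the experts on one item, so a perfect step 5 score for C does not imply a perfect step 7 score.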

Relevant Definitions

Not Applicable


A payment processing company was receiving numerous complaints about errors made during the application of payments to customer accounts. The results of “quality audits” for the previous 60 days showed that payment processing was 99.4 percent accurate.

A gage R&R study was undertaken to determine whether the evaluation process had any abnormalities. The expert team took a sample of 50 payments and rated the batch. The accuracy score for the payment sample was 91.7 percent. This score served as the standard.

Three assessors evaluated the same payments and then re-evaluated the re-ordered sample. Their individual repeatability scores averaged 99.8 percent, indicating that each assessor applied the assessment process consistently from one pass to the next. However, the average overall accuracy score the assessors gave the payment batches was 98.9 percent, or 7.2 percentage points higher than the standard.
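
The size of the gap can be checked with a short calculation. The sketch below uses only the two scores quoted above; the variable names are illustrative. It distinguishes the difference in percentage points from the relative difference, since the two are easily confused.

```python
standard_accuracy = 91.7   # expert team's score for the sample, in percent
assessor_accuracy = 98.9   # assessors' average score for the same sample, in percent

# Absolute gap, measured in percentage points.
gap_points = assessor_accuracy - standard_accuracy

# Relative gap, measured as a percentage of the standard score.
gap_relative = gap_points / standard_accuracy * 100

print(f"gap: {gap_points:.1f} percentage points")
print(f"relative to the standard: {gap_relative:.1f} percent")
```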

The expert team completed a root cause analysis and identified the problem. The software had been upgraded 45 days earlier, and the 16-digit system had been expanded to accommodate an optional 17th digit. This 17th digit was required to be keyed only under certain circumstances. The auditors were checking only the first 16 digits in their assessment and did not check the 17th digit, which accounted for the difference in scoring between the auditors and the standard. The auditing procedures and training were updated to cover the 17th digit.
