Data entry accuracy

Aim

To explain how to determine the accuracy of data entry.

 

Requirements

  • Prior to data cleaning, researchers should evaluate the data entry error rate of their data;
  • If the error rate found during the evaluation exceeds 3% per registration form/questionnaire, the registration form/questionnaire needs to be double-entered entirely.

 

Documentation

  • The percentage of inconsequential data entry errors;
  • The number of records re-entered.

 

Responsibilities

Executing researcher:
  • To decide, prior to starting the data cleaning process, whether an extensive (full data entry) or less extensive (sampling) evaluation and interpretation of data entry errors needs to be carried out;
  • To determine the data entry error percentage;
  • To re-enter (a sample of) registration forms/questionnaires of respondents, if necessary;
  • To document the number of records re-entered and the percentage of inconsequential errors found.
Project leaders:
  • In case of sampling, to draw a small sample from the respondents (approx. 5%) and to have these registration forms/questionnaires re-entered by someone other than the individual who completed the first data entry;
  • To ensure this procedure is carried out (on time).
Research assistant: To re-enter (a sample of) registration forms/questionnaires of respondents, if necessary.

 

How To

To evaluate the entry errors by sampling, the project leader should draw a small sample from the respondents (approx. 5%) and have the questionnaires or registration forms re-entered into an empty database. In principle, this second input should be carried out by someone other than the individual who completed the first data entry. For instance, if the first round of data entry is done by a project assistant, the second can be carried out by the researcher. The reliability of the entered data can be assessed by comparing the first and second round of data entry (using an option in SPSS, Data → Compare Datasets).
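The SOP points to SPSS (Data → Compare Datasets) for this comparison. Purely as an illustrative sketch, the same cell-by-cell comparison can also be done in Python with pandas, assuming both entry rounds have been exported to CSV files with identical columns and a shared respondent ID; the file and column names below are hypothetical and not part of this procedure.

  import pandas as pd

  # Hypothetical exports of the first and second data entry rounds.
  first = pd.read_csv("entry_round_1.csv").set_index("respondent_id").sort_index()
  second = pd.read_csv("entry_round_2.csv").set_index("respondent_id").sort_index()

  # Restrict the first round to the sampled respondents and shared variables.
  first = first.loc[second.index, second.columns]

  # Cell-by-cell comparison: True where the two rounds disagree
  # (two empty cells are not counted as a disagreement).
  mismatches = first.ne(second) & ~(first.isna() & second.isna())

  # Discrepancies per form, expressed as a percentage of the variables entered.
  errors_per_form = mismatches.sum(axis=1)
  error_percentage = 100 * errors_per_form / len(second.columns)
  print(error_percentage.round(1))

The resulting percentage per form can then be checked against the 3% threshold described below.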
If the number of errors discovered is greater than 3% per registration form/questionnaire (relative to the total number of variables entered), then the registration form/questionnaire needs to be double-entered entirely. Subsequently, the first and second input should be compared in the same way. If the second input is carried out by the same person, then the permissible margin of error is smaller than 3%, i.e. 1.5%.
This procedure applies to both manual input (for instance using Blaise) and scanned questionnaires. For scanned questionnaires, the forms from the sample are re-scanned into a separate file.
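As a worked illustration of this threshold (a sketch only; the function name and figures are illustrative, not part of the SOP), the decision rule can be written out as follows. For example, 5 discrepancies in a form with 120 entered variables is an error rate of roughly 4.2%, which exceeds 3% and therefore triggers full double entry.

  def needs_full_double_entry(errors_found: int, variables_entered: int,
                              same_person: bool = False) -> bool:
      # 3% threshold per form, or 1.5% when the second entry was done
      # by the same person as the first.
      threshold = 0.015 if same_person else 0.03
      return errors_found / variables_entered > threshold

  print(needs_full_double_entry(5, 120))   # True: 5/120 ≈ 4.2% > 3%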
This procedure is obviously not applicable when using a web-based questionnaire that is filled in by the research participants themselves.
The necessity of full or partial double data entry is determined by issues such as:
  • Irregularities observed during data collection;
  • The complexity of the registration forms/questionnaires entered (large risk of interpretation errors);
  • The required reliability of the data (double input is standard practice for GCP research, like drug research);
  • Doubt about the reliability and accuracy of the data entry clerk(s);
  • Whether controls have been built into the data entry programme to detect inconsistencies and out-of-range values (see the sketch after this list).
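A minimal sketch of what such a built-in control might look like, with entirely hypothetical variable names, codes and ranges; real checks should follow the project's own codebook.

  # Flag out-of-range values and invalid codes at entry time
  # (variable names, codes and ranges are hypothetical examples).
  def check_entry(record: dict) -> list[str]:
      problems = []
      if not 0 <= record.get("age", 0) <= 120:
          problems.append("age out of range")
      if record.get("sex") not in {"M", "F"}:
          problems.append("sex is not a valid code")
      return problems

  print(check_entry({"age": 134, "sex": "X"}))
  # ['age out of range', 'sex is not a valid code']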
Double data entry of all data is the ideal situation and the most reliable. However, this is usually not a feasible option, given the time and staffing required. One should then start by monitoring the data in samples, beginning with the most complex registration forms/questionnaires. If the decision is made to focus on part of the database, then a note should be made in the logbook of the number of records re-entered and the percentage of inconsequential errors found.
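For completeness, a minimal sketch of the sampling step, assuming a list of respondent IDs is available; the IDs below are invented for illustration. Fixing the random seed makes the drawn sample reproducible and easier to document in the logbook.

  import random

  # Hypothetical list of respondent IDs from the full database.
  respondent_ids = [f"R{i:04d}" for i in range(1, 801)]

  # Draw an approx. 5% sample of forms for re-entry (at least one form).
  random.seed(2016)  # reproducible sample for the logbook
  sample_size = max(1, round(0.05 * len(respondent_ids)))
  sample = random.sample(respondent_ids, sample_size)

  print(f"{sample_size} registration forms/questionnaires selected for re-entry")
  print(sorted(sample))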

 

Appendices/references/links

The Introductiebijeenkomst Datamanagement deel 2 (introductory data management meeting, part 2) explains “Monitoring of data input” in detail.

 

Audit questions

  1. Has the data entry been assessed?
    1. If not, why not?
    2. If so, how was the assessment carried out, and what was the result?
  2. For poor results (more than 3% errors): Was the entire registration form/questionnaire re-entered, or were other actions taken?

 

 

V3.0: 2 December 2016: Text updated
V2.0: 16 July 2015: Revision format
V1.2: 31 Oct 2013: Availability of a new comparison programme
V1.1: 1 Jan 2010: English translation
V1.0:  31 Mar 2004