Quality Handbook
Determining the accuracy of data entry
Before starting the data cleaning process, the researcher needs to decide whether to carry out an extensive evaluation (full double data entry) or a less extensive one (sampling) to detect data entry errors (typing errors) and interpretation errors made by the data entry clerk. Whether full or partial double data entry is needed depends on issues such as:
Double entry of all data is the ideal and most reliable approach. However, it is usually not feasible given the time and staff it requires. In that case, start by monitoring samples of the data, beginning with the most complex questionnaires. If the decision is made to check only part of the database, record in the logbook the number of records re-entered and the percentage of errors found.
To evaluate entry errors by sampling, the project leader should draw a small sample of respondents (approx. 5%) and have their questionnaires or registration forms re-entered into an empty database. In principle, this second input should be carried out by someone other than the person who did the first data entry. For instance, if the first round of data entry is done by a project assistant, the second can be carried out by the researcher. The reliability of the entered data can be assessed by comparing the first and second rounds of data entry using dedicated software (see below).
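The sampling step could be scripted along the following lines. This is a minimal sketch, not a D&S tool: the function name `draw_reentry_sample` and its parameters are hypothetical, and it assumes the respondent IDs are available as a list or range. A fixed seed makes the draw reproducible, so the selection can be documented in the logbook.

```python
import random


def draw_reentry_sample(respondent_ids, fraction=0.05, seed=None):
    """Draw a simple random sample of respondent IDs for re-entry (hypothetical helper).

    `fraction` defaults to the handbook's approximate 5%. At least one
    respondent is selected whenever any IDs are given.
    """
    ids = sorted(respondent_ids)  # fixed order so a given seed always yields the same draw
    if not ids:
        return []
    n = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    return sorted(rng.sample(ids, n))


# Example: pick roughly 5% of 400 respondents, reproducibly.
sample = draw_reentry_sample(range(1, 401), seed=2010)
print(len(sample))  # prints 20
```

Because the sample is drawn without replacement, no respondent is selected twice.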
If the number of errors found exceeds 3% per questionnaire (relative to the total number of variables entered), the questionnaire must be double-entered in its entirety! The first and second inputs are then compared in the same way. If the second input is carried out by the same person as the first, the permissible error rate is lower: 1.5% instead of 3%.
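The decision rule amounts to a small calculation. The sketch below assumes the error rate is the number of discrepancies divided by the number of values (variables) re-entered for the questionnaire in the sample, and reads "greater than" as strictly greater; the function names are hypothetical.

```python
def error_rate(n_discrepancies, n_values_entered):
    """Share of re-entered values that differ between the first and second input."""
    if n_values_entered <= 0:
        raise ValueError("n_values_entered must be positive")
    return n_discrepancies / n_values_entered


def needs_full_double_entry(n_discrepancies, n_values_entered, same_person=False):
    """True if the error rate exceeds 3%, or 1.5% when one person did both inputs."""
    threshold = 0.015 if same_person else 0.03
    return error_rate(n_discrepancies, n_values_entered) > threshold


# Example: 20 re-entered questionnaires x 50 variables = 1000 values, 25 discrepancies (2.5%).
print(needs_full_double_entry(25, 1000))                    # prints False
print(needs_full_double_entry(25, 1000, same_person=True))  # prints True
```

The same 2.5% error rate is acceptable when a second person re-entered the data, but triggers full double entry when the first clerk did both rounds.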
This procedure applies both to manual input (for instance in Blaise) and to scanned questionnaires. For scanned questionnaires, the sampled forms are re-scanned into a separate file.
The project leader is responsible for ensuring that this procedure is carried out on time.
The second input must always be made using an empty copy of the original input system, for instance the Blaise input screens. The input system may also consist of optically readable forms, in which case only a sample is re-scanned.
The original file can be linked to the double-entry file using the SPSS Data Entry programme (if needed, with help from a member of the D&S department). To do so, both the original data and the re-entered data must first be converted from the original input programme into SPSS files. Any differences at respondent level are written to a report file. See the D&S intranet pages for details.
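The respondent-level comparison can also be illustrated in plain Python. This is a swapped-in illustration, not the SPSS Data Entry procedure: it assumes both rounds have already been loaded as dictionaries mapping respondent ID to `{variable: value}`, and `compare_entries` and `write_report` are hypothetical names. Only respondents present in the second round are checked, so the full first round can be passed in unchanged when only a sample was re-entered.

```python
import csv


def compare_entries(first, second):
    """List every value that differs between the two entry rounds.

    `first` and `second` map respondent ID -> {variable name: value}.
    Only respondents in `second` (the re-entered sample) are compared.
    A variable missing from one round is reported with the value None.
    """
    differences = []
    for rid in sorted(second):
        rec1 = first.get(rid, {})
        rec2 = second[rid]
        for var in sorted(set(rec1) | set(rec2)):
            v1, v2 = rec1.get(var), rec2.get(var)
            if v1 != v2:
                differences.append((rid, var, v1, v2))
    return differences


def write_report(differences, path):
    """Write the differences to a CSV report file, one row per mismatch."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["respondent", "variable", "first_entry", "second_entry"])
        writer.writerows(differences)


first = {1: {"age": 34, "sex": 1}, 2: {"age": 51, "sex": 2}, 3: {"age": 28, "sex": 1}}
second = {1: {"age": 34, "sex": 1}, 3: {"age": 82, "sex": 1}}
print(compare_entries(first, second))  # prints [(3, 'age', 28, 82)]
```

Dividing the number of reported differences by the number of values re-entered gives the error rate to check against the 3% (or 1.5%) limit.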
Part 2 of the Introductory Meeting Data Management covers “Monitoring of data input” in detail.
The Guide to Monitoring Data Input is available on the Data and System Management department’s intranet pages.
V1.1: 1 Jan 2010: English translation.
V1.0: 31 Mar 2004.