Quality Handbook
The aim is to ensure a clean data file from which as many errors as possible have been removed before the statistical analysis starts.
The data cleaning process is aimed at obtaining a data file which is as clean as possible. Data cleaning involves monitoring the following (in this order):
Work in the right order: first deal with the “out-of-range” issues, and only then carry out inconsistency assessments, as the risk of finding inconsistencies is smaller once the “out-of-range” improvements have been made.
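The order of checks described above can be sketched as follows. This is a minimal illustration in Python rather than SPSS; the records, variable names, permitted ranges and the consistency rule are all assumptions made up for the example and would in practice come from the study's own codebook.

```python
# Hypothetical records; variable names and codes are assumptions for illustration.
records = [
    {"id": 1, "age": 34, "sex": 1, "pregnant": 0},
    {"id": 2, "age": 340, "sex": 2, "pregnant": 1},  # age outside the permitted range
    {"id": 3, "age": 52, "sex": 1, "pregnant": 1},   # coded male (1) and pregnant (1)
]

# Assumed codebook: permitted (minimum, maximum) per variable.
VALID_RANGES = {"age": (18, 100), "sex": (1, 2), "pregnant": (0, 1)}


def find_out_of_range(rows, ranges):
    """Return (id, variable, value) for every value outside its permitted range."""
    problems = []
    for row in rows:
        for variable, (low, high) in ranges.items():
            value = row[variable]
            if value is not None and not (low <= value <= high):
                problems.append((row["id"], variable, value))
    return problems


def find_inconsistencies(rows):
    """Return ids of records coded as male (sex == 1) and pregnant (pregnant == 1)."""
    return [row["id"] for row in rows if row["sex"] == 1 and row["pregnant"] == 1]


# Step 1: out-of-range checks come first.
out_of_range = find_out_of_range(records, VALID_RANGES)

# Step 2: inconsistency checks follow, once the out-of-range values have been dealt with.
inconsistent = find_inconsistencies(records)

print(out_of_range)
print(inconsistent)
```

Each flagged value is only a candidate error; whether and how it is corrected depends on what the source shows.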
Data improvement
When tracing errors, go back to the source. This may, for instance, be the relevant questionnaire, which helps to assess where the problem lies (data entry error, interpretation error, wrong entry by the respondent, or an issue which cannot be resolved any further). In the case of an interview (open or closed), it is possible to return to the tape recording or the report of the contact form associated with the interview, or to the report (form) created by the interviewer or respondent.
Improvements should subsequently be made in a copy of the raw SPSS files at variable/form level, and this copy should then be stored under a new name. The raw SPSS files are files in which the data entry checking has already taken place (see Data Entry Accuracy), but in which no variables or questionnaires have yet been combined into a single file (see also the schematic overview of the various stages of the files in the data processing phase) and no new variables have yet been created.
In the event of an incorrect response by the respondent, the associated variable should be coded as “user missing”. It is of the utmost importance to clean every variable in a file, and not just those variables to be used in the statistical analysis. Once the improvements have been made, the files should be stored under a new name. These are referred to as cleaned SPSS system files.
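As an illustration of coding an incorrect response as “user missing” while leaving the raw data untouched, a minimal Python sketch follows. The missing-value code 999, the record layout and the function name are assumptions for the example; in SPSS the actual user-missing codes are defined per variable, and saving the cleaned file under a new name is not shown here.

```python
import copy

# Assumed user-missing code; the real code is defined per variable in the codebook.
USER_MISSING = 999


def code_as_user_missing(rows, record_id, variable, missing_code=USER_MISSING):
    """Return a cleaned copy of rows in which one value is coded as user missing."""
    cleaned = copy.deepcopy(rows)  # work on a copy so the raw data stay unchanged
    for row in cleaned:
        if row["id"] == record_id:
            row[variable] = missing_code
    return cleaned


# Hypothetical raw data: the age of record 2 is an incorrect response.
raw = [{"id": 1, "age": 34}, {"id": 2, "age": 340}]
cleaned = code_as_user_missing(raw, record_id=2, variable="age")

print(raw[1]["age"], cleaned[1]["age"])
```

Working on a copy mirrors the requirement that the raw files remain available alongside the cleaned files.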
It is important that the modifications carried out during the data cleaning process are documented in a logbook.
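A logbook entry could, for example, record the date, the record and variable concerned, the old and new value, and the reason for the change. The sketch below shows one possible layout in Python, using only the standard library; the field names, the example entry and its date are assumptions and not a prescribed format.

```python
import csv
import io

# Assumed logbook columns; adapt these to the conventions of the project.
LOG_FIELDS = ["date", "record_id", "variable", "old_value", "new_value", "reason"]


def log_change(logbook, date, record_id, variable, old_value, new_value, reason):
    """Append one documented modification to the logbook (a list of dicts)."""
    logbook.append({
        "date": date,
        "record_id": record_id,
        "variable": variable,
        "old_value": old_value,
        "new_value": new_value,
        "reason": reason,
    })


def logbook_to_csv(logbook):
    """Return the logbook as CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=LOG_FIELDS)
    writer.writeheader()
    writer.writerows(logbook)
    return buffer.getvalue()


# Hypothetical entry: an out-of-range age recoded as user missing (999).
logbook = []
log_change(logbook, "2010-01-01", 2, "age", 340, 999,
           "out of range; not resolvable from questionnaire")
csv_text = logbook_to_csv(logbook)

print(csv_text)
```

Keeping the old value and the reason in each entry makes it possible to trace every difference between the raw and the cleaned files.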
Introductory Meeting Data Management part 2.
Introductory Course SPSS for the Post-Initial Master’s programme in Epidemiology
Information and guides for the data processing phase can be found on the Data and System Management’s intranet pages.
V1.2: 1 Jan 2010: English translation.
V1.1: 29 Nov 2006: Small textual amendments.