Data analysis documentation

Aim

To ensure that all analyses can be properly reproduced.

 

Requirements

  • Clear documentation of the data analysis in a log file (for example an SPSS syntax file, a Stata do-file, an R script or a Word file), so that the relevant data analyses can be reproduced.

 

Documentation

Log file including:
  • Specific research questions or purpose of the analysis;
  • Databases used for the analyses (for example the ‘get file’ statement in SPSS syntax);
  • All statistical analyses that were executed.

 

Responsibilities

Executing researcher: To document all steps taken throughout the data analysis in a log file.
Project leaders: To regularly check and discuss the data analysis, using the documentation in the log file.
Research assistant: Not applicable.

 

How To

Clear documentation of the data analysis is important for both reproducibility and efficiency. Create a log file for all relevant analyses. The file should start with the research question to be answered and the date of the analysis, and should end with an answer to the question, which may be provisional.
A log file (e.g. SPSS syntax) documents your analyses (e.g. for an article) so that you and others can easily retrieve and reproduce everything. Always include the name and location of the data file (e.g. ‘get file’ in SPSS), so you know which file belongs to your analysis and where it is stored. Log files should include the code for all statistical tests conducted, so that they serve as an analysis logbook. Place your code in a logical order (e.g. first all the analyses for table 1, then table 2, etc.). A Dutch example of this can be found here.
Tip: annotate your log files (e.g. by using * followed by text in SPSS syntax). Annotations are an important part of the documentation of your data analyses: they make it easier to reproduce your results and to reuse your code.
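The structure described above can be illustrated with a minimal sketch of an annotated analysis script, written here in Python. The same layout applies to SPSS syntax, a Stata do-file or an R script. The research question, the column names (group, sbp) and the function names are hypothetical and serve only as illustration; this is not a prescribed template.

```python
"""Analysis log (hypothetical example, for illustration only).

Research question: Does mean systolic blood pressure (sbp) differ between groups?
Date of analysis:  recorded at run time (see ANALYSIS_DATE)
Data file:         passed to run_log(); its name and location are stored in the output
"""
import csv
import statistics
from datetime import date

ANALYSIS_DATE = date.today().isoformat()


def load_data(data_path):
    # Document exactly which file is read (the counterpart of 'get file' in SPSS).
    with open(data_path, newline="") as handle:
        return list(csv.DictReader(handle))


def table_1_group_means(rows):
    # Table 1: mean outcome per group. Assumes columns named 'group' and 'sbp'.
    by_group = {}
    for row in rows:
        by_group.setdefault(row["group"], []).append(float(row["sbp"]))
    return {group: statistics.mean(values) for group, values in sorted(by_group.items())}


def run_log(data_path):
    # Analyses run in the order of the tables they feed into.
    rows = load_data(data_path)
    means = table_1_group_means(rows)
    # The log ends with a (provisional) answer to the research question.
    answer = "Group means: " + ", ".join(f"{g}={m:.1f}" for g, m in means.items())
    return {
        "date": ANALYSIS_DATE,
        "data_file": str(data_path),
        "n": len(rows),
        "table_1": means,
        "provisional_answer": answer,
    }
```

Calling run_log with the path of a CSV file returns the date, the data file location, the number of records, the Table 1 results and a provisional answer in one place, which can then be saved alongside the script.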

 

Audit questions

  1. Does the documentation for each specific analysis contain the following elements:
    1. Specific research questions or purposes of the analysis;
    2. Names of databases which have been used for the analysis (e.g. ‘get file’);
    3. Specific code for statistical tests.
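As a rough aid for the audit, the three elements above could be screened for automatically before a manual review. The sketch below, in Python, scans the text of an SPSS syntax file. It assumes a hypothetical annotation convention (a comment line starting with "* Research question:") and looks for only a handful of common SPSS commands, so a positive result is an indication, not proof, that the documentation is complete.

```python
import re

# Assumed conventions (hypothetical; adapt to the way your own logs are written):
# - the research question is given in a comment line "* Research question: ..."
# - the data file is opened with a GET FILE= statement
# - statistical code is recognised by a short, non-exhaustive list of SPSS commands
CHECKS = {
    "research_question": re.compile(r"^\s*\*\s*research question\s*:", re.I | re.M),
    "data_file": re.compile(r"^\s*get\s+file\s*=", re.I | re.M),
    "statistical_code": re.compile(
        r"^\s*(t-test|regression|crosstabs|oneway|frequencies|descriptives)\b",
        re.I | re.M,
    ),
}


def audit_log(text):
    """Return, per audit element, whether it appears to be present in the log text."""
    return {name: bool(pattern.search(text)) for name, pattern in CHECKS.items()}


def missing_elements(text):
    """List the audit elements that were not found."""
    return [name for name, found in audit_log(text).items() if not found]
```

An empty list from missing_elements means all three elements were detected; any names returned point the auditor to the parts of the log file that need a closer look.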

 

V3.0: 26 Oct 2016: Revision guideline
V2.0: 12 May 2015: Revision format
V1.1: 1 Jan 2010: English translation
V1.0: 21 Apr 2004: Title modified: Documentation instead of Report. Adding details with example of documented syntax