Analysis plan


To promote structured and targeted data analysis


An analysis plan should be created prior to the data analyses. The analysis plan contains a description of the research question and what the various steps in the analysis are going to be. The analysis plan is intended as a starting point for the analysis. It ensures that the analysis can be undertaken in a targeted manner.
However, both the research questions and the analyses may be revised during the data analysis, and some choices may not yet be clear before the analysis starts. Exploratory data analysis is also possible. The findings and decisions made during the analyses may be added to the analysis plan afterwards, which makes the analysis plan a dynamic document. Alternatively, findings and decisions made during the data analysis can be documented in SPSS syntax (see guideline 1.4-05 Documentation of data analysis). In that case the analysis plan serves only as the starting point.

The analysis plan should first state the concrete research question; this is the question that the analyses are intended to answer. Concrete research questions may be defined using the acronym PICO: Population, Intervention, Comparison, Outcomes. A question such as: “What are the risk factors for back pain?” is too general. An example of a concrete question could be: “Does frequent bending at work lead to an elevated risk of lower back pain occurring in employees?” (Population = Employees; Intervention = Frequent bending; Comparison = Infrequent bending; Outcome = Occurrence of lower back pain). Concrete research questions are essential for determining the analyses required.

The analysis plan should then describe which statistical techniques are to be used to analyse the data. The following issues need to be considered in this process and described where applicable:

  • Which (subgroup of the) population is to be included in the analyses
  • Data from which measurement point (T1, T2, etc.) are to be used
  • Which (dependent and independent) variables are to be used in the analyses and how are the variables supposed to be analysed (e.g. continuous or in categories)
  • Which variables are to be investigated as potential confounders or effect modifiers and how these variables are supposed to be analysed. There are different ways of dealing with confounders. Often, variables are included as confounders only if they actually influence the relationship between the determinant and the outcome (i.e. when they change the regression coefficient of the determinant, see example). Another frequently used method is to include all variables that have a significant relationship with the outcome, even if they are perhaps not (strong) confounders.
  • How to deal with missing values
  • Which analyses are to be carried out in which order (e.g. univariate analyses, multivariate analyses, analysis of confounders, analysis of interaction effects, analysis of sub-populations, etc.).

A statistician may need to be consulted regarding the choice of statistical techniques. An example of an analysis plan is given below.

It can be quite efficient to create, before the data analysis starts, a number of empty tables that are to be included in the article. This is often very helpful in deciding exactly which analyses are required in order to analyse the data in a targeted manner.


Work-related psychosocial risk factors in relation to the occurrence of neck complaints.

Research question
What is the influence of the following psychosocial factors on the occurrence of neck complaints within 1 year in symptom-free employees?
1. Quantitative job demands
2. Skill discretion
3. Decision authority
4. Supervisor support
5. Co-worker support

Study population
All 977 individuals who were symptom-free at the baseline measurement and had a full follow-up.

Outcome measure (dependent variable)
Dichotomous variable: Presence (1) or absence (0) of neck complaints
Time variable: Time in days until the neck complaints arise (minimum length of time of 1 day)
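The outcome coding above can be illustrated with a minimal Python sketch. It is not part of the original plan. The function name `code_outcome`, the use of calendar dates, the 365-day follow-up window, and the rule that participants without complaints are censored at the end of follow-up are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Tuple

FOLLOW_UP_DAYS = 365  # assumption: "within 1 year" is taken as 365 days


def code_outcome(baseline: date, onset: Optional[date]) -> Tuple[int, int]:
    """Return (event, time_in_days) for one participant.

    event is 1 if neck complaints arose within follow-up, otherwise 0.
    time_in_days is the time until onset, with a minimum of 1 day.
    Participants without an event are assigned the full follow-up time
    (assumption: censoring at the end of follow-up).
    """
    if onset is not None:
        days = (onset - baseline).days
        if days <= FOLLOW_UP_DAYS:
            return 1, max(days, 1)  # enforce the minimum of 1 day
    return 0, FOLLOW_UP_DAYS


# Hypothetical participants, for illustration only
same_day = code_outcome(date(2000, 1, 1), date(2000, 1, 1))
no_event = code_outcome(date(2000, 1, 1), None)
```

The pair returned for each participant corresponds to the dichotomous variable and the time variable that a Cox regression needs as input.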

Independent variables:
All independent variables and confounders are dimensions of the Job Content Questionnaire (Karasek questionnaire).
1. Quantitative job demands
2. Skill discretion
3. Decision authority
4. Supervisor support
5. Co-worker support

Potential confounders (psychosocial)
1. Qualitative job demands
2. Job security
In each analysis of 1 central psychosocial factor, the other 4 central factors will also be analysed as potential confounders.

Other potential confounders

  • Age
  • Sex
  • Coping styles (3 variables): Avoidance behaviour, seeking social support, approaching problems actively
  • Life events
  • Physical factors in leisure time (9 variables): Intensive sport/heavy physical activity during the last 4 months requiring a lot of exertion; Long-term sitting; Computer screen work; Working with hands above shoulder height; Exertion with hands/arms; Having to work in the same position for long periods of time; Having to make the same hand/arm movements numerous times per minute; Driving a vehicle; Bending/twisting the upper body numerous times per hour.
  • Work-related physical factors (11 variables): Percentage of work time neck flexion >45 degrees; Percentage of work time seated; Percentage of work time neck rotation >45 degrees; Frequency of lifting >25 kg per working day; Percentage of work time making repetitive movements with arms/hands and frequency >4 times per minute; Percentage of work time upper arm elevation >60 degrees; Working with hands above shoulder height; Computer screen work; Working with vibrating or pulsating objects; Driving a vehicle at work; Bending/twisting of the upper body numerous times per hour.

Statistical analysis
One regression model for each psychosocial factor:
- First, a univariate Cox regression with neck complaints as the dependent variable and the central psychosocial factor as the independent variable.
- Univariate Cox regressions of all potential confounders. Potential confounders with p > 0.25 will no longer be considered as confounders.
- Cox regressions that each contain the central psychosocial factor and 1 of the potential confounders that were not excluded in the previous step. When the regression coefficient of the central psychosocial factor changes by around 10% or more, the potential confounder should be viewed as a true confounder and should be included in the multivariable analysis.
- Multivariable analysis: always add 1 confounder at a time. If the regression coefficient changes by 10% or more, the confounder should be kept in the model; otherwise it can be excluded.

Effect modification
- Sex: Create a sex × psychosocial factor interaction term. Add the interaction term to the final model (with confounders). If the interaction is significant, effect modification is present.
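Written out, the final model with the interaction term takes the following general form. The notation is ours and is not taken from the guideline: X is the central psychosocial factor, S is sex coded 0/1 (the coding is an assumption), and Z is the vector of confounders retained in the final model.

```latex
h(t \mid X, S, Z) = h_0(t)\,
  \exp\!\left(\beta_1 X + \beta_2 S + \beta_3 (X \times S) + \gamma^{\top} Z\right)
```

Under this coding, the hazard ratio per unit increase in X is \exp(\beta_1) in the group with S = 0 and \exp(\beta_1 + \beta_3) in the group with S = 1. Testing whether the interaction is significant therefore amounts to testing \beta_3 = 0.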

Analysis plan: a stepwise plan created prior to the actual data analysis

V1.2:  1 Jan 2010: English translation.
V1.1:  21 Jan 2008: Text in guideline has been re-written with more emphasis on a flexible approach.

    • Has an analysis plan been created prior to the start of analysis?
    • Has a concrete research question been formulated in the analysis plan?
    • Have the points described under description been considered and have the most important options been decided?
    • Has a stepwise description of the analyses to be applied been provided in the analysis plan?