Stata Course

Welcome to a course in the statistical package Stata. The course is aimed at PhD candidates, postdoctoral fellows and researchers in medical statistics and epidemiology in general.

Stata is a statistical software package for data analysis and an alternative to packages like SPSS, R or SAS. The buzzwords: “Obtain and manage data. Explore. Visualize. Model. Make inferences.”

The course is open to everyone, and participants can attend the parts of the course that interest them most. The upside of this is that there is no fee, no attendance sheet and no exam! The downside is that the course will not give any credits in the PhD program.

Time and place

Every Tuesday at 12:30 PM–3:30 PM on Zoom, from 25 January to 22 March (except 8 March). No registration is needed. One Zoom link is used for all sessions.

You can find course material (presentations, syntax and data) at the end of this page.

Teachers: Hein Stigum, Jonathan Wörn. 

The course will have lectures at three levels:

  • Beginner: No previous experience in Stata.
  • Elementary: General knowledge of using Stata (as given by the two beginners’ courses).
  • Advanced: Experience in Stata use (as given by the elementary courses).

In addition, some experience in data handling and statistical analysis will make the material easier to follow.

Description of topics

Introduction to Stata

Stata can be used from the menus or from syntax. Menus are good for beginners but somewhat slow to use. The Stata syntax is systematic and short, so an experienced user works faster by writing syntax. A syntax file is also a precise description of an analysis and is crucial when you need to repeat the analysis. In this introductory class, you will become acquainted with the Stata software and learn how to use syntax.
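As a minimal sketch of what a syntax (do) file can look like, the lines below use the `auto` example dataset that ships with Stata. The variable `heavy`, the label name `heavylbl` and the 3000 lbs cut-off are chosen here purely for illustration and are not part of the course material.

```stata
* Load an example dataset that ships with Stata
sysuse auto, clear

* Inspect the variables and get summary statistics
describe
summarize price mpg weight

* Create a new indicator variable and label its values
generate byte heavy = weight > 3000 if !missing(weight)
label define heavylbl 0 "Light" 1 "Heavy"
label values heavy heavylbl

* Cross-tabulate the new variable against car origin
tabulate heavy foreign
```

Saving these lines in a do-file means the whole analysis can be rerun unchanged at any time.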


Graphics

Stata has great graphics (plots). You can visualize a large range of data and results. Plots look good “out of the box”, but every aspect can be altered and fine-tuned to publication-ready standards. This class will teach you about different plot types and how to adjust their look to your preferences.
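A small sketch of how a plot might be built and adjusted, again assuming the `auto` example dataset. The titles, legend text and output file name `mpg_weight.png` are illustrative choices. The `///` line continuation works in do-files, not in the Command window.

```stata
sysuse auto, clear

* Overlay a scatter plot and a fitted regression line, then adjust the look
twoway (scatter mpg weight) (lfit mpg weight), ///
    title("Mileage by weight") ///
    xtitle("Weight (lbs)") ytitle("Miles per gallon") ///
    legend(order(1 "Observed" 2 "Linear fit"))

* Save the current graph to a file (file name is an example)
graph export "mpg_weight.png", replace
```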

Linear Regression

We use regressions for predictions or for estimating effects adjusted for confounders (or selection variables). Linear regression is used for continuous outcome data (weight, blood pressure, …). It is the easiest regression method to understand, and the techniques learned here can be used for other types of regressions. We handle non-linear dose response, interactions, non-constant error variance, the influence of outliers and predictions.
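The topics above could be sketched roughly as follows. This is an illustration on the `auto` example dataset with models chosen only to show the syntax, not the analyses used in the lectures.

```stata
sysuse auto, clear

* Linear regression with a quadratic term in weight (non-linear effect)
* and car origin as a categorical covariate
regress price c.weight##c.weight i.foreign

* Same model with robust standard errors, one way to handle
* non-constant error variance
regress price c.weight##c.weight i.foreign, vce(robust)

* Adjusted predictions of price at selected weights
margins, at(weight = (2000(1000)4000))

* Interaction between weight and origin
regress price c.weight##i.foreign
```

The factor-variable notation (`c.` for continuous, `i.` for categorical, `##` for main effects plus interaction) lets `margins` handle the squared and interaction terms correctly.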

Logistic regression

Logistic regression is used for binary outcome data (disease yes/no, …). We handle non-linear dose response, interactions, the influence of outliers and predictions.
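A comparable sketch for a binary outcome, using `foreign` (0/1) from the `auto` example dataset as a stand-in for a disease indicator. The new variable name `p_foreign` is an illustrative choice.

```stata
sysuse auto, clear

* Logistic regression; the logistic command reports odds ratios
logistic foreign c.weight c.mpg

* Predicted probabilities at selected weights
margins, at(weight = (2000(500)3500))

* Predicted probability for each observation
predict p_foreign, pr
```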

Survival analysis

Survival analysis is used for time-to-event outcomes (time to disease, time to death, …). The standard method is the Cox model. We focus on the alternative, Flexible Parametric Survival Models. These models estimate the same quantities as the Cox model under standard conditions, but allow easier handling of non-proportional hazards (time-dependent hazard ratios). The models also have a wider range of prediction types, including hazard differences and restricted mean survival times. The flexible models are part of a whole ecosystem of programs for competing risks, multi-state models and much more.
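The comparison between the two approaches might look like the sketch below. It assumes the `cancer` example dataset and the user-written command `stpm2`, which is not part of official Stata and has to be installed from SSC first (it may also require the `rcsgen` package). The indicator `treated` and the choice of 3 degrees of freedom are illustrative assumptions.

```stata
* One-time installation of the user-written commands (uncomment to run)
* ssc install stpm2
* ssc install rcsgen

sysuse cancer, clear

* Illustrative exposure: any active drug versus placebo (drug == 1)
generate byte treated = drug > 1

* Declare the data to be survival-time data
stset studytime, failure(died)

* Standard Cox model, reporting hazard ratios
stcox treated age

* Flexible parametric model on the log cumulative hazard scale;
* df(3) sets the spline complexity of the baseline, eform shows hazard ratios
stpm2 treated age, scale(hazard) df(3) eform
```

Under proportional hazards, the hazard ratios from the two models should be close to each other.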

Automating analysis

When we prepare analyses for a publication, we often redo the same analysis many times over with only small variations in data or methods. Writing syntax that automatically prepares finished tables or figures both saves time and reduces errors. We will look at methods to achieve such automated analyses.
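A minimal sketch of the building blocks involved (returned results, macros, matrices and loops), assuming the `auto` example dataset. The matrix name `results` and the list of outcomes are illustrative.

```stata
sysuse auto, clear

* Returned results: summarize leaves its results in r()
summarize price
local mean_price = r(mean)
display "Mean price: " %9.1f `mean_price'

* Loop over outcomes, fit the same model and collect the estimates in a matrix
local outcomes price mpg length
matrix results = J(3, 2, .)
matrix rownames results = `outcomes'
matrix colnames results = b se

local row = 1
foreach y of local outcomes {
    quietly regress `y' weight i.foreign
    matrix results[`row', 1] = _b[weight]
    matrix results[`row', 2] = _se[weight]
    local ++row
}

* One table with the weight coefficient and its standard error per outcome
matrix list results, format(%9.4f)
```

Changing the model or the list of outcomes in one place then updates the whole table.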


Programming

Simulated data are useful for learning new methods and for examining the effect of violating assumptions. We will look at simulating data for linear, logistic and survival models. By writing syntax into a program, we can also make use of Stata tools for simulation, bootstrapping and power calculations.
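As a rough illustration of wrapping syntax in a program and passing it to `simulate`, the sketch below generates data from a linear model and estimates power by simulation. The program name `simreg`, its options, the true coefficient of 0.5, the seed and the number of replications are all assumptions made for this example.

```stata
capture program drop simreg
program define simreg, rclass
    syntax [, n(integer 100) beta(real 0.5)]
    * Simulate one dataset from y = 1 + beta*x + error
    drop _all
    set obs `n'
    generate x = rnormal()
    generate y = 1 + `beta' * x + rnormal()
    * Fit the model and test the coefficient of x
    regress y x
    test x
    * Hand the results back to the caller
    return scalar p  = r(p)
    return scalar b  = _b[x]
    return scalar se = _se[x]
end

set seed 12345

* Run the program 500 times and keep the returned results as a dataset
simulate b = r(b) se = r(se) p = r(p), reps(500) nodots: simreg, n(100) beta(0.5)

* The mean of b shows bias; the share of p < 0.05 is the simulated power
generate byte reject = p < 0.05
summarize b se reject
```

The same program can be rerun with a different `n()` to see how power changes with sample size.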

Individual Fixed Effects Regression

Causal interpretation of observational data is often challenged by confounding and reverse causality issues. By following individuals over time and comparing their outcomes before and after they experienced a change in the potential predictor, a more credible causal interpretation of results is often possible. Another advantage of this within-person comparison is that time-constant characteristics of the person are ruled out as confounders. Individual fixed effects models are an elegant way of implementing the analytical approach described above. The model can be applied to different sorts of clustered data, including longitudinal data of individuals. In other contexts, the model can be used to account for both observed and unobserved confounders at the family or school level, for example by comparing different children from the same family (sibling fixed effects) or from the same school. This session will provide an introduction to the individual fixed effects model and includes practical examples of how to implement the model using Stata.
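A minimal sketch of such a model, assuming the `nlswork` example panel dataset, which `webuse` downloads from the Stata website (internet access needed). The choice of outcome and covariates is illustrative only.

```stata
webuse nlswork, clear

* Declare the panel structure: individuals (idcode) observed over years
xtset idcode year

* Individual fixed effects regression: only within-person changes are used,
* so time-constant characteristics of the person drop out
xtreg ln_wage tenure c.age##c.age, fe vce(cluster idcode)
```

For other clustered data, such as siblings within families, the same command could be used with the cluster identifier in place of the person identifier, for example a hypothetical `family_id` variable.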


The course will have 3 hours of lectures (12:30-15:30) for each theme. We will hand out syntax files at the end of each lesson. Participants are encouraged to try these on the example data or, better, on their own data.


Date | Level | Theme / Session | Teacher | Venue
25 January | Beginner | Introduction to Stata: Interface, file types, data handling, basic commands | Jonathan Wörn | Zoom
1 February | Beginner | Graphics: Making plots for data and results | Jonathan Wörn | Zoom
8 February | Elementary | Linear regression: Standard model, non-linear effects, interactions, effects of outliers, predictions | Hein Stigum |
15 February | Elementary | Logistic regression: Standard model, non-linear effects, interactions, effects of outliers, predictions | Hein Stigum |
22 February | Advanced | Survival analysis: Flexible Parametric Survival Models | Hein Stigum | Zoom
1 March | Advanced | Automating analysis: Returned results, macros, matrices, loops | Hein Stigum | Zoom
15 March | Advanced | Programming: Simulating data for linear, logistic and survival models; writing programs for simulation, bootstrapping and power calculations | Hein Stigum |
22 March | Advanced | Individual fixed effects regression: Examining within-unit changes, controlling for unit-specific characteristics. Setting up data; model specification and interpretation; graphing results. | Jonathan Wörn | Zoom


Department of Community Medicine and Global Health.

Contact: Hein Stigum



Published Nov. 25, 2021 9:32 AM - Last modified Mar. 23, 2022 10:15 AM