The multiple faces of shrinkage
Speaker: Georg Heinze, Professor, Center for Medical Statistics, Informatics, and Intelligent Systems (Section for Clinical Biometrics), Medical University of Vienna, Austria.
Shrinkage is a consequence of overfitting, which arises when regression models are estimated from small or sparse data sets. In such situations predictions for new subjects are often ‘too extreme’: their real outcomes lie closer to the overall mean, i.e. they appear to be ‘shrunken’. Interestingly, the term ‘shrinkage’ is also often used to denote estimators that aim to anticipate shrinkage effects and prevent their occurrence. This duality has often caused confusion.
Shrinkage estimators can serve various purposes. Some methods were developed to optimize the calibration of prediction models. Others aim to reduce bias away from zero, which can be severe in logistic regression with small samples but is absent, e.g., in linear regression. Another purpose can be to improve accuracy, i.e., to reduce the mean squared error of predictions or of effect estimates. Irrespective of their purpose, some of these methods have a Bayesian motivation, in which prior belief about plausible values of common estimands such as log odds ratios is expressed by prior distributions centered at zero.
Shrinkage estimators can be constructed by maximizing a likelihood function penalized by an additional function of the parameters, which pulls estimates towards zero. Ridge and lasso regression are well-known examples, and so is Firth’s penalized likelihood. Other shrinkage estimators are constructed differently, e.g., by estimating post-estimation shrinkage factors with resampling methods and applying them to the fitted model. It is less well known that classical variable selection methods can also be interpreted as shrinkage estimators.
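As an illustration of the penalized-likelihood construction, the following minimal sketch fits a logistic regression by maximizing the log-likelihood minus a ridge penalty, 0.5 * ridge * (sum of squared slopes), using Newton-Raphson. It is not taken from the talk: the function name `fit_logistic`, the simulated data, the penalty weight of 5 and the choice to leave the intercept unpenalized are all assumptions made for this example. Lasso and Firth's method use different penalty functions and are not shown.

```python
import numpy as np

def fit_logistic(X, y, ridge=0.0, n_iter=50, tol=1e-10):
    """Maximize loglik(beta) - 0.5 * ridge * sum(beta[1:]**2) by Newton-Raphson.

    Hypothetical helper for illustration; column 0 of X is assumed to be
    the intercept and is left unpenalized.
    """
    p = X.shape[1]
    penalty = np.eye(p) * ridge
    penalty[0, 0] = 0.0                      # do not shrink the intercept
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))        # fitted probabilities
        grad = X.T @ (y - mu) - penalty @ beta        # penalized score
        weights = mu * (1.0 - mu)
        hess = X.T @ (X * weights[:, None]) + penalty # penalized information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (assumed values, chosen only to make the sketch runnable)
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
true_beta = np.array([-1.0, 0.8, -0.5, 0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_ml = fit_logistic(X, y, ridge=0.0)      # ordinary maximum likelihood
beta_ridge = fit_logistic(X, y, ridge=5.0)   # penalized fit

print("ML slopes:   ", beta_ml[1:])
print("Ridge slopes:", beta_ridge[1:])
```

With the penalty switched on, the Euclidean norm of the slope vector cannot exceed that of the maximum likelihood fit, which is the sense in which the penalty pulls estimates towards zero. Individual coefficients, however, need not all move towards zero.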
The talk will mainly focus on the setting of logistic regression with rare events. After a general introduction, we will compare shrinkage estimators by their assumed ‘pessimism’, i.e., the amount of overfitting that they anticipate (Kammer et al., 2017). Subsequently, we will investigate the improvement in accuracy of parameter estimates and predicted probabilities achieved by various shrinkage methods (Puhr et al., 2017). Finally, we will briefly discuss Bayesian noncollapsibility, i.e., likelihood penalization resulting in undesired anti-shrinkage, which can affect all well-known shrinkage estimators (Greenland, 2010; Geroldinger et al., 2017).
- Greenland S (2010). Simpson’s paradox from adding constants in contingency tables as an example of Bayesian noncollapsibility. The American Statistician 64(4), 340-44.
- Geroldinger A, Greenland S, Heinze G (2017). Anti-shrinkage from likelihood penalization: who’s afraid of Bayesian noncollapsibility? Manuscript under preparation.
- Kammer M, Dunkler D, Heinze G (2017). Some methods were more pessimistic than others: shrinkage estimators in risk regression models. Manuscript under preparation.
- Puhr R, Heinze G, Nold M, Lusa L, Geroldinger A (2017). Firth’s logistic regression with rare events – accurate effect estimates and predictions? Statistics in Medicine, early view.