# Previous meetings and seminars

Seminars autumn 2010

International Workshop on Modern Statistics for Climate Research

Workshop: Recent developments for case-control studies

Seminars spring 2010

Influenza Pandemic Seminar

Workshop on Causal Modelling

Seminars autumn 2009

Workshop: Enhancing estimation of incidence of problem drug use

Seminars spring 2009

Cancer research seminar

BMMS meeting at Soria Moria conference center

Minicourse: Statistics in climate research

Seminars autumn 2008

Course in Graphical and Causal Modelling in Genetics and Epidemiology

Mathematics in Medicine/Biology Workshop

Seminars spring 2008

Biostatistics Department 25 years

Seminars autumn 2007

Two lectures on stochastic modeling of infectious diseases

Workshop on Influenza modelling and preparedness

Evaluation of disease clusters

Seminars spring 2007

Meetings in the Infectious Disease Modeling Group 2007

Seminars autumn 2006

Workshop: Statistics for genome-wide copy number analyses in cancer research

Seminars spring 2006

Microarray Seminar

Meetings in the Infectious Disease Modeling Group 2006

Modern Statistical Methods in Epidemiology

Seminars autumn 2005

Centre for Biostatistical Modelling in the Medical Sciences meeting

Introductory course to infectious disease modelling

Workshop on statistical analysis of complex event history data

Seminars spring 2005

Modelling of Infectious Diseases

Seminars autumn 2004

The 4th Lysebu-meeting of Norevent

Recent statistical methods in epidemiology

Course on Multivariate Survival Analysis

Seminars spring 2004

Survival, clinical time data and microarrays

Seminars autumn 2003

Norevent Research Kitchen

The 3rd Lysebu-meeting of Norevent

Seminars spring 2003

Course in advanced survival and event history analysis

Seminars autumn 2002

Workshop: Methods in infectious disease epidemiology

The 2nd Lysebu-meeting of Norevent

Seminars spring 2002

Seminars autumn 2001

Workshop: New developments in event history analysis

### Seminars autumn 2010

**December 2:**

**Michael Nothnagel**, Institute of Medical Informatics and Statistics, University of Kiel, talked about

*Statistical Inference of Allelic Imbalance from Transcriptome Data.*

**Abstract**: Next-generation sequencing has facilitated the analysis of somatic and meiotic mutations at an unprecedented level. In this context, the study of allelic imbalance in intermediate RNA phenotypes may prove a useful means of elucidating the likely effects of DNA variants of unknown significance. We developed a statistical framework for the assessment of allelic imbalance in next-generation transcriptome sequencing (RNA-seq) data that requires knowledge neither of an expression reference nor of the underlying nuclear genotype(s), and that allows for sequencing errors. Both extensive simulations under a wide range of practically relevant scenarios and application to publicly available whole-transcriptome data with auxiliary genotype information showed that our approach has greater power, for both genotype inference and allelic imbalance assessment, than the more naïve approach of ignoring allele miscalls altogether, particularly at low sequencing coverage. The ability to assess somatic mutations and allelic imbalance in one and the same RNA-seq data set will make our framework particularly well suited to the analysis of somatic genetic variation in cancer studies.
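The basic idea of testing for allelic imbalance while allowing for sequencing errors can be illustrated with a binomial likelihood-ratio test at a single site. This is only a minimal sketch and not the speaker's framework. It assumes a known heterozygous genotype, independent reads, and a known, symmetric per-read miscall rate `eps`. The function name `allelic_imbalance_lrt` is hypothetical.

```python
import math


def allelic_imbalance_lrt(ref_reads, total_reads, eps=0.01):
    """Likelihood-ratio test of H0: allelic ratio p = 0.5 at one heterozygous site.

    Hypothetical model (an assumption, not the talk's method): each read
    reports the wrong allele with a known, symmetric probability eps, so a
    read shows the reference allele with probability
    q(p) = p * (1 - eps) + (1 - p) * eps.
    Returns (p_hat, lr_statistic, p_value).
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie strictly between 0 and 0.5")
    k, n = ref_reads, total_reads

    def loglik(p):
        q = p * (1.0 - eps) + (1.0 - p) * eps  # always inside [eps, 1 - eps]
        return k * math.log(q) + (n - k) * math.log(1.0 - q)

    # q is linear in p, so the MLE maps q_hat = k / n back to p,
    # clipped to the parameter space [0, 1]
    p_hat = (k / n - eps) / (1.0 - 2.0 * eps)
    p_hat = min(1.0, max(0.0, p_hat))
    stat = max(0.0, 2.0 * (loglik(p_hat) - loglik(0.5)))
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return p_hat, stat, p_value
```

Here are two cases. With 80 reference reads out of 100, the test returns a very small p-value. With 50 out of 100, the p-value is essentially 1. The talk's framework additionally infers the genotype instead of assuming it.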

**November 11:**

**Geir Aamodt**, Norwegian Institute of Public Health and the Norwegian University of Life Sciences, talked about

*Geographic information systems in epidemiological research.*

**Abstract**: Epidemiology is focused on finding factors associated with the development of diseases. Many of these factors are spatial or site-dependent, such as the content of drinking water, air quality, and radiation.

A geographic information system (GIS) is a tool that integrates cartography, statistical analysis and database technology. It can help us visualize data and construct site-dependent exposure variables. In studies based on health registries, GIS can be used to predict exposure variables for each participant from their home address or any other relevant location, given a set of observations or measurements. The tool can also help health authorities detect clusters or clustering of diseases, a task in which exposure variables are not used.
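One common way to turn point measurements into a predicted exposure at each participant's address is inverse distance weighting (IDW). The sketch below is an illustration under assumptions and not necessarily the method used in the talk. It assumes planar (projected) coordinates, a distance power of 2, and hypothetical names. Kriging is a frequent model-based alternative, and real projects would normally use dedicated GIS software.

```python
import math


def idw_exposure(address, stations, power=2.0):
    """Predict exposure at an address by inverse distance weighting.

    address  : (x, y) in planar coordinates, e.g. projected metres
    stations : list of ((x, y), measured_value)
    """
    num = den = 0.0
    for (sx, sy), value in stations:
        d = math.hypot(address[0] - sx, address[1] - sy)
        if d == 0.0:
            return value  # the address coincides with a measurement site
        w = 1.0 / d ** power  # nearer measurements get larger weights
        num += w * value
        den += w
    return num / den
```

Each participant's predicted value can then enter a registry-based analysis as an exposure covariate. The power parameter controls how local the prediction is.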

**October 21:**

**Harald Weedon-Fekjær**, Department of Etiological Research, Cancer Registry of Norway, talked about

*Understanding recent breast cancer trends.*

**Abstract:** After decades of consistent increase, the incidence of breast cancer has recently declined in many developed countries, including the USA, Sweden and Norway. There are several potential reasons for this change, including decreased use of hormone therapy (HT), screening programs completing their introduction phase, and changes in other breast cancer risk factors. Earlier studies comparing incidence curves have indicated substantial effects of either screening or hormone therapy use on breast cancer incidence, but the effects are difficult to separate because in most developed countries they occurred relatively close in time and in similar age groups. Hence, there is still substantial uncertainty and a need for good estimates of the effects of the different factors potentially governing recent breast cancer trends.

In addition to a high-quality nationwide cancer registry, Norway has county-specific records of hormone therapy sales and a public screening program that was gradually introduced to new counties over a seven-year period. During this period hormone therapy sales increased sharply and later declined, making Norway highly suitable for studying the effects of hormone therapy use and screening on overall breast cancer incidence. Using a special age-period-cohort model with screening and hormone therapy information, we have decomposed the effects of screening, hormone therapy use and birth cohort.

After correcting for hormone therapy and screening we achieved good model fit, explaining all significant non-linear period effects, including the recent fall in breast cancer incidence. New final estimates of screening and hormone therapy’s impact on breast cancer incidence will be presented.

**September 30:**

**Matteo Bottai**, Division of Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, US, talked about

*An introduction to quantile regression and its recent extensions through real life examples.*

**Abstract:** Research interest often lies in continuous outcome variables that do not appear to follow a normal distribution. Such variables are extremely common when the outcome is a positive measure (e.g. immunoglobulin serum concentrations, body mass index, time spent daily on vigorous physical activity) or when it is bounded within a known interval (e.g. visual analog scales of pain between 0 and 10 cm, percentages between 0 and 100%). When the frequency distribution of the outcome variable is skewed, has outlying values, shows multiple modes, or in general does not appear to be normal, traditional methods such as linear regression, t-tests, ANOVA, mixed effects models, Wilcoxon’s rank-sum test or the Kruskal-Wallis test may prove inadequate.

Quantile regression is an increasingly popular statistical method that allows inference on quantiles (e.g. median and other percentiles) of an outcome of interest and may provide great research insight. With non-normally distributed outcomes it may also have considerable advantages in terms of efficiency (i.e. shorter confidence intervals, greater power, and smaller sample size required), interpretability, and simplicity.
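Quantile regression estimates a quantile by minimising the asymmetric "check" (pinball) loss instead of squared error. The sketch assumes an intercept-only model, so the minimiser is simply a sample quantile. A regression with covariates minimises the same loss over coefficient vectors, usually by linear programming, for example with `rq` in the R package `quantreg` or `QuantReg` in Python's statsmodels. The function names below are hypothetical.

```python
def check_loss(residual, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def intercept_only_quantile(y, tau):
    """Minimise the total check loss over a constant c.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimiser can always be found among the observed values.
    """
    def total_loss(c):
        return sum(check_loss(yi - c, tau) for yi in y)

    return min(sorted(y), key=total_loss)
```

For `[1, 2, 3, 4, 100]` and `tau = 0.5`, the fitted median is 3, whereas the mean would be 22. This illustrates the robustness to outlying values mentioned above.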

**September 16:**

**Theis Lange**, Department of Biostatistics, University of Copenhagen, talked about

*How to do mediation analysis of survival data*.

**Abstract**: A cornerstone of epidemiological research is to understand the causal pathways from an exposure to an outcome. For example, how much of the observed increase in the risk of long-term sickness absence associated with socioeconomic position (SEP) is mediated through an unhealthy work environment? In other words: how large is the direct effect of SEP on long-term sickness absence, and how large is the indirect effect through work environment?

When the outcome is either binary or normal (conditional on covariates), a number of techniques have been developed to assess the strength of the different pathways from exposure to outcome; see e.g. Rubin (2004), Petersen, Sinisi & van der Laan (2006), and Hafeman & Schwartz (2009). However, as the example above illustrates, the outcome of interest will often be a survival time, and survival times are well known to be non-normal and typically right-censored. Therefore, existing techniques for mediation analysis are not applicable to survival data.

The traditional approach to assessing the magnitude of the pathway "SEP" -> "work environment" -> "long-term sickness absence" is to employ survival analysis techniques, such as the Cox model, and to estimate the hazard ratios for the exposure variable from models both with and without the mediator. A drop in the hazard ratio from the analysis that does not include the mediator to the analysis that does is taken as evidence of mediation through the mediator. However, as pointed out by Cole & Hernán (2002) and Kaufman, Maclehose & Kaufman (2004), such an analysis of mediation has severe shortcomings. Most importantly, it is not possible to give a causal interpretation of the observed drop in hazard ratios. In addition, this procedure does not allow for interactions between exposure and mediator. Finally, it is not mathematically consistent to employ a Cox model both with and without a potential mediator.

In this paper we propose a simple measure of mediation in a survival setting. The measure is based on the counterfactual framework of Pearl (2001) and, in the language of Hafeman & Schwartz (2009), measures the natural direct and indirect effects. It has a direct causal interpretation as the increase in hazard mediated through a given mediator. The method uses the Aalen additive hazard model for the observed survival times in combination with a standard model, e.g. OLS, for the mediator to obtain a simple yet flexible measure of mediation. The measure allows for different types of mediators, interactions, competing risks, and in addition the mathematical structure is internally consistent. The approach complements the work in Fosen, Ferkingstad, Borgan & Aalen (2006) and the talk will elaborate on the similarities and differences of the two approaches.

The techniques are illustrated by an application to the problem of long-term sickness absence discussed above. R code to compute the proposed causal measures of mediation along with their confidence bands is available from the authors upon request.
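A minimal sketch of how such a decomposition can look uses a product of coefficients, under the following assumptions:

- a linear mediator model M = a0 + a1·X + error, fitted by OLS, with the error independent of X;
- an Aalen additive hazard λ(t | X, M) = b0(t) + b1(t)·X + b2(t)·M;
- no exposure-mediator interaction and no unmeasured confounding.

Integrating the mediator out of the additive hazard then gives a cumulative direct effect B1(t) and a cumulative indirect effect a1·B2(t). Both are on the cumulative-hazard-difference scale. The cumulative Aalen coefficients are taken as already estimated, for instance with `aalen` in the R package `timereg`. This is our illustration, not the authors' R code. The function names and numbers are hypothetical.

```python
def ols_slope(x, m):
    """OLS slope a1 of the mediator model M = a0 + a1 * X + error."""
    n = len(x)
    mx, mm = sum(x) / n, sum(m) / n
    sxy = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def cumulative_effects(a1, B1_t, B2_t):
    """Cumulative direct, indirect and total effects at one time t.

    a1   : exposure coefficient from the mediator model
    B1_t : cumulative Aalen coefficient of the exposure at t (assumed given)
    B2_t : cumulative Aalen coefficient of the mediator at t (assumed given)
    Effects are cumulative-hazard differences for a one-unit change in X.
    """
    direct = B1_t
    indirect = a1 * B2_t
    return direct, indirect, direct + indirect
```

Evaluating `cumulative_effects` over a grid of time points gives curves of the direct and indirect effects. Confidence bands would require, for example, resampling, which this sketch omits.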

References

• Cole, S.R. & M.A. Hernán (2002), ‘Fallibility in estimating direct effects’, International Journal of Epidemiology 31, 163–165.

• Fosen, J., E. Ferkingstad, Ø. Borgan & O. Aalen (2006), ‘Dynamic path analysis - a new approach to analyzing time dependent covariates’, Lifetime Data Analysis 12, 143–167.

• Hafeman, D.M. & S. Schwartz (2009), ‘Opening the black box: a motivation for the assessment of mediation’, International Journal of Epidemiology 38, 838–845.

• Kaufman, J.S., R.F. Maclehose & S. Kaufman (2004), ‘A further critique of the analytic strategy of adjusting for covariates to identify biologic mediation’, Epidemiologic Perspectives & Innovations 1.

• Pearl, J. (2001), Direct and indirect effects, in ‘Proceedings of ASA Joint Statistical Meetings’.

• Rubin, D.B. (2004), ‘Direct and indirect causal effects via potential outcomes’, Scandinavian Journal of Statistics 31, 161–170.

### International Workshop on Modern Statistics for Climate Research

Two-day workshop at Det Norske Videnskaps-Akademi, Drammensveien 78, Oslo, February 1-2, 2010.

Program.

### Seminars spring 2010

**April 29:**

**Håkon Gjessing**, Norwegian Institute of Public Health, talked about

*Statistical efficiency, design and correction for multiple testing in genetic association studies.*

**Abstract**: Statistical power is central to all planning of studies searching for disease genes, particularly since multiple testing is often a serious problem. The resources available for data collection and genotyping are limited, and the problem is to choose the combination of family data and independent controls that gives maximal power for a given total number of individuals. We show that relative efficiency is a much simpler concept to work with than power in this setting, how the change-of-variance function can be used to select among designs, and how an extended score test can be used to minimize the efficiency loss in multiple testing.

**April 15:**

**Ivar Sønbø Kristiansen**, Institute of Health Management and Health Economics, University of Oslo, talked about

*How best to communicate the effectiveness of interventions for chronic diseases?*

**Abstract**: A considerable proportion of the health care budget is devoted to interventions for chronic diseases such as cancer, diabetes, cardiovascular disease and osteoporosis. Approximately 75% of all mortality in Norway is caused by chronic diseases. Medication, surgery, lifestyle changes and other measures are used to reduce the risk of adverse events from chronic diseases. The effectiveness of such interventions is inferred from survival analysis using data from randomized trials or cohort studies. A crucial issue is then how to explain this effectiveness to patients and health personnel in a way they can understand. This lecture will first present various ways of communicating effectiveness, and studies of how well people are able to understand the message. It will then be shown that the distribution of life-year benefits across those who take the intervention is crucial for people’s valuation of the benefits. The presentation will end with a brief discussion of how statisticians can explore the distribution of life-year changes in survival analysis.

**March 25:**

**Justin Manjourides**, Department of Biostatistics, Harvard School of Public Health, USA, talked about

*Improving disease surveillance by incorporating residential history.*

**Abstract**: Surveillance data related to chronic diseases typically involve a time component, often a birth date or date of diagnosis, and a spatial component, usually residence at the time of diagnosis. However, if we are interested in studying the spatial patterns of chronic diseases, such as leukemia or bladder cancer, the current locations of the cases are almost irrelevant. The important information regarding the relationship between location and disease is the residential history. If the disease of interest has a trigger point, then where the person resided at that time point is critical. Typically we do not know the exact time the disease was acquired. To address this, we incorporate a subject's residential history, weighted by the incubation distribution specific to the disease of interest.

The goal of this research is to enhance methods used in disease surveillance by incorporating knowledge of the incubation distribution of the disease of interest. We study the spatial and temporal distribution of diagnosis or detection of disease and compare it to the distribution of controls, or of the background population. If the cases are distributed differently from the controls, then an explanation of this discrepancy is in order. We seek to discover relationships between cases using new measures of distance that incorporate information specific to the disease of interest. We use the M-statistic to evaluate the significance of these new distance distributions, and the work of Sartwell, Armenian and Lilienfeld to define the incubation distributions.
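The idea of weighting each past residence by the incubation distribution can be sketched as follows. This is a hypothetical illustration, not the speaker's distance measure. It makes these assumptions:

- the incubation time is log-normal, in the spirit of Sartwell's work, with illustrative parameters `mu` and `sigma` on the log scale;
- residence periods are measured in years before diagnosis;
- locations use planar coordinates;
- two cases' acquisition times are independent.

Under these assumptions, the weighted distance below is the expected distance between two cases' locations at the (unknown) time each acquired the disease.

```python
import math
from statistics import NormalDist


def incubation_cdf(t, mu, sigma):
    """CDF of a log-normal incubation time (mu, sigma on the log scale)."""
    if t <= 0:
        return 0.0
    return NormalDist(mu, sigma).cdf(math.log(t))


def residence_weights(history, mu, sigma):
    """Probability that the disease was acquired during each residence.

    history: list of (start, end, (x, y)), meaning the person lived at (x, y)
             from `start` until `end` years before diagnosis (start > end >= 0).
    Acquisition falls in that residence if end <= incubation time < start.
    Weights are renormalised to sum to one over the recorded history.
    """
    raw = [incubation_cdf(start, mu, sigma) - incubation_cdf(end, mu, sigma)
           for start, end, _ in history]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_distance(hist_a, hist_b, mu, sigma):
    """Expected distance between two cases' locations at acquisition."""
    wa = residence_weights(hist_a, mu, sigma)
    wb = residence_weights(hist_b, mu, sigma)
    return sum(
        wi * wj * math.dist(loc_i, loc_j)
        for wi, (_, _, loc_i) in zip(wa, hist_a)
        for wj, (_, _, loc_j) in zip(wb, hist_b)
    )
```

With a single lifelong residence per case, this reduces to the ordinary distance between current addresses. A long incubation period shifts weight towards earlier residences.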

**March 18:**

**Randi Selmer**, Norwegian Institute of Public Health, talked about

*Risk functions for cardiovascular disease: a Norwegian chart for estimating cardiovascular risk.*

**Abstract**: Different risk factors, such as cholesterol, blood pressure and smoking, act together on the risk of developing cardiovascular disease. This means that even if no single risk factor is particularly high, the overall risk may still be considerable. Recent guidelines for the prevention of cardiovascular disease give overall cardiovascular risk a central place. The risk chart included in the Norwegian guidelines is based on the NORRISK model. The chart gives the 10-year absolute risk of dying from cardiovascular disease based on sex, age, blood pressure, total cholesterol and smoking habits. The NORRISK model is based on Norwegian epidemiological data and was developed in a collaboration between the Norwegian Institute of Public Health and the Department of Biostatistics at the University of Oslo. In this talk I will present the NORRISK model and its background, and discuss dilemmas arising from different choices of risk functions.

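1. Nasjonale Retningslinjer for individuell primærforebygging av hjerte- og karsykdommer (National guidelines for individual primary prevention of cardiovascular disease).

2. Selmer R, Lindman AS, Tverdal A, Pedersen JI, Njølstad I, Veierød MB. Modell for estimering av kardiovaskulær risiko i Norge (A model for estimating cardiovascular risk in Norway). Tidsskr Nor Lægeforen 2008; 128: 286-90.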

**March 4:**

**Morten Wang Fagerland**, Ullevål Department of Research Administration, Oslo University Hospital, talked about

*Outcome-based subgroup analysis: a neglected concern.*

**Abstract**: The appeal, prevalence and pitfalls of subgroup analysis in clinical trial reports have been debated extensively. However, this debate has been restricted to subgroups identified by baseline patient characteristics. Trials from diverse medical fields often analyze a subgroup of cases at some level of one outcome with respect to a second outcome. Such outcome-based subgroup analysis (OBSGA) not only has an intrinsic potential for bias, but is also performed post hoc, overdone, selectively reported and overinterpreted. Using real and hypothetical data, we illustrate some of the ways in which OBSGA occurs, describe the hazards associated with it, and suggest two simple methods to avoid a biased analysis of such data.

**January 21:**

**Siem Heisterkamp**, Groningen Bioinformatics Centre, University of Groningen, Netherlands, talked about

*Directed Acyclic Graphs and the use of Linear Mixed models.*

**Abstract**: Here.

### Influenza Pandemic Seminar

BMMS hosted a seminar on the influenza pandemic on Tuesday, November 17, 2009, 14:00-15:30, at Nye Auditorium 13, Domus Medica.

14:00-14:10 Introduction, Birgitte Freiesleben de Blasio, Dept. of Biostatistics UiO

14:10-14:30 The Spanish Flu, Svenn-Erik Mamelund, Division of Infectious Disease Control, Norwegian Institute of Public Health

14:30-14:45 Seasonal influenza and excess mortality, Jon Michael Gran, Dept. of Biostatistics, UiO

Coffee/tea

14:55-15:15 Predicting spread patterns of influenza pathogens such as H1N1, Ottar Bjørnstad, Dept. of Biostatistics, UiO/Dept. of Entomology, Penn State University

15:15-15:30 Modelling the cost of influenza: from seasonal to pandemic, Yiting Xue, Dept. of Biostatistics, UiO

### Workshop on Causal Modelling

September 21-23, 2009.

International workshop organized by the Department of Biostatistics, University of Oslo: thematic research area BMMS, and Statistics for Innovation.

Venue: The Norwegian Academy of Science and Letters, Oslo, Norway – Det Norske Videnskaps-Akademi. Drammensveien 78, Oslo.

Sponsors: The Research Council of Norway and the University of Oslo.

See detailed information and program.

An introductory course in causal inference was also held on September 14-15. The course was intended for those who wanted to refresh their knowledge prior to the workshop.

Venue: Domus Medica, Gaustad, Sognsvannsveien 9, Oslo, Norway.

See the program for details.

### Seminars autumn 2009

**December 18:**

**Julia Mortera**, Università Roma Tre, Rome, Italy, talked about

*Bayesian Networks for Complex DNA mixture analysis.*

**Abstract**: We show how probabilistic expert systems can be used to analyse forensic identification problems involving DNA mixture traces using peak area information. This information can be exploited to make inferences about the genetic profiles of unknown contributors to the mixture, or to evaluate the evidential strength of a hypothesis that DNA from a particular person is present in the mixture. We will also present an extension of the Bayesian network that takes account of artifacts such as allelic dropout, stutter bands and silent alleles when interpreting DNA profiles.

We illustrate the use of the network on a published criminal casework example.

This is joint work with Robert Cowell and Steffen Lauritzen.

**November 19:**

**Odd O. Aalen**, Department of Biostatistics, University of Oslo, talked about

*Why does statistics play such a large role in medical research?*

**Abstract**: Statistics are everywhere in medical journals. Open a journal at any page and you are quite likely to come across statistical analyses and concepts, often quite sophisticated ones. Why is this so? What can statistics offer that is apparently so useful? It is a bit surprising, after all, considering that statistics usually operates on the surface of phenomena, e.g. counting occurrences in various risk groups in typical epidemiological research.

One reason is the limited ability of basic medical research to produce firm conclusions about the effect of treatments or preventive measures. Understanding of biological mechanisms is generally far too limited to draw anything like safe conclusions. The simple-minded empiricism of statistics gives a necessary reality check: Does the treatment really work? Is the suggested measure to prevent disease really a good idea?

So apparently, statistics is necessary. And it is getting more sophisticated, and even changing. New areas include infectious diseases and systems biology, which require new mathematics, such as network models.

So biostatistics is undoubtedly a success, but this just raises new challenges: modern biostatisticians need to know more mathematics, and at the same time more biology, than before. How do we handle this?

On the "long and winding road" to the eradication of disease, statistics and basic biomedical research need to go hand in hand. How can statistics continue to be up to this challenge?

**October 29:**

**Christine Parr**, Department of Biostatistics, University of Oslo, talked about

*Recall bias in case-control studies.*

**Abstract**: A major concern in case-control studies is that people with disease may remember or report past exposures differently from those without disease, introducing a form of differential measurement error into estimates of disease risk. However, recall bias has been investigated in few studies, only for some risk factors, and often with a sub-optimal design.

The evidence implicating sun exposure and pigmentation factors in the etiology of skin cancer derives largely from case-control studies, and the potential for recall bias may be large after extensive public information and media coverage about this relationship.

In this presentation I will mainly focus on the work behind a published paper on “Recall Bias in Melanoma Risk Factors and Measurement Error Effects” (Am. J. Epidemiol. 2009; 169: 257-266), but also briefly talk about recall bias in other exposures, including diet and mobile phone use.
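How differential recall can distort an odds ratio can be shown with a short misclassification calculation for a binary exposure. The reported exposure has a sensitivity and specificity that may differ between cases and controls. This is a textbook formula, not the analysis in the cited paper. The function names and the numbers in the checks are made up for illustration.

```python
def observed_prevalence(true_prev, sensitivity, specificity):
    """Prevalence of *reported* exposure given imperfect recall."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def odds_ratio(p_cases, p_controls):
    """Exposure odds ratio from exposure prevalences in cases and controls."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


def observed_or(p_cases, p_controls, se_cases, sp_cases, se_controls, sp_controls):
    """Odds ratio that would be estimated from the misclassified reports."""
    return odds_ratio(
        observed_prevalence(p_cases, se_cases, sp_cases),
        observed_prevalence(p_controls, se_controls, sp_controls),
    )
```

If cases recall exposure more completely than controls (higher sensitivity among cases), the observed odds ratio is pushed above the true one. Equal error rates in both groups instead tend to bias it towards 1.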

**October 15:**

**Joe Sexton**, Department of Biostatistics, University of Oslo, talked about

*P-value combination for testing group difference.*

**Abstract**: Suppose multiple (dependent) measurements have been made on two groups. How should these be combined into a global test of group difference?

This is a common but not necessarily easy problem. If the variables are normal, and not too numerous, methods exist. But what if the variables are of mixed type: some categorical, some continuous and others ordinal? This is potentially tricky. Luckily, there is an elegant solution due to Pesarin (2001). The method is based on combining p-values derived from permutation tests, and is both powerful and versatile. It will be described in the talk.

Pesarin's method relies on a combination function for pooling the p-values of different tests into a single statistic. There are many such combination functions, the most popular being due to Fisher. The relative merits of the combination functions are situation-dependent. In particular, when there is a large number of tests and the signal (i.e. group difference) is confined to a small fraction of them, classical combination methods can perform poorly. Some adaptive combination functions for dealing with this type of situation are described.
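The nonparametric combination idea can be sketched in a few lines. This minimal version makes these assumptions:

- the variables are numeric, with absolute differences in group means as the partial test statistics (categorical or ordinal variables would need their own statistics);
- Fisher's combining function;
- a fixed number of random permutations.

The function name is hypothetical and this is not Pesarin's software. The key point is that the same permutation of subjects is applied to all variables, so their dependence is respected without being modelled.

```python
import math
import random


def npc_fisher(group_a, group_b, n_perm=999, seed=1):
    """Global permutation test of group difference over several variables.

    group_a, group_b: lists of observations, each a tuple of V numeric values.
    Partial statistics: |mean_a - mean_b| per variable (larger = more extreme).
    Combining function: Fisher, -2 * sum(log p_j).
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a, n_vars = len(group_a), len(pooled[0])

    def partial_stats(rows):
        a, b = rows[:n_a], rows[n_a:]
        return [abs(sum(r[j] for r in a) / len(a) - sum(r[j] for r in b) / len(b))
                for j in range(n_vars)]

    # Row 0 holds the observed statistics; each later row permutes whole
    # subjects, which keeps the dependence between variables intact.
    stats = [partial_stats(pooled)]
    for _ in range(n_perm):
        perm = pooled[:]
        rng.shuffle(perm)
        stats.append(partial_stats(perm))

    n_rows = len(stats)

    def partial_p(row, j):
        # Permutation p-value of variable j for the data set in `row`
        return sum(1 for s in stats if s[j] >= stats[row][j]) / n_rows

    combined = [-2.0 * sum(math.log(partial_p(r, j)) for j in range(n_vars))
                for r in range(n_rows)]
    # Global p-value: share of combined statistics at least as extreme as observed
    return sum(1 for c in combined if c >= combined[0]) / n_rows
```

Swapping Fisher's function for another combining function, such as Tippett's minimum p-value, only changes the line that computes `combined`. That is where the adaptive combination functions mentioned above would plug in.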

**September 17:**

**Marit Veierød**, Department of Biostatistics, University of Oslo, talked about

*Solarium: friend or foe?*

**Abstract**: Solarium use increases the risk of skin cancer, but can it also prevent disease?

### Workshop: Enhancing estimation of incidence of problem drug use

May 25-26, 2009.

Workshop hosted by the Norwegian Institute for Alcohol and Drug Research (SIRUS), and jointly organized by BMMS, (sfi)², SIRUS and the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA).

### Seminars spring 2009

**May 28:**

**Gianpaolo Scalia Tomba**, Department of Mathematics, University of Rome "Tor Vergata", Italy, talked about

*Mathematical models for infectious disease epidemics: another kind of "epidemiology".*

**Abstract**: During the past decades, modelling of infectious diseases has seen an increasing use in prevention and intervention planning, for instance in connection with the recent alarm about potentially pandemic "human" avian flu. The seminar is intended to illustrate the theoretical bases of epidemic models and to present some recent results on the concept of "generation time" for infectious diseases and various relations with demography, probability theory and statistics.
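One standard relation linking the generation-time distribution, the epidemic growth rate and the reproduction number is the Euler-Lotka (renewal) equation borrowed from demography. For a discrete generation-time distribution g, it reads R = 1 / Σ g(t)·exp(−r·t). The sketch illustrates that general relation only. It may differ from the specific results presented in the talk, and the example distributions are made up.

```python
import math


def reproduction_number(growth_rate, generation_pmf):
    """R implied by an exponential growth rate r and a generation-time distribution.

    generation_pmf: dict mapping generation time (e.g. days) -> probability.
    Uses the Euler-Lotka relation R = 1 / sum_t g(t) * exp(-r * t).
    """
    total = sum(generation_pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("generation-time probabilities must sum to 1")
    return 1.0 / sum(p * math.exp(-growth_rate * t) for t, p in generation_pmf.items())
```

For a fixed generation time T, the relation reduces to R = exp(r·T). For a growing epidemic (r > 0), a longer generation time therefore implies a larger R for the same observed growth rate.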

**May 14:**

**Jo Røislien**, Department of Biostatistics, University of Oslo, talked about

*Statistical analysis of human gait.*

**Abstract**: Human gait, i.e. walking, is a continuous motion. Sampling gait cycles at fixed times, for example every 1/100 second, and performing point-by-point analysis on these samples is therefore strongly affected by the high correlation between points within a gait cycle. Functional Data Analysis (FDA) resolves this problem by instead fitting a mathematical function to each gait cycle, for example by Fourier analysis, and then performing the statistical analysis on this functional object. In this way, classical statistical methods can be applied with appropriate modification.
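A minimal sketch of the first FDA step represents one gait cycle, sampled at N equally spaced points, by a truncated Fourier series. It assumes the cycle is periodic. With equally spaced samples and fewer than N/2 harmonics, the discrete Fourier coefficients used here coincide with the least-squares fit of the truncated series. The number of harmonics is an arbitrary choice, and the function names are hypothetical.

```python
import math


def fourier_fit(samples, n_harmonics):
    """Truncated Fourier series fitted to one periodic cycle.

    samples: N values at equally spaced times t_n = n / N over one cycle.
    Returns (a0, a, b) with f(t) = a0 + sum_k a[k-1] cos(2 pi k t) + b[k-1] sin(2 pi k t).
    """
    n = len(samples)
    if not 2 * n_harmonics < n:
        raise ValueError("need n_harmonics < N / 2")
    a0 = sum(samples) / n
    a, b = [], []
    for k in range(1, n_harmonics + 1):
        a.append(2.0 / n * sum(y * math.cos(2 * math.pi * k * i / n) for i, y in enumerate(samples)))
        b.append(2.0 / n * sum(y * math.sin(2 * math.pi * k * i / n) for i, y in enumerate(samples)))
    return a0, a, b


def evaluate(coefs, t):
    """Evaluate the fitted curve at t in [0, 1) (fraction of the gait cycle)."""
    a0, a, b = coefs
    return a0 + sum(ak * math.cos(2 * math.pi * (k + 1) * t) + bk * math.sin(2 * math.pi * (k + 1) * t)
                    for k, (ak, bk) in enumerate(zip(a, b)))
```

Each subject's cycle is then summarised by a handful of coefficients instead of 100 correlated points. Group means or regression models can be formed on these coefficients.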

**March 19:**

**Georg Heinze**, Core Unit for Medical Statistics and Informatics, Section of Clinical Biometrics, Medical University of Vienna, talked about

*Avoiding infinite estimates in logistic and Cox regression - theory, solution, examples.*

**Abstract**: Odds ratios from two-by-two tables are degenerate when there is a zero cell count. More generally, in logistic or Cox regression analyses of small or sparse data sets, the likelihood often converges while at least one parameter estimate (log odds ratio or log hazard ratio) diverges to plus or minus infinity. Consequently, researchers are often puzzled by apparently high odds (hazard) ratio estimates which, however, lack statistical significance. We give conditions for this situation, termed 'separation' or 'monotone likelihood' in the context of logistic or Cox regression, respectively. Furthermore, we give examples of medical studies where separation occurred: (1) a two-by-two table, (2) logistic regression with two continuous covariates, (3) conditional logistic regression with clustered data, (4) Cox regression with a categorical variable showing no events in one category, and (5) Cox regression with a time-dependent effect. We show that a penalized likelihood approach that goes back to Firth (1993, Biometrika 80, 27-38) provides an ideal solution, and discuss its application to logistic regression, conditional logistic regression and Cox regression. Simulation results suggest that the penalized likelihood method yields nearly unbiased estimates even when standard ML estimates are prone to bias and the probability of infinite ML estimates is no longer negligible. Confidence intervals and tests based on the profile penalized likelihood appear to have close-to-nominal coverage rates and yield higher power than competing approaches. In comparative analyses of our examples, including possible alternative approaches, the penalized likelihood approach turns out to provide an ideal solution. Finally, we give an overview of software that can be used to apply the proposed penalized likelihood approach.
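Firth's penalty can be implemented through a modified score. Each residual y_i − π_i is augmented by h_i(1/2 − π_i), where h_i is the i-th diagonal element of the hat matrix W^{1/2}X(XᵀWX)^{-1}XᵀW^{1/2}. The sketch below is a minimal pure-Python version with these limitations:

- logistic regression with an intercept and one covariate only;
- no Cox model and no profile-likelihood intervals;
- a crude step cap instead of proper step-halving.

In practice one would use an established implementation such as the R package `logistf`. The function name is hypothetical.

```python
import math


def firth_logistic(x, y, max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression with an intercept and one covariate.

    Iterates beta <- beta + I(beta)^{-1} U*(beta), where U* is Firth's
    modified score and I the Fisher information X'WX. Returns (b0, b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for design rows (1, x_i)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse of X'WX
        # Hat-matrix diagonal h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified residuals y_i - p_i + h_i * (1/2 - p_i)
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        # Crude safeguard instead of step-halving: cap the step length
        largest = max(abs(d0), abs(d1))
        if largest > 5.0:
            d0, d1 = d0 * 5.0 / largest, d1 * 5.0 / largest
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1
```

Consider a two-by-two table with a zero cell, such as x = [0]*10 + [1]*5 and y = [1]*3 + [0]*7 + [1]*5. Ordinary ML diverges here, while the Firth estimate stays finite. In this saturated case it equals the log odds ratio obtained after adding 0.5 to every cell, which makes a convenient check.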

**March 5:**

**Bruno Ledergerber**, Division of Infectious Diseases and Hospital Epidemiology, University of Zurich, Switzerland, talked about

*HIV Cohorts: Strengths and Limitations.*

**Abstract**: Over the last 20 years, HIV cohorts have yielded a wealth of insight and knowledge across a multitude of disciplines. Using the Swiss HIV Cohort Study as an example, the seminar will cover the potential and strengths of cohort studies, but also the limitations of the different settings of observational research in HIV. The willingness to merge data in numerous large collaborations when scientifically needed, e.g. for analyses of rare outcomes, is unequalled in other medical disciplines. We will discuss guiding principles and different setups.

**January 19:**

**Henrik Støvring**, Institute of Public Health, University of Southern Denmark, talked about

*Weibull Survival Analysis with Truncated, Interval Censored Event Data, or with Empirical Survivor Function Data only.*

**Abstract**: A clinical trial published in the New England Journal of Medicine in 2004 found Bevacizumab to be an effective addition to the treatment of metastatic colorectal cancer. A British health technology assessment conducted by NICE and published in 2007 did not, however, find the treatment cost-effective. Norwegian health authorities subsequently wanted to evaluate the cost-effectiveness of Bevacizumab relative to current Norwegian treatment practice, and as a first step requested a comparison of mean survival times under each treatment regime. From the clinical trial, a Kaplan-Meier estimate of the survivor function was available, while the Norwegian Cancer Registry provided monthly counts of deaths and censorings following diagnosis of metastatic colorectal cancer. For comparability, all deaths and censorings within three months of diagnosis were administratively excluded from the Norwegian data.

Following the approach previously adopted by NICE, mean survival under both treatment regimes was estimated from a Weibull model. While this is in principle straightforward, we briefly discuss the obstacles presented by the data at hand, and in particular draw attention to the importance of unbiased estimation of uncertainty. We further discuss the assumptions necessary for a Weibull-based analysis and their implications in this context. We present key results of our (re)analysis and discuss their consequences both for the treatment of patients with colorectal cancer in particular and for future health technology assessments in general.
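A crude way to obtain a Weibull mean survival from published survivor-function points alone takes two steps. First, fit a straight line on the Weibull-plot scale, log(−log S(t)) = k·log t − k·log λ. Then apply the Weibull mean formula λ·Γ(1 + 1/k).

This least-squares fit is an assumption made for illustration. It is not the likelihood-based analysis of truncated, interval-censored data discussed in the talk, and it ignores the uncertainty estimation that the talk emphasises. The function names are hypothetical.

```python
import math


def weibull_from_survivor_points(times, surv):
    """Fit S(t) = exp(-(t / lam) ** k) to survivor-function points by least
    squares on the scale log(-log S) = k * log(t) - k * log(lam).

    Points with S equal to 0 or 1 (undefined on this scale) are skipped.
    Returns (k, lam).
    """
    pts = [(math.log(t), math.log(-math.log(s)))
           for t, s in zip(times, surv) if t > 0 and 0.0 < s < 1.0]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    k = (sum((x - mx) * (y - my) for x, y in pts)
         / sum((x - mx) ** 2 for x, _ in pts))
    intercept = my - k * mx  # equals -k * log(lam)
    lam = math.exp(-intercept / k)
    return k, lam


def weibull_mean(k, lam):
    """Mean of a Weibull distribution with shape k and scale lam."""
    return lam * math.gamma(1.0 + 1.0 / k)
```

The mean is driven largely by the extrapolated tail beyond the last observed time. The Weibull assumption therefore matters a great deal for the comparison of treatment regimes.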

### Cancer research seminar

The Cancer Registry of Norway and BMMS arranged a one-day seminar on Thursday, October 30, 2008, titled "Cancer research: from genomics to epidemiology".

See the program for more details.

### BMMS meeting at Soria Moria conference center

September 26.

Program:

0830-0900: Registration, coffee and fruit

0900-0925: Hazard rates from Gamma process crossings. Nils Lid Hjort, Institute of mathematics, Department of statistics, University of Oslo.

0925-0950: Random graphs. Taral Seierstad, IMB/Department of biostatistics, University of Oslo

0955-1020: Estimating the causal effect of treatment on survival from HIV - a sequential Cox approach. Jon Michael Gran, IMB/Department of biostatistics, University of Oslo

1020-1040: Coffee break

1040-1105: Classification of copy number alterations in human breast tumors. Ole Christian Lingjærde, Department of informatics, University of Oslo.

1105-1130: Survival prediction from micro array data. Ståle Nygård, Institute of mathematics, Department of statistics, University of Oslo.

1130-1230: Lunch

1230-1255: Hierarchical frailty model for family data, with applications to melanoma. Tron Anders Moger, IMB/Department of biostatistics, University of Oslo.

1255-1320: Ultraviolet radiation and risk of malignant melanoma. Marit Veierød, IMB/Department of biostatistics, University of Oslo and the Cancer Registry of Norway.

1320-1335: Break

1335-1400: Clinical cancer registries - Challenges in analyzing treatment effects from observational data. Jan F. Nygård, the Cancer Registry of Norway.

1400-1415: Closing remarks.

1415: End

### Minicourse: Statistics in climate research

On Tuesday September 16 and Thursday September 18, Peter Guttorp from the University of Washington, Seattle, gave a short course on statistical issues in climate research. Guttorp has been involved in the work of the Intergovernmental Panel on Climate Change (IPCC) and has considerable experience in this field; see his web page: http://www.stat.washington.edu/peter/

The lectures were free and open to all, and took place in Room alfa-omega at the Norwegian Computing Center (NR).

See the program for more information.

### Seminars autumn 2008

Nov 20:

Edsel A. Pena, Department of Statistics, University of South Carolina, talked about

Issues of Optimality in Multiple Statistical Hypotheses Testing.

Abstract: High-dimensional data, characterized by a large number (M) of variables or characteristics ("genes") but usually a smaller number (n) of replications or samples for each variable, arise in many fields, such as biology, medicine, engineering and economics. The increase in the number of such "large M, small n" data sets can be attributed to advances in high-throughput technology, notably microarray technology, and has led to the development of statistical methods appropriate for such data. A specific statistical problem that arises is that of multiple testing or multiple decision-making, where for each variable there are two competing hypotheses or two competing actions, so that the problem is to test simultaneously M pairs of hypotheses (or, equivalently, to choose among 2^M possible actions) based on the high-dimensional data. In such multiple testing problems, the impact of multiplicity must be recognized, and so the relevant "Type I Error" is usually defined in one of several ways, such as the family-wise error rate (FWER) or the false discovery rate (FDR). The goal is to use test/decision functions such that the chosen Type I Error rate is controlled, while also minimizing some measure of "Type II Error" or, equivalently, maximizing some measure of "power". Many procedures currently in use for controlling the FWER or the FDR rely on the set of p-values of the M individual tests, such as the Sidak procedure for FWER control or the popular Benjamini-Hochberg (BH) procedure for FDR control. These procedures, however, do not exploit the possibly differing powers of the individual tests, which could be due to differing effect sizes or the use of different types of tests. In this talk I will present some recent results in which the power of the multiple-testing procedure (for FWER and FDR control) is enhanced by exploiting the individual powers of the M tests.
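For reference, the two standard p-value-based procedures named in the abstract can be sketched as follows. These are the textbook baselines (the Sidak per-test level for FWER control under independence, and the Benjamini-Hochberg step-up rule for FDR control), not the power-enhanced procedures presented in the talk; the function names are our own.

```python
def sidak_level(alpha, m):
    """Per-test level giving FWER <= alpha for m independent tests (Sidak)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def benjamini_hochberg(pvals, q):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by increasing p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank  # keep the largest rank meeting the BH condition
    return sorted(order[:k_max])  # reject the k_max smallest p-values
```

Both rules use only the p-values, which is the limitation the talk addresses: neither takes the individual tests' powers into account.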

Nov 13:

Eric Nævdal, Department of Economics, University of Oslo, talked about

Fighting Transient Epidemics - Optimal Vaccination Schedules Before and After an Outbreak.

Abstract: Epidemic diseases afflict all countries, and all epidemics are costly to society. The present paper examines optimal vaccination trajectories before and after the outbreak of a special class of epidemics in which the disease normally eradicates itself. This class of epidemics includes diseases with extreme mortality, such as various forms of plague and new viral epidemics such as Ebola, as well as less fatal but economically significant epidemics such as the flu. One important insight is that there may be increasing returns to scale in vaccination.
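As a hedged illustration of how increasing returns to vaccination can arise in a self-limiting epidemic (this is not the optimal-control model of the paper), the sketch below uses the classical SIR final-size relation: if a fraction v is immune before the outbreak and the basic reproduction number is R0, the attack rate tau among the unvaccinated solves 1 - tau = exp(-R0 (1 - v) tau). The function name and the chosen R0 are assumptions for illustration.

```python
import math


def final_fraction_infected(r0, v, tol=1e-12, max_iter=100_000):
    """Fraction of the whole population eventually infected in a simple SIR
    epidemic when a fraction v is immune (vaccinated) before the outbreak."""
    r = r0 * (1.0 - v)  # effective reproduction number at the start
    if r <= 1.0:
        return 0.0  # no major outbreak
    # Fixed-point iteration from tau = 1 decreases monotonically to the
    # positive root of 1 - tau = exp(-r * tau).
    tau = 1.0
    for _ in range(max_iter):
        new_tau = 1.0 - math.exp(-r * tau)
        if abs(new_tau - tau) < tol:
            tau = new_tau
            break
        tau = new_tau
    return (1.0 - v) * tau


# Infections averted by each additional 10 % of coverage, for R0 = 2.
levels = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
infected = [final_fraction_infected(2.0, v) for v in levels]
averted = [a - b for a, b in zip(infected, infected[1:])]
```

In this sketch the entries of `averted` grow as coverage approaches the herd-immunity threshold 1 - 1/R0, so each extra unit of vaccination prevents more infections than the previous one.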

Oct 16:

Martin Camitz, Swedish Institute for Infectious Disease Control, Department of Medical Biostatistics and Epidemiology, Karolinska Institutet, and Department of Sociology, Stockholm University, talked about

StatFlu - A static model for pandemic flu hospital load prediction.

Abstract: During the past few decades, computer simulations of disease spread have been developed to aid our understanding of epidemics and pandemics, providing decision makers with vital information. Static models are less sophisticated but have the advantage of being much more transparent and easier to implement as decision-making tools. StatFlu is the latest development in this area and is now in use at the National Board of Health and Welfare in Sweden.

Oct 9:

Geir Egil Eide, Associate Professor, Section for Epidemiology and Medical Statistics, University of Bergen, talked about

Statistical quantification of the effect of disease-prevention measures.

Abstract: Here.

Sept 25:

Birgir Hrafnkelsson, Associate Research Professor, Applied Mathematics Division, University of Iceland, talked about

Linear mixture models for fillet yield of cod in Icelandic waters.

Abstract: Data were collected on variables related to the catching and processing of cod in Icelandic waters from 2002 to 2006. These data are used to evaluate the effect of conditions at and after catch on fillet yield. The spatial and spatio-temporal effects on yield are also evaluated.

A linear mixture model is developed for fillet yield. The spatial and spatio-temporal effects are modeled as Gaussian Markov random fields. The error terms are modeled in four different ways; i) as independent Gaussian variables with same variance, ii) as independent Gaussian variables with variance depending on location, iii) as independent t-variables with same degrees of freedom and same scale parameter, iv) as independent t-variables with same degrees of freedom and scale depending on location. These four models are compared through the deviance information criterion.

Sept 11:

Axel Gandy, Imperial College London, talked about

Monitoring Event Histories.

Abstract: Monitoring the performance of hospitals is becoming more and more common in the medical field. Most methods are based on logistic models or normal distribution assumptions. In this talk we investigate monitoring schemes using event history analysis. In particular, we suggest a monitoring scheme based on cumulative sum (CUSUM) charts designed to detect a proportional change in the hazard. As an example, we consider monitoring the length of stay in acute hospitals for stroke, adjusting for case-mix factors and transfer rates to different hospitals. The charts have wide applicability in fields other than medicine, e.g. in reliability and in finance.
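A CUSUM chart for a proportional change in the hazard can be sketched as below. This is a minimal discrete-time version assumed for illustration, not necessarily the scheme presented in the talk. Time is split into periods, `d_lam` is the cumulative hazard that the patients at risk would accrue in a period under the in-control (case-mix adjusted) model, `d_n` is the observed number of events, and `rho > 1` is the hazard ratio the chart is tuned to detect. The threshold is a hypothetical input; in practice it would be chosen to give an acceptable false-alarm rate.

```python
import math


def hazard_cusum(cum_hazard_increments, events, rho, threshold=None):
    """One-sided CUSUM for a proportional increase rho in the hazard.

    Each period adds the log-likelihood ratio of hazard rho*lambda versus
    lambda: d_n * log(rho) - (rho - 1) * d_lam. The statistic is reset at 0.
    Returns the CUSUM path and the index of the first period exceeding
    threshold (None if no alarm or no threshold given).
    """
    s, path, alarm = 0.0, [], None
    log_rho = math.log(rho)
    for t, (d_lam, d_n) in enumerate(zip(cum_hazard_increments, events)):
        s = max(0.0, s + d_n * log_rho - (rho - 1.0) * d_lam)
        path.append(s)
        if threshold is not None and alarm is None and s > threshold:
            alarm = t
    return path, alarm
```

Case-mix adjustment enters only through `d_lam`, which would come from a fitted model for the in-control hazard; here it is simply passed in.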

Sept 4:

Dr. Michael Nothnagel, Institute of Medical Informatics and Statistics, Christian-Albrechts University and University Hospital Schleswig-Holstein, Kiel, Germany, talked about

Population structure inference and genetic matching in Europe.

Abstract: Population stratification is known to be a potential confounder in genetic association studies. Self-reported ancestry or nationality is known to be prone to error, and might also be too unspecific a criterion. Genetic matching, e.g. between cases and controls, using a large number of genetic markers can potentially prevent systematic differences in the ancestry of phenotypic groups in the analysis. Here, we investigate the genetic structure in a large European sample set using genome-wide data on single nucleotide polymorphisms (SNPs). Furthermore, we investigate whether a small number of ancestry-sensitive markers (ASMs) suffices to allow genetic matching in European samples with accuracy similar to that of the complete, genome-wide marker set. Our results indicate that, apart from a small number of highly informative markers, the great majority of markers contain little information for matching, and that a large number of ASMs is required for reliable matching within Europe.

Aug 25:

Rob Tibshirani, Professor of Health Research and Policy, and Statistics, Stanford University, talked about

Some recent advances and challenges in the analysis of high-dimensional data.

### Course in Graphical and Causal Modelling in Genetics and Epidemiology

BMMS, together with SFI2, held a three-day course on June 11-13, 2008.

Teachers were Vanessa Didelez, Dept. of Mathematics, University of Bristol, and Nuala Sheehan, Dept. of Health Sciences, University of Leicester.

For more information, see the program.

### Mathematics in Medicine/Biology Workshop

The Centre of Mathematics for Applications (CMA), in collaboration with the Centre for Biostatistical Modelling in the Medical Sciences (BMMS) at the University of Oslo, organized a workshop on Mathematics in Medicine/Biology on March 4-5, 2008.

Program.

### Seminars spring 2008

May 7:

Anthony Davison, Ecole Polytechnique Fédérale de Lausanne, talked about

Smoothing and temperature extremes.

Abstract: Smoothing methods are now well established in many domains of statistics and are increasingly used in the analysis of extremal data. The talk will describe some applications of smoothing to data on temperature extremes, elucidating the relation between cold winter weather in the Alps and the North Atlantic Oscillation, and changes in the lengths of unusually hot and cold spells in Britain. The work mixes classical models for extremes, generalised additive modelling, local polynomial smoothing and the bootstrap, and is joint with Valérie Chavez-Demoulin and Mária Süveges.

April 24:

Marion Haugen, Department of Biostatistics, University of Oslo, talked about

Frailty modelling of bimodal age-incidence curves of nasopharyngeal carcinoma.

Abstract: The incidence of nasopharyngeal carcinoma (NPC) varies widely according to age at diagnosis, geographic location and ethnic background. Around the world, age-adjusted incidence rates range from 20-30 per 100 000 person-years in parts of Hong Kong and south-eastern Asia, to less than 1 per 100 000 across most of the United States and Europe. The profile of NPC is different in areas of high compared with low incidence. Low-risk populations have a bimodal age-incidence curve with a small peak occurring between 15 and 24 years of age, and a second peak at ages 65-74 years. This bimodality of the age-incidence curve can be seen as a frailty phenomenon. Most individuals are non-susceptible to the disease, but a certain subset of individuals has an increased risk of an NPC diagnosis at a given age. In other words, they are more frail or susceptible to disease onset than others.

A modification of the multiplicative frailty model makes it possible to take the bimodality into account. The model has two independent compound Poisson distributed frailties, and the covariates sex, area and diagnosis period are included in the underlying Poisson parameter. The model gives a good fit to NPC incidence rates for both males and females in five aggregated low-risk areas (North America, Japan, North and West Europe, Australia and India) diagnosed over the period 1983-97.
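The frailty mechanism can be made concrete with a sketch. The assumptions are ours, not the talk's exact specification: the compound Poisson frailty is taken to be a Poisson(rho) number of gamma(eta, nu) distributed jumps, so P(Z = 0) = exp(-rho) is the proportion of non-susceptible individuals, and the baseline hazard is a hypothetical Weibull form a·k·t^(k-1). The population survival is the frailty's Laplace transform evaluated at the cumulative baseline hazard, and the resulting population hazard rises and then falls. With two independent frailties acting on separate baselines, survival factorizes and the hazards add, which is one way two peaks can arise. Covariates would enter through rho in the talk's model; that is not shown here.

```python
import math


def weibull_baseline(t, a, k):
    """Hypothetical baseline: cumulative hazard a*t**k and hazard a*k*t**(k-1)."""
    return a * t ** k, a * k * t ** (k - 1)


def cp_survival(t, rho, nu, eta, a, k):
    """Population survival L(Lambda(t)) with Laplace transform
    L(c) = exp(-rho * (1 - (nu / (nu + c))**eta)) of a compound Poisson
    frailty with Poisson(rho) many gamma(eta, nu) jumps."""
    cum, _ = weibull_baseline(t, a, k)
    return math.exp(-rho * (1.0 - (nu / (nu + cum)) ** eta))


def cp_hazard(t, rho, nu, eta, a, k):
    """Population hazard -d/dt log S(t) = rho*eta*nu**eta*lambda(t) / (nu + Lambda(t))**(eta + 1)."""
    cum, haz = weibull_baseline(t, a, k)
    return rho * eta * nu ** eta * haz / (nu + cum) ** (eta + 1)


def two_component_hazard(t, comp1, comp2):
    """Two independent frailties on separate baselines: population hazards add."""
    return cp_hazard(t, *comp1) + cp_hazard(t, *comp2)
```

As t grows, survival levels off at exp(-rho): the non-susceptible fraction never experiences the event.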

March 13:

Jarle Breivik, Institute for Basic Medical Science, University of Oslo, talked about

Do proteins shape the nucleotide sequence of their own alleles?

Abstract: The idea that proteins influence their own nucleotide sequence may seem to contradict current paradigms of molecular biology. Yet many proteins have been shown to affect the nucleotide composition of their genomes. As a primary example, DNA mismatch repair proteins introduce mutation biases that influence the length of microsatellites (simple sequence repeats). Accordingly, there is a potential for such proteins to shape nucleotide sequences within their own alleles. Moreover, given allelic recombination through the course of evolution, an allele should be more affected by the mutation bias of its own protein than other alleles in the genome are. In theory, the sequence composition of an allele should thus reflect the mutation bias of its own protein. In this talk I will present existing evidence, a theoretical model and bioinformatic analyses that support this hypothesis.

The talk involves collaborative work with Daniel S. Falster, Marie Bergem-Ohr, Andrés Ögmundsson and Einar Andreas Rødland.

March 6:

Tony Pettitt, School of Mathematical Sciences, Queensland University of Technology, Australia, talked about

Statistical inference for assessing infection control measures for the transmission of pathogens in hospitals.

Abstract: Here.

Feb 11:

Peter Müller, The University of Texas M.D. Anderson Cancer Center, talked about

Bayesian Clustering with Regression.

Abstract: We propose a model for covariate-dependent clustering, i.e., we develop a probability model for random partitions that is indexed by covariates. The motivating application is inference for a clinical trial. As part of the desired inference we wish to define clusters of patients. A prior probability model for cluster memberships should include a regression on patient baseline covariates. We build on product partition models (PPM) and define an extension of the PPM that includes the desired regression. This is achieved by including in the cohesion function a new factor that increases the probability that experimental units with similar covariates are included in the same cluster.

We discuss implementations suitable for continuous, categorical, count and ordinal covariates.
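The effect of the extra covariate factor can be illustrated by enumerating every partition of four units. In this sketch, an assumption-laden simplification rather than the speaker's model, the cohesion is the Dirichlet-process form c(S) = M·(|S| - 1)!, and the covariate factor is a hypothetical similarity g(x_S) = exp(-variance of the covariates in S); the actual proposal may use a different similarity function.

```python
import math
from statistics import pvariance


def set_partitions(items):
    """Yield every partition of the list items as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):  # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part      # or into a new block of its own


def ppmx_prior(x, M=1.0):
    """Prior over partitions: product over blocks of cohesion * similarity."""
    weights = {}
    for part in set_partitions(list(range(len(x)))):
        w = 1.0
        for block in part:
            cohesion = M * math.factorial(len(block) - 1)
            similarity = math.exp(-pvariance([x[i] for i in block]))
            w *= cohesion * similarity
        key = tuple(sorted(tuple(sorted(b)) for b in part))
        weights[key] = w
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def prob_together(prior, i, j):
    """Prior probability that units i and j share a cluster."""
    return sum(p for part, p in prior.items()
               if any(i in b and j in b for b in part))
```

With covariates such as `[0.0, 0.1, 5.0, 5.1]`, units 0 and 1 end up together with higher prior probability than units 0 and 2, which is the behaviour the added factor is meant to produce.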

Kjell Stordahl, Senior Adviser, Telenor Norway, Finance, talked about

The IPCC's climate models, statistics and forecasts.

Discussant: Lars Holden, Managing Director, Norwegian Computing Center.

Presentations: Stordahl, Holden.

Abstract: Hardly a day goes by without the climate debate appearing in the media. For some time now, the debate has shifted from a discussion of the climate models and the results of the UN Intergovernmental Panel on Climate Change (IPCC) to a discussion of which measures must be taken to ensure a reasonable climate development. This talk takes a step back and focuses on the climate models and the IPCC's results.

The first part of the talk gives an overview of some fundamental assumptions underlying the Earth's climate processes. It also presents some information about the IPCC's climate models, with particular attention to the most important variables affecting climate development. The starting point will mainly be one of the IPCC's three main reports: "The Physical Science Basis", Fourth Assessment Report (2007).

The climate forecasts are reviewed, and results from the IPCC's model runs (simulations based on "ensemble runs") will be shown. Emphasis will be placed on discussing the uncertainty in the temperature forecasts. Reference is also made to an op-ed on http://www.forskning.no and the subsequent debate. The links are:

http://www.forskning.no/Artikler/2007/desember/1198132621.88

http://www.forskning.no/Artikler/2007/desember/1198864956.47

http://www.forskning.no/Artikler/2008/januar/1199586235.5

http://www.forskning.no/Artikler/2008/januar/1200358028.29

http://www.forskning.no/Artikler/2008/januar/1200710326.75

The last part of the talk focuses on how the IPCC presents its statistics, which in my assessment is, in some important areas, not very objective and breaks with principles that every statistician is trained to follow.

If time permits, I will also give an overview of areas within climate research that present real challenges for statisticians.

### Biostatistics Department 25 years

The Department of Biostatistics at IMB, University of Oslo, is 25 years old. A celebration took place on Monday, November 12.

### Seminars autumn 2007

Nov 22:

Tore Schweder (University of Oslo), Mette Langaas (Norwegian University of Science and Technology) and Helmut Finner (Heinrich-Heine-Universität Düsseldorf) talked about

Celebrating an important paper.

Abstract: 25 years ago Tore Schweder, who is presently professor at the University of Oslo, and Emil Spjøtvoll (who died in 2002), published a paper in Biometrika on plotting of P-values (see below). For many years the paper did not attract much attention. This completely changed with the introduction of high throughput genomics data (like microarray data) a few years ago. The fundamental idea of Schweder and Spjøtvoll of considering a large number of hypotheses and then using the P-values to estimate the number of true hypotheses suddenly became the tool of the day. This concept has become a very important input to algorithms for controlling the famous False Discovery Rate. Hence this paper is one of the most important ones in Norwegian statistics and we take the opportunity to celebrate its 25th anniversary by organizing a seminar.
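The core idea can be sketched with a single cutoff. Schweder and Spjøtvoll plotted the number of p-values exceeding p against 1 - p and read the number of true hypotheses off the slope of the roughly linear part; the estimator below is a simplified one-point version of that idea (the form later popularized by Storey), not the paper's exact procedure. It relies on p-values from true null hypotheses being uniform on (0, 1), so about m0·(1 - lambda) of them exceed lambda. The function name and the default cutoff are assumptions.

```python
def estimate_true_nulls(pvals, lam=0.5):
    """Estimate the number m0 of true null hypotheses from p-values.

    True-null p-values are uniform on (0, 1), so roughly m0 * (1 - lam)
    of them exceed lam, while p-values from false nulls cluster near zero.
    """
    m = len(pvals)
    above = sum(1 for p in pvals if p > lam)
    return min(m, above / (1.0 - lam))  # cannot exceed the number of tests
```

Dividing the estimate by the number of tests gives the proportion of true nulls, the quantity that adaptive FDR procedures plug in.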

Program.

Oct 25:

Nuala Sheehan, Department of Health Sciences and Department of Genetics, University of Leicester, talked about

Inferring Causality in Observational Epidemiology.

Abstract: "Mendelian randomisation" is an approach that purports to circumvent the problem of confounding in observational epidemiology and is based on the method of instrumental variables (IVs), where the instrument, in this case, is a genetic predisposition. It exploits the idea that a well-understood genotype, known to affect levels of a modifiable exposure or phenotype, affects disease status only indirectly via its effect on the phenotype, and is assigned randomly at meiosis (given the parental types). The genotype in Mendelian randomisation applications can often reasonably be assumed to satisfy the core conditions of an instrumental variable. We will show that while testing for a causal effect of phenotype on disease by testing for an association between genotype and disease is reasonable for most practical purposes, strong additional parametric assumptions are required in order to estimate the effect, and these are usually not justifiable when a binary disease outcome is of interest. Even the specification of the causal parameter is not obvious in this case, and determining its relationship to the regression parameters that can be estimated from the data is not straightforward.

The practical difficulties of inferring causality are compounded by the theoretical problem of expressing causal aims and methods in a mathematical language. The medical literature often employs causal vocabulary loosely to express something that is more than association between potential risk factors and their effects. Underlying knowledge about the biology of the problem may allow one to deduce the direction of an observed association, and terms such as "causal pathways" for disease frequently occur in the epidemiological literature. Despite recently proposed advances towards a formal causal framework for epidemiological applications, such frameworks are not widely adopted in general and, in particular, are not reflected in the Mendelian randomisation literature at all. We argue that a formal, mathematically precise causal framework, firstly, allows us to state precisely what the quantity (parameter) of interest is and, secondly, lets us formalise how associational findings and causal implications are related in order to obtain an estimate of this particular parameter. Furthermore, we show that graphical representations are useful for incorporating additional background knowledge and for verifying any conditions that are necessary for causal inference.
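The instrumental-variable logic behind Mendelian randomisation can be sketched with the simple ratio (Wald) estimator cov(Z, Y) / cov(Z, X). This sketch assumes a continuous outcome with a linear, homogeneous effect of the exposure, which is exactly the kind of additional parametric assumption the abstract warns about; it does not address the binary-outcome case. The data and function names are hypothetical, constructed so that the unmeasured confounder is uncorrelated with the instrument.

```python
def _cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def wald_ratio(z, x, y):
    """IV (Wald ratio) estimate of the effect of exposure x on outcome y
    using instrument z. Interpretable as a causal effect only under extra
    assumptions, e.g. a linear, homogeneous effect of x on y."""
    return _cov(z, y) / _cov(z, x)


def ols_slope(x, y):
    """Naive regression slope of y on x (confounded if x and y share causes)."""
    return _cov(x, y) / _cov(x, x)


# Hypothetical data: genotype z, confounder u, exposure x, outcome y.
z = [0, 0, 1, 1, 2, 2]
u = [1, -1, 1, -1, 1, -1]
x = [0.5 * zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]
```

Here the true effect of x on y is 2. The Wald ratio recovers it, while the naive regression slope is pulled away by the confounder u.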

Oct 18:

Daniel Farewell, Dept of Epidemiology, Statistics and Public Health, Cardiff University, talked about

Simple models for informatively censored longitudinal data.

Abstract: Models for longitudinal measurements truncated by possibly informative dropout have tended to be either mathematically complex or computationally demanding. I will review a recently proposed alternative, using simple ideas from event-history analysis (where censoring is commonplace) to yield moment-based estimators for balanced, continuous longitudinal data. I shall then discuss some work in progress: extending these ideas to more general longitudinal data, while maintaining simplicity of understanding and implementation.

Sept 27:

Taral Guldahl Seierstad, Department of Biostatistics, UiO, talked about

Phase transitions in random graphs.

Abstract: A common phenomenon in random graphs is that the structure of the graph may change substantially, caused by a relatively small change in the random procedure producing the graph. For example, suppose that G is a random graph with n vertices, such that every potential edge is included with probability c/n, independently of other edges. If c<1, then with high probability the graph consists of many small components with O(log n) vertices. On the other hand, if c>1, then with high probability the graph contains a unique "giant" component, which is much larger than every other component. This radical change, caused by a small increase in the edge probability, is called a "phase transition". In my talk I will present some random graph models which exhibit such phase transitions, and explain how one can study such phenomena mathematically.
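The phase transition can be seen in a small simulation of the model described above, in which each of the n(n-1)/2 possible edges is present independently with probability c/n, using a union-find structure to track components. This is an illustrative sketch; n, c and the seed are arbitrary choices, and the quadratic loop over vertex pairs is only practical for modest n.

```python
import random


def largest_component_size(n, c, seed=0):
    """Size of the largest connected component of G(n, p) with p = c / n."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))

    def find(i):
        # Find the root of i, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # include edge {i, j}
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components

    sizes = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())
```

For n = 1000, c = 0.5 should give a largest component of only a handful of vertices, while for c = 2 theory predicts a giant component holding roughly 80 % of them.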

Sept 13:

Karim F. Hirji, Dar es Salaam, Tanzania, talked about

Assessing the Quality of Clinical Trials: Is There a Short-Cut?

Abstract: Assessing the quality of clinical trials is essential both for judging the reliability of their conclusions and for incorporating them into systematic reviews. This seminar will present and discuss a paper which demonstrates, with the help of a case study, that the current methodology of trial quality evaluation has serious loopholes, so that even highly flawed trials sometimes end up being rated as good quality trials. The case study is a clinical trial of antibiotic therapy for acute otitis media in children.

It will be shown that most systematic reviews on the topic, including the relevant Cochrane Review, and a recent individual patient data meta-analysis, have declared it a good quality trial, and have extensively used its data in the formulation of overall therapeutic evidence.

On the other hand, through an in-depth evaluation, we will bring to light a series of serious but unrecognized problems that affected the design, conduct, analysis and report of the trial. We will argue that their combined effect potentially renders it a fatally flawed study. These problems include unethical recruitment of subjects in violation of a key exclusion criterion, inadequate record keeping, extreme baseline noncomparability, poor and biased short term follow up, nonrandom patterns of missing data, inconsistently specified important outcome variables, egregiously erroneous data analysis, and biased interpretation of the findings.

The general lesson we draw is that there are no short-cuts for evaluating the quality of a clinical trial. The usual checklist-based assessment has to be augmented with an in-depth review and, where possible, a scrutiny of the protocol, trial records and original data. Other ways of preventing the conduct and dissemination of the findings of deeply flawed trials will also be discussed.

Speaker: Dr. Karim F. Hirji has a doctorate in biostatistics from Harvard University. He has been involved with clinical trial design and analysis for two decades, and has taught courses on medical statistics and evidence-based medicine in Tanzania, the USA and Norway. He is the author of a recent book on the analysis of data from studies with small sample sizes.

Aug 16:

Jon Wakefield, Departments of Statistics and Biostatistics, University of Washington, Seattle, talked about

Measures of Association in Genome-Wide Association Studies.

Abstract: In the context of genome-wide association studies we critique a number of methods that have been suggested for flagging associations for further investigation. The p-value is by far the most commonly used measure, but requires careful calibration when the a priori probability of an association is small, and discards information by not considering the power associated with each test. The q-value is a frequentist method by which the false discovery rate (FDR) may be controlled. We advocate the use of the Bayes factor as a summary of the information in the data with respect to the comparison of the null and alternative hypotheses, and describe a recently-proposed approach to the calculation of the Bayes factor that is easily implemented. The combination of data across studies is straightforward using the Bayes factor approach, as are power calculations. The Bayes factor and the q-value provide complementary information and when used in addition to the p-value may be used to reduce the number of reported findings that are subsequently not reproduced.
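One easily implemented approximate Bayes factor can be sketched as follows, assuming a normal approximation to the estimated log odds ratio (estimate beta_hat, variance V) and a N(0, W) prior on the true effect under the alternative. This is the form of Wakefield's approximate Bayes factor as we understand it; the talk's exact formulation and choice of W may differ, and the function names are our own. The factor is the ratio p(data | H0) / p(data | H1), so small values favour an association.

```python
import math


def approximate_bayes_factor(beta_hat, v, w):
    """Approximate Bayes factor p(data | H0) / p(data | H1).

    beta_hat: estimated log odds ratio, v: its variance,
    w: prior variance of the effect under H1 (normal prior centred at 0).
    """
    z2 = beta_hat ** 2 / v        # squared Wald statistic
    r = w / (v + w)               # shrinkage factor
    return math.sqrt(1.0 / (1.0 - r)) * math.exp(-0.5 * z2 * r)


def posterior_prob_null(abf, prior_prob_assoc):
    """Posterior probability of no association given the ABF and the prior
    probability of an association."""
    prior_odds_null = (1.0 - prior_prob_assoc) / prior_prob_assoc
    post_odds_null = abf * prior_odds_null
    return post_odds_null / (1.0 + post_odds_null)
```

Because the factor combines with prior odds by multiplication, a small prior probability of association (as is typical in genome-wide studies) directly raises the evidence required before a finding is flagged.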

### Two lectures on stochastic modeling of infectious diseases

Thursday May 31, jointly organised with SFI2.

Program:

13.00-14.00 Tom Britton (Stockholm University): "Random graphs, epidemics and vaccination" (with discussion)

14.00-14.30 tea and coffee

14.30-15.30 Pieter Trapman (University of Amsterdam): "Epidemics on networks" (with discussion)

### Workshop on Influenza modelling and preparedness

One day meeting at Thon Hotel Opera, Oslo, Monday May 21, 2007.

Program

### Evaluation of disease clusters

One day meeting at the Cancer Registry of Norway, Tuesday January 23, 2007.

Program

### Seminars spring 2007

June 7:

Henrik René Cederkvist, Rikshospitalet, talked about

A comparison of methods for testing the predictive ability of models.

Abstract: The design of regression models intended for interpretation and prediction is an important topic in applied statistics and chemometrics. Many methods exist, and it is difficult to choose the best one. In addition, there are many factors to take into account.

Once a number of models have been chosen, they must be validated and compared in order to identify and select the "best" one. If the criterion is predictive ability, one usually uses the cross-validated RMSEP (Root Mean Square Error of Prediction) and then ranks the models by lowest RMSEP value. The problem is that such a ranking is not sufficient, since a small change in the data set will often change the ranking. This also matters in PLS, since choosing the smallest RMSEP can give larger models than necessary. Methods are therefore needed that can simplify the choice between a simple model on the one hand and a complex model on the other, without a loss of predictive ability.

In this presentation we will show how parametric and non-parametric analysis-of-variance methods can be used to test the predictive ability of models. The comparison of the methods will be based on simulated/estimated power functions. We study various issues (conditions for classical analysis of variance) that may arise, such as correlated observations, many models and unequal variances. Examples will be based on PLS and PCR models.
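The basic comparison can be sketched with two deliberately simple models (a constant and a straight line) standing in for the PLS and PCR models of the talk. The sketch computes leave-one-out squared prediction errors, the RMSEP of each model, and a paired t statistic on the per-observation differences, which is the two-model special case of the analysis-of-variance comparisons described above. The names are hypothetical, and because cross-validated errors are correlated, the usual t reference distribution is only approximate.

```python
import math
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def loo_squared_errors(xs, ys, model):
    """Leave-one-out squared prediction errors for model 'mean' or 'line'."""
    errs = []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]  # drop point i
        if model == "mean":
            pred = mean(ty)
        else:
            a, b = fit_line(tx, ty)
            pred = a + b * xs[i]
        errs.append((ys[i] - pred) ** 2)
    return errs


def rmsep(sq_errs):
    """Root mean square error of prediction from squared errors."""
    return math.sqrt(mean(sq_errs))


def paired_t(sq_a, sq_b):
    """Paired t statistic for the mean difference in squared error (a - b)."""
    d = [a - b for a, b in zip(sq_a, sq_b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))
```

Ranking by RMSEP alone uses only the two summary numbers; the paired statistic also asks whether the difference is large relative to its observation-to-observation variability.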

May 24:

Tron Anders Moger, Department of Biostatistics, University of Oslo, talked about

Multivariate frailty and skin cancer in Swedish nuclear families.

Abstract: In a standard frailty model for correlated survival data, all individuals in each group have the same value of the frailty variable, and this models the correlation. For nuclear family data, for example, one would also like to take genes into account. To obtain a realistic model, the parents must then be independent in the genetic frailty component, while parents and children, as well as siblings, share half of their genes (but not the same half!). We develop a new model based on nested Lévy processes. The model is applied to skin cancer data from the Swedish Multi-Generation Register, and we have components for shared and sporadic environment, as well as genetics. I will show some very preliminary results.

May 10:

Anya Tsalenko, Agilent Technologies Inc, Palo Alto, USA, talked about

Enrichment analysis in biological data sets - minimum hypergeometric statistics.

Abstract: Anya Tsalenko is an Expert Scientist in Agilent Laboratories, working in the area of computational biology. Her research is focused on analysis of gene expression, array CGH and SNP genotype data generated in various complex disease studies. Prior to joining Agilent, Anya was a post-doc at the University of Chicago. Anya has a Ph.D. in Mathematics from Stanford University.

April 12:

Per-Henrik Zahl, Norwegian Institute of Public Health (Folkehelseinstituttet), talked about

Breast cancer: What have we learned from mammography screening?

Abstract: Statistical analyses of cancer data show that i) large primary tumours more often have metastases than small primary tumours, ii) large primary tumours have a worse prognosis than small primary tumours, and iii) the presence of metastases gives a poor prognosis regardless of the size of the primary tumour. Such analyses form the theoretical basis for early diagnosis; the idea is to treat patients while the primary tumours are small and have not yet developed metastases.

The scientific basis for expecting that early diagnosis of breast cancer through mass mammography screening reduces mortality consists of eight randomised trials. A meta-analysis from the Nordic Cochrane Centre concluded that the reduction in breast cancer mortality was around 15 %, but because there was no reduction in the highest-quality trials, the real reduction may well be smaller. The meta-analysis also showed that screening led to overdiagnosis: the detection of tumours that, in the absence of screening, would never have been diagnosed.

With genuine early diagnosis, one expects increased cancer incidence among the invited women, followed by reduced incidence once the women are no longer invited. Mammography screening in Norway and Sweden led to an increase in incidence of about 50 % in the screened age groups. Because this increase was not followed by a reduction in incidence when the same women turned 70 and were no longer screened, most of the increase in the screened age groups must represent overdiagnosis (and not early diagnosis). Because the incidence figures include all breast cancer among the invited women, the 50 % increase corresponds to two out of three tumours detected by mammography representing overdiagnosis.

Our analyses of data from the Norwegian screening programme show that almost all overdiagnosis in mammography screening must be due to lesions that would normally regress spontaneously, so that they would no longer be visible on a new mammogram. This observation challenges a central dogma of breast cancer: that almost all invasive tumours grow monotonically. Since the middle of the 19th century it has been accepted that once the histopathological criteria for cancer are fulfilled (uncontrolled cell division, invasive growth), the condition is irreversible and characterised by monotonic and destructive growth.

With a low level of early diagnosis and a high level of overdiagnosis, the basis for expecting reduced mortality from mammography screening is also weakened. Nor are there signs of reduced mortality in the national mortality statistics of Sweden and Norway. The two countries have had the same trend in breast cancer mortality even though Sweden started organised mammography screening more than ten years before Norway, and the reduction in mortality is as large in the age group under 50 as in the age group above.

So far, relatively few have taken part in the debate on the benefits and harms of mammography screening. We believe it is important that the biostatistical community takes a central role now that the effect of the Norwegian mammography screening programme is to be evaluated.
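The step from a 50 % increase in incidence to "two out of three" screen-detected tumours depends on what share of all tumours among the invited women is found by screening, which the abstract does not state. The hypothetical helper below makes that dependence explicit, assuming the whole incidence increase is overdiagnosis. A screen-detected share of one half is what would reproduce the two-in-three figure; it is an illustrative input, not a number taken from the talk.

```python
def overdiagnosed_share_of_screen_detected(relative_increase, screen_detected_share):
    """Share of screen-detected tumours that are overdiagnosed.

    Assumes (hypothetically) that the entire incidence increase, relative to
    a baseline of 1, is overdiagnosis, and that screen_detected_share of all
    tumours among the invited women are found by screening.
    """
    total = 1.0 + relative_increase          # incidence with screening
    screen_detected = screen_detected_share * total
    return relative_increase / screen_detected
```

With a 50 % increase, a screen-detected share of one half gives 0.5 / 0.75 = 2/3; a larger share would give a smaller overdiagnosed fraction.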

March 15:

John F. Moxnes, FFI, talked about

The Spanish flu as a worst case scenario?

Abstract: The Spanish flu pandemic (1918-1920) caused a mortality of 0.6 % in the Norwegian population and is currently considered the worst-case scenario in pandemic plans for the H5N1 bird flu virus. Different worst-case scenarios are studied using counterfactual historical arguments.

The Spanish influenza pandemic was caused by an unusually severe and deadly virus of subtype H1N1 of influenza type A. The virus was unusual in killing many young and healthy victims, unlike common influenza, which kills mostly small children and the elderly. In annual influenza epidemics, curves of influenza death rates by age are normally U-shaped, with the highest mortality among the very young and the very old. During the Spanish flu pandemic, a W-shaped curve was observed instead. Mortality was high among young adults, generally those aged 20 to 40 and in particular those around 30, who would normally have little to fear from influenza.

The causes of this peculiar age pattern of death risk during the Spanish flu pandemic are not well understood. However, it is reasonable to speculate that the decline in mortality beyond age 30 might be explained by immunity from earlier influenza epidemics. The last of these immunizing epidemics must then be postulated to have occurred about 30 years before 1918, that is, just before the "Russian flu" pandemic that started in 1889. It might be speculated that H1 (and perhaps even H1N1) influenza strains caused epidemics in human populations during the 1880s. These strains would then have been wiped out following the Russian flu pandemic. There are precedents: H1N1 viruses disappeared from human populations after the Asian flu (H2N2) pandemic in 1957, and H2N2 viruses disappeared after the Hong Kong flu (H3N2) pandemic in 1968.

We counterfactually assume that individuals had not been immunized earlier against the Spanish flu virus. The functionality of the immune system as a function of age was studied under two scenarios. In the first, functionality decreases up to age 30 and is constant thereafter. In the second, functionality decreases monotonically with age. Under the second scenario, the mortality of the population becomes 0.3. In other words, in a worst-case scenario, 30% of the population could have been killed by the Spanish flu virus.

March 1:

Morten Fagerland, University of Oslo, talked about

A goodness-of-fit test for multinomial logistic regression.

Abstract: In many fields, the logistic regression model is the standard method for describing the relationship between a nominal outcome variable and one or more predictor variables. Several goodness-of-fit tests exist when the outcome variable is binary. One of the most commonly used is the test proposed by Hosmer and Lemeshow in 1980 (the HL test). The logistic regression model can be generalized to outcome variables that take more than two values (the multinomial logistic regression model), but few alternatives exist for assessing the fit of this model. A new goodness-of-fit test is therefore presented, which can be regarded as an extension (or generalization) of the HL test. The setup and results of a simulation study of the test will be presented. To illustrate the test, a study of cytological criteria for the diagnosis of breast tumors is used.
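The sketch below shows one natural way to extend the HL idea to c outcome categories. It rests on our own assumptions:
- subjects are grouped by their fitted probability of not being in reference category 0;
- g groups of nearly equal size are used;
- squared observed-minus-expected differences are summed over all g × c cells;
- the statistic is referred to a chi-square distribution with (g-2)(c-1) degrees of freedom.

The test presented in the talk may differ in its details. With c = 2 the statistic reduces algebraically to the usual binary HL statistic.

```python
import numpy as np


def grouped_gof(y, probs, g=10):
    """Hosmer-Lemeshow-type goodness-of-fit statistic for c outcome categories.

    y     : integer outcomes coded 0, ..., c-1 (0 is the reference category)
    probs : fitted probabilities from the model, shape (n, c)
    g     : number of groups (g = 10 mimics the usual "deciles of risk")
    Returns (statistic, degrees_of_freedom).
    """
    y = np.asarray(y, dtype=int)
    probs = np.asarray(probs, dtype=float)
    c = probs.shape[1]
    # Order subjects by their fitted probability of NOT being in category 0.
    order = np.argsort(1.0 - probs[:, 0], kind="stable")
    stat = 0.0
    for idx in np.array_split(order, g):
        observed = np.bincount(y[idx], minlength=c)  # outcome counts in the group
        expected = probs[idx].sum(axis=0)            # summed fitted probabilities
        stat += float(np.sum((observed - expected) ** 2 / expected))
    return stat, (g - 2) * (c - 1)
```

The p-value can then be obtained from the chi-square survival function, for example `scipy.stats.chi2.sf(stat, df)`. Small expected counts in some cells make the chi-square approximation unreliable.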

Feb 22:

Yudi Pawitan, Karolinska Institutet, Stockholm, talked about

Genetic and environmental variance-component estimation from population-based family data.

Abstract: Familial clustering of a condition is often the first indication of a genetic component. Separating environmental contributions to the clustering is a key step in establishing a genetic cause. I will present a case study of preeclampsia, a pregnancy-induced hypertensive condition, to illustrate two things. The first is the potential of family data to answer substantive genetic questions. The second is the methodological issues in answering those questions, including modelling with generalized linear mixed models and ascertainment problems.
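In a generalized linear mixed model with a logit link, fitted variance components are often summarized as shares of the total variance on the latent (liability) scale. Under that convention, the individual-level residual is fixed at pi**2 / 3, the variance of the standard logistic distribution.

The sketch below applies this convention. It may not be the summary used in the talk, and the variance values passed to it are hypothetical.

```python
import math


def latent_variance_shares(var_genetic, var_environment):
    """Shares of latent-scale variance in a logistic mixed model.

    Assumes a logit link, so the individual-level residual on the latent
    scale has the variance of a standard logistic distribution, pi**2 / 3.
    var_genetic, var_environment: fitted random-effect variances (hypothetical).
    """
    residual = math.pi ** 2 / 3
    total = var_genetic + var_environment + residual
    return {
        "genetic": var_genetic / total,
        "environment": var_environment / total,
        "residual": residual / total,
    }


shares = latent_variance_shares(1.0, 0.5)  # illustrative variance components
```

The genetic share plays the role of a heritability on the liability scale. Ascertainment of families through affected members would bias the fitted variances themselves, which no after-the-fact summary can fix.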

Feb 8:

Stein Atle Lie, University of Bergen, talked about

Early postoperative mortality after elective joint replacement surgery.

Summary: There is increased early postoperative mortality (operative risk) after elective joint replacement surgery. This mortality is usually associated with cardiovascular events, such as deep venous thrombosis and ischaemic heart disease. How long the increased mortality persists, however, is uncertain. Most studies of early postoperative outcomes (thromboses, pulmonary embolism and death) use cumulative measures, such as survival probabilities or the proportion of outcomes at a preselected time point (e.g. 35, 60 or 90 days). In this study we aim to quantify the increased mortality and how long it persists.

To study early postoperative mortality after insertion of a total hip or total knee prosthesis, a sufficiently large number of observations is needed. We therefore used the two national registries for Australia and Norway. These registries cover more than 95% of all operations in the two countries. Only patients with idiopathic osteoarthritis aged between 50 and 80 years were included, a total of 188,104 patients. Smoothed intensity (hazard) curves were estimated for the early postoperative period, while risk factors were studied with a non-parametric model.

Mortality was highest immediately after surgery (about 1 death per 10,000 patients per day) and decreased until the third postoperative week. The total excess early postoperative mortality was 0.13%. Mortality was, in principle, the same for both joints and both countries. Not surprisingly, mortality increased with age, and men had higher mortality than women. The potential for reducing early postoperative mortality is greatest immediately after surgery, but reduction is possible throughout the first 3 weeks.
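Smoothed intensity curves like those mentioned above are commonly obtained by kernel-smoothing the increments of the Nelson-Aalen estimator. The minimal sketch below assumes an Epanechnikov kernel, a user-chosen bandwidth and no boundary correction. The talk does not state which smoother was used.

```python
import numpy as np


def smoothed_hazard(time, event, grid, bandwidth):
    """Kernel-smoothed Nelson-Aalen hazard with an Epanechnikov kernel.

    time      : follow-up times (e.g. days since surgery)
    event     : 1 = death, 0 = censored
    grid      : time points at which the hazard is evaluated
    bandwidth : kernel half-width, in the same units as time
    No boundary correction is applied, so estimates within one bandwidth of
    time zero are biased downwards.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    # Distinct death times and the number of deaths at each.
    death_times, deaths = np.unique(time[event == 1], return_counts=True)
    # Number at risk at each death time: subjects with time >= t.
    at_risk = len(time) - np.searchsorted(np.sort(time), death_times, side="left")
    increments = deaths / at_risk  # Nelson-Aalen jumps dA(t)
    u = (np.asarray(grid, dtype=float)[:, None] - death_times[None, :]) / bandwidth
    kernel = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    return kernel @ increments / bandwidth
```

With follow-up in days, evaluating on a grid of 0 to 35 days would give an estimated daily hazard curve. The choice of bandwidth trades off noise against resolution of the early peak.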

Jan 11:

Hein Stigum, The Norwegian Institute of Public Health, talked about

Analysis of age of coital debut.

Abstract: The analysis of age of coital debut is central to describing sexual behaviour. Age-of-debut data typically share some characteristics that make analysis difficult: age is reported in whole years, and some subjects do not report a debut. We also aim to find a regression model that fits the data well and gives simple, interpretable results.

We study age of debut in four cross-sectional surveys (1987 to 2002), a total of 18,000 subjects from the adult Norwegian population. We compare a Cox model with a linear parametric survival model.

Survival methods are natural tools for analysing age of coital debut. The debut ages did not fit the proportional hazards model well, and an additive parametric survival model was the better regression model for the Norwegian data set. The additive model is also easier to interpret. The analysis showed a substantial change in age of debut across the cohorts born 1927 to 1984, with a drop of one year for men and 2.3 years for women. In the oldest cohorts, women reported their debut 0.8 years later than men; in the youngest cohorts, 0.5 years earlier.

A parametric survival model gives results that are easier to interpret, and it fits the Norwegian data better than the Cox model.
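The two data features above enter the likelihood directly. The minimal sketch below makes these assumptions:
- a reported whole-year debut age a means debut somewhere in [a, a+1);
- a subject without a debut is treated as right-censored at current age (an assumption about what "not reporting" means);
- age of debut is normally distributed, with covariates shifting the mean additively (in years).

The distribution used in the talk is not stated, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def log_likelihood(records, intercept, slope, sd):
    """Log-likelihood for age of debut under an assumed normal (additive) model.

    records: iterable of (age, debuted, x)
      debuted=True  -> reported whole-year debut age, so debut in [age, age + 1)
      debuted=False -> no debut by current age (right-censored)
      x             -> a covariate such as birth year; it shifts mean age additively
    """
    total = 0.0
    for age, debuted, x in records:
        mean = intercept + slope * x
        if debuted:
            # Interval-censored: probability mass of the reported whole year.
            p = normal_cdf(age + 1, mean, sd) - normal_cdf(age, mean, sd)
        else:
            # Right-censored: debut has not happened by the current age.
            p = 1.0 - normal_cdf(age, mean, sd)
        total += math.log(p)
    return total
```

The parameters would be estimated by maximizing this function numerically. The slope is then directly interpretable as a change in mean age of debut, in years, per unit of the covariate.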

### Meetings in the Infectious Disease Modeling Group 2007

Oct 31.

April 16:

Program:

12:00: The upcoming workshop on influenza modelling. Birgitte Freiesleben de Blasio.

12:30: Talk on influenza and mortality in the Scandinavian countries by Anne Mazick.

13:15: Influenza and mortality in Norway - some additions to the talk from the last meeting. Jon Michael Gran.

March 5:

Program:

12:00: Briefing from the meeting in Copenhagen. Preben Aavitsland.

12:45: Influenza and mortality in Norway. Jon Michael Gran.

February 2:

Program:

14:00: Meeting concerning the project *Statistical Modelling for Infectious Disease*

### Seminars autumn 2006

Dec 14:

Thore Egeland, University of Oslo, talked about

Gene hunting in families.

Summary: The aim of the projects I am involved in is to find genes that predispose to disease. In this talk I will describe some such projects, with emphasis on statistical and methodological challenges. One example is the following: a couple has several children with a serious congenital disease, and there is uncertain information about the relationship between the parents. This raises several interesting methodological questions, among them: What are the consequences of ignoring this knowledge? How can it be incorporated into the models?
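Relatedness matters here because it changes genotype probabilities. Consider an illustration with a recessive disease with allele frequency q. A child of parents whose inbreeding coefficient is F is homozygous with probability q²(1-F) + qF. The standard F values are 1/16 for first cousins and 1/64 for second cousins.

Uncertainty about the relationship can be represented as a mixture over relationship hypotheses with prior weights. The sketch is illustrative only: q, the relationships considered and the prior weights are all hypothetical.

```python
def prob_affected(q, inbreeding):
    """P(child homozygous for a recessive allele with frequency q), given F."""
    return q * q * (1.0 - inbreeding) + q * inbreeding


# Inbreeding coefficient of a child for some parental relationships.
INBREEDING = {"unrelated": 0.0, "second cousins": 1 / 64, "first cousins": 1 / 16}


def prob_affected_uncertain(q, prior):
    """Average over relationship hypotheses; prior maps relationship -> weight."""
    total = sum(prior.values())
    return sum(w * prob_affected(q, INBREEDING[rel]) for rel, w in prior.items()) / total


# Hypothetical allele frequency and 50/50 uncertainty about first-cousin parents.
p = prob_affected_uncertain(0.01, {"unrelated": 0.5, "first cousins": 0.5})
```

With q = 0.01, assuming the parents are unrelated gives 1.0e-4, while first cousins give about 7.2e-4. Ignoring possible relatedness can therefore understate the risk several-fold.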

Sep 21:

Karl Halvor Teigen, University of Oslo, talked about

Probability and uncertainty in everyday life.

Summary: KHT gives examples from psychological research on how people think about probabilities, risk and uncertain outcomes. What underlies subjective probability estimates, and how do they deviate from statistical/mathematical probabilities?

Sep 7:

Axel Gandy, Department of Biostatistics, University of Oslo, talked about

Adaptive Sampling for Computing p-Values.

Abstract: In statistics, bootstrap tests and permutation tests are becoming more and more common. In both cases, one needs to evaluate an integral or a sum that typically cannot be computed directly, so one resorts to sampling. The naive approach is to simulate a fixed, large number of (bootstrap) samples and compute the relative frequency of exceeding the observed value. This approach may waste computing resources. Often, a decision on whether the test yields a significant result can be reached much earlier. The only thing that really needs to be guaranteed is that the computed $p$-value is on the correct side of a given threshold, typically 5%.

The purpose of this talk is to show how this can be done using sequential methods. The suggested approach is suitable for use in statistical software. An experimental R package will be demonstrated. We illustrate this approach with some tests from survival analysis.
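The talk presents its own sequential procedure. The sketch below instead uses an older and simpler sequential Monte Carlo p-value, that of Besag and Clifford (1991). It is not the method from the talk, but it shows the key saving: sampling stops as soon as h exceedances have been observed, because a small p-value is then impossible.

The two-sample permutation statistic, h and max_sims are illustrative choices.

```python
import random
from statistics import mean


def sequential_mc_pvalue(observed, simulate, h=10, max_sims=999):
    """Besag & Clifford (1991) sequential Monte Carlo p-value.

    Draw simulated statistics until h of them are >= observed (p = h / sims),
    or until max_sims have been drawn (p = (exceedances + 1) / (max_sims + 1)).
    Returns (p_value, number_of_simulations_used).
    """
    exceed = 0
    for sims in range(1, max_sims + 1):
        if simulate() >= observed:
            exceed += 1
            if exceed == h:
                return h / sims, sims  # clearly non-significant: stop early
    return (exceed + 1) / (max_sims + 1), max_sims


def permutation_simulator(x, y, rng):
    """Return a function that draws one permutation statistic |mean(x) - mean(y)|."""
    pooled = list(x) + list(y)
    n_x = len(x)

    def simulate():
        rng.shuffle(pooled)
        return abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))

    return simulate
```

Example use: `sequential_mc_pvalue(abs(mean(x) - mean(y)), permutation_simulator(x, y, random.Random(1)))`. Non-significant results typically stop after a few dozen permutations, while significant ones use the full budget.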

### Workshop: Statistics for genome-wide copy number analyses in cancer research

Hotel Crowne Plaza Cabaña, Palo Alto 20-21 September 2006.

Program

### Seminars spring 2006

May 18:

Ole Klungsøyr, Department of Behavioral Sciences, University of Oslo, talked about

Measurement error in the timing of events in the Composite International Diagnostic Interview (CIDI).

Summary: The CIDI is an epidemiological diagnostic instrument for studying mental health in the population. It is based on a 1-2 hour interview and maps a person's psychiatric history and any diagnostic criteria met (ICD-10 / DSM-IV). The time of onset of the diagnosis is also estimated.

According to cross-sectional surveys using such instruments, the lifetime prevalence of depression has increased since the early 1900s. Large longitudinal studies over 40-50 years, on the other hand, show that incidence is roughly constant. I will show how the distribution of measurement error (the difference between true and estimated onset) can be estimated under the assumption of constant incidence of depression. I will also use this distribution to correct previously published results on the effect of various covariates (e.g. smoking) on depression.

May 11:

Christine Boehm, University of Freiburg, talked about

Estimation of Change in Hospital Stay due to Nosocomial Infections: counterfactual approach to deal with time-dependent confounders.

Abstract: The length of stay in hospital may be influenced by the occurrence of a nosocomial (hospital-acquired) infection. Patients in intensive care units (ICUs) are particularly affected by this risk. We are interested in the extra time spent in the ICU due to a nosocomial infection. This quantity can be used to evaluate efforts to prevent infection or to assess additional expenses.

The estimation must be formulated as a problem in time. Patients who stay longer in the ICU are at risk of infection for longer. They also tend to be in a worse medical condition, which necessitates a longer stay from the outset. We are additionally concerned with status variables that vary over time and may act as time-dependent confounders. Such confounders affect time to discharge and are at the same time associated with the occurrence of a nosocomial infection. Time-dependence here means that, after infection, the confounder is itself influenced by the nosocomial infection.

To quantify the impact of a nosocomial infection on length of stay in the ICU, we will use a method proposed by Robins (1992). In this method, the influence of the nosocomial infection is modelled by an accelerated failure time model. The acceleration parameter is estimated in a counterfactual setup that includes the time-dependent confounder in a way that leads to an unbiased estimate. We will first review Robins' approach and explain the different steps of modelling and estimation.

In a second step, we will consider the situation above while distinguishing between the competing endpoints discharge and death. Robins et al. (1992) address this problem with a weighted analysis. In contrast to that approach, we will modify the accelerated failure time model by defining different acceleration factors for discharged and deceased patients.
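The core of such a structural accelerated failure time model is a map from the observed stay T of a patient infected at time s to the stay U expected had infection not occurred. The sketch assumes the convention U = s + exp(psi)(T - s), so that time after infection runs at rate exp(psi). Sign conventions differ between papers.

Separate psi values for discharged and deceased patients mirror the modification described above. The function and parameter names are hypothetical.

```python
import math


def counterfactual_stay(stay, infection_time, psi_discharge, psi_death, died):
    """Map an observed ICU stay to the stay expected without infection.

    Assumes U = s + exp(psi) * (T - s): time before infection is unchanged,
    time after infection is rescaled by exp(psi), with a separate psi for
    patients who die and for patients who are discharged.
    infection_time is None for patients who were never infected.
    """
    if infection_time is None or infection_time >= stay:
        return stay  # never infected during the stay: U equals T
    psi = psi_death if died else psi_discharge
    return infection_time + math.exp(psi) * (stay - infection_time)
```

For example, with exp(psi) = 0.5, a patient infected on day 2 and discharged on day 10 would have stayed 6 days without the infection. G-estimation (not shown) then searches for the psi values at which U is unrelated to the infection hazard, given the confounder history.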

References:

- Lok JJ (2001). Statistical modelling of causal effects in time. Ph. D. thesis, Department of Mathematics, Free University of Amsterdam.

- Robins JM (1992). Estimation of the time-dependent accelerated failure time model in the presence of confounding factors. Biometrika, 79:321-334.

- Robins JM (1998). Structural nested failure time models. In: Armitage P, Colton T, eds. Encyclopedia of Biostatistics (Survival Analysis section). Chichester, UK: John Wiley and Sons.

- Robins JM, Blevins D, Ritter G, Wulfsohn M (1992). G-estimation of the effect of prophylaxis therapy for Pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology, 3:319-336.

- Schulgen G and Schumacher M (1996). Estimation of prolongation of hospital stay attributable to nosocomial infections: New approaches based on multistate models. Lifetime Data Analysis, 2:219-240.

April 6:

Sven Ove Samuelsen, Division of Statistics, University of Oslo, talked about

Increased head circumference at birth indicates increased risk of childhood brain cancer.

Abstract: Studies have found only weak or no association between birth weight and childhood brain cancer. Previous studies have not, however, focused on the association between head circumference at birth and brain cancer. Examining this association is the objective of this study.

We investigated the association between the incidence of brain cancer in childhood and perinatal factors such as head circumference, birth weight and gestational age. The study was based on the Norwegian Medical Birth Registry for 1978-1998, linked to the Norwegian Cancer Registry for 1978-2002. The study population consisted of 1,010,366 subjects with 12,378,172 person-years of observation, among whom there were 453 cases of brain cancer in the age group 0-15 years.

The relative risk of brain cancer was 1.27 per centimetre increase in head circumference (95% confidence interval 1.16-1.38) after adjustment for birth weight, gestational age and sex.

Head circumference is positively associated with incidence of childhood brain cancer. Our findings strongly indicate that brain pathology originates during foetal life.
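Two quantities can be derived from the figures above. The first is the crude incidence rate, which follows directly from the stated counts. The second is the relative risk for a 2 cm difference. It assumes a log-linear model, in which the per-centimetre relative risk and its confidence limits multiply across centimetres; this is an assumption about the fitted model, not something the abstract states.

```python
cases = 453
person_years = 12_378_172

# Crude incidence rate of childhood brain cancer per 100,000 person-years.
rate_per_100k = cases / person_years * 100_000


def rr_for_difference(rr_per_cm, cm):
    """Relative risk for a difference of `cm` centimetres, assuming a log-linear effect."""
    return rr_per_cm ** cm


rr_2cm = rr_for_difference(1.27, 2)
ci_2cm = (rr_for_difference(1.16, 2), rr_for_difference(1.38, 2))
print(f"{rate_per_100k:.2f} per 100,000 person-years")  # 3.66
print(f"RR per 2 cm: {rr_2cm:.2f} ({ci_2cm[0]:.2f}-{ci_2cm[1]:.2f})")  # 1.61 (1.35-1.90)
```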

March 23:

Ole Christian Lingjærde, Department of Informatics, University of Oslo, talked about

Analysis of array CGH data.

Abstract: Genomic instability is a hallmark of cancer. Through a series of segmental deletions and duplications of DNA, the genome of a cancer cell may lose a copy of some genes (known as loss of heterozygosity, or LOH) while gaining extra copies of others. Many copy number alteration events probably have fatal consequences for the cell. Others, however, lead to increased cell proliferation and may accumulate over time. Using microarrays, one can obtain genome-wide data on copy number alterations at high resolution. A number of methods have been proposed for detecting genomic aberrations from such data. I will give a brief survey of existing methods and present our group's own work in this field. I will also briefly present a novel algorithm for detecting copy number alteration hotspots from copy number alterations identified in multiple tumors.
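Many of the existing methods segment the log-ratio series along a chromosome into stretches of constant mean. The minimal least-squares binary segmentation below illustrates this idea. It is not the group's method, and published approaches such as circular binary segmentation add significance testing.

Each split is placed where separate means reduce the squared error most. Recursion stops when the best reduction falls below a hypothetical threshold min_gain.

```python
import numpy as np


def best_split(x, min_size):
    """Split index k maximizing the drop in squared error when x[:k] and x[k:]
    get separate means. Returns (k, gain); requires len(x) >= 2 * min_size."""
    n = len(x)
    csum = np.cumsum(x)
    k = np.arange(min_size, n - min_size + 1)  # both parts keep >= min_size probes
    left_mean = csum[k - 1] / k
    right_mean = (csum[-1] - csum[k - 1]) / (n - k)
    gain = k * (n - k) / n * (left_mean - right_mean) ** 2
    i = int(np.argmax(gain))
    return int(k[i]), float(gain[i])


def binary_segmentation(log_ratios, min_gain, min_size=3):
    """Recursively split a probe series into constant-mean segments."""
    x = np.asarray(log_ratios, dtype=float)
    breakpoints = []

    def recurse(lo, hi):
        if hi - lo < 2 * min_size:
            return
        k, gain = best_split(x[lo:hi], min_size)
        if gain < min_gain:
            return  # no convincing change in copy number within this segment
        breakpoints.append(lo + k)
        recurse(lo, lo + k)
        recurse(lo + k, hi)

    recurse(0, len(x))
    return sorted(breakpoints)
```

Segments whose mean log-ratio lies clearly above or below zero would then be called gains or losses.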

Feb 9:

Thomas Scheike, Department of Biostatistics, Copenhagen University, talked about

Predicting Cumulative Incidence Probability by Direct Binomial Regression.

Abstract: In this paper we suggest a new, simple approach for estimating and assessing covariate effects on the cumulative incidence curve in the competing risks model. The standard approach is to model all cause-specific hazards and then estimate the cumulative incidence curve from them. We introduce a new approach that specifies a regression model directly for the cumulative incidence curve. We consider a semiparametric regression model in which some covariates may have time-varying effects. A simulation study shows that the estimators work well and compares their finite-sample properties with those of the subdistribution approach. We apply the method to bone marrow transplant data from the International Bone Marrow Transplant Registry (IBMTR). We estimate the cumulative incidence of death in complete remission (TRM) following bone marrow transplantation, where TRM and relapse are two competing events.
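The idea of modelling the cumulative incidence directly can be sketched at a single time point t. First, form an inverse-probability-of-censoring-weighted (IPCW) response. Then fit a logistic model to that response by solving a binomial-type estimating equation.

This is a simplification under our own assumptions:
- one time point, and no time-varying effects;
- Kaplan-Meier censoring weights;
- censoring independent of covariates.

It is not the authors' implementation.

```python
import numpy as np


def censoring_survival_before(time, cause):
    """Kaplan-Meier estimate of P(still uncensored just before T_i) for each subject.

    cause: 0 = censored, 1, 2, ... = event type. G(T_i-) uses only the
    censoring times strictly before T_i.
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    cens_times, n_cens = np.unique(time[cause == 0], return_counts=True)
    at_risk = len(time) - np.searchsorted(np.sort(time), cens_times, side="left")
    # cum[j] = product of (1 - n_cens / at_risk) over the first j censoring times
    cum = np.concatenate([[1.0], np.cumprod(1.0 - n_cens / at_risk)])
    return cum[np.searchsorted(cens_times, time, side="left")]


def direct_binomial(time, cause, X, t, cause_of_interest=1, iters=25):
    """Logistic model for P(T <= t, cause = 1 | x) fitted at a single time t.

    IPCW response y_i = 1{T_i <= t, cause_i = 1} / G(T_i-); then solve
    sum_i x_i (y_i - expit(x_i' beta)) = 0 by Newton-Raphson.
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    X = np.asarray(X, dtype=float)
    G = censoring_survival_before(time, cause)
    y = ((time <= t) & (cause == cause_of_interest)) / G
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (y - p)
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(information, score)
    return beta
```

With an intercept-only design, expit(beta) reproduces the nonparametric (Aalen-Johansen) cumulative incidence at t in small examples without tied times. Adding covariate columns to X gives covariate effects on the log-odds of cumulative incidence.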

Jan 26:

Niels Keiding, Department of Biostatistics, Institute of Public Health, Copenhagen University, talked about

Event history analysis and the cross-section.

Abstract: In event history analysis, individuals are assumed to move between states, and multi-state models (Andersen et al., 1993; Andersen and Keiding, 2002) are used to describe them. In this presentation I survey my interests in event histories that develop in calendar time and are studied at a cross-section at a particular time. The Lexis diagram (Keiding, 1990, 1991, 2000; Lund, 2000) is helpful here.

I first consider three examples of studying "incidence":

- From the current status at the cross-section (application: rubella incidence based on seroprevalence data). Keiding et al. (1996).

- Retrospectively before the cross-section, invoking Horvitz-Thompson type weights from additional survival information (application: diabetes incidence in Fyn 1933-73 based on prevalent sample in 1973). Keiding et al. (1989), Ogata et al. (2000)

- Retrospectively observed interaction between two life history events, allowing non-symmetric dependence concepts (application: pustulosis palmo-plantaris and menopause). Aalen et al. (1980).
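For the first example, seroprevalence data give current status data: for each person we know only the age at testing and whether infection has already occurred. A standard result is that the nonparametric maximum likelihood estimate of F(age) = P(infected by age) is the isotonic regression of serostatus on age. This estimate can be computed with the pool-adjacent-violators algorithm, sketched below. The cited paper develops further continuous-time methods, which may differ from this basic estimator.

```python
def pava(values, weights=None):
    """Weighted isotonic (non-decreasing) regression by pool-adjacent-violators."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block: [mean, total_weight, number_of_points]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while the fitted sequence would decrease.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted


def current_status_npmle(ages, seropositive):
    """NPMLE of F at each distinct age, from (age, 0/1 serostatus) pairs."""
    counts = {}
    for a, s in zip(ages, seropositive):
        n, k = counts.get(a, (0, 0))
        counts[a] = (n + 1, k + s)  # tested and positive at this age
    grid = sorted(counts)
    props = [counts[a][1] / counts[a][0] for a in grid]
    weights = [counts[a][0] for a in grid]
    return grid, pava(props, weights)
```

The age-specific incidence (force of infection) is then estimated from the increase of F with age, typically after smoothing.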

And then three examples of studying "mortality":

- From current duration at the cross-section (application: time to pregnancy). Keiding et al. (1999).

- Prevalent cohort studies (application: survival of diabetics based on follow-up of the prevalent sample from 1973). Keiding (1992).

- Confirmatory analysis of a possible chance finding at an interim analysis of a clinical trial with staggered entry, obtained by reusing, with delayed entry, the survivors from the interim analysis (application: breast cancer trial). Keiding et al. (1987), Parner and Keiding (2001).
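Both prevalent cohorts and the reuse of interim survivors rely on delayed entry (left truncation). A subject is counted in the risk set only between its entry time and its exit time.

The minimal product-limit sketch below shows this. The variable names are hypothetical, and the estimate is conditional on survival to the earliest entry time.

```python
import numpy as np


def km_delayed_entry(entry, stop, event):
    """Product-limit estimator with delayed entry (left truncation).

    Subject i is at risk at time t if entry[i] < t <= stop[i];
    event[i] = 1 for death at stop[i], 0 for censoring. Requires entry < stop.
    Returns (death_times, survival_just_after_each_death_time).
    """
    entry = np.asarray(entry, dtype=float)
    stop = np.asarray(stop, dtype=float)
    event = np.asarray(event, dtype=int)
    times = np.unique(stop[event == 1])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum((entry < t) & (stop >= t))  # entered before t, still under observation
        deaths = np.sum((stop == t) & (event == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return times, np.array(surv)
```

Ignoring delayed entry, by counting prevalent subjects as at risk from time zero, would overstate survival. Long-term survivors are over-represented in a cross-sectional sample.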

References:

- Aalen, O.O., Borgan, Ø., Keiding, N. & Thormann, J. (1980). Interaction between life history events. Nonparametric analysis for prospective and retrospective data in the presence of censoring. Scand.J.Statist. 7, 161-171.

- Andersen, P.K., Borgan, Ø., Gill, R.D. & Keiding, N. (1993). Statistical Models Based on Counting Processes. New York: Springer, 767 pp.

- Andersen, P.K. & Keiding, N. (2002). Multi-state models for event history analysis. Statist.Meth.Med.Res. 11, 91-115.

- Keiding, N. (1990). Statistical inference in the Lexis diagram. Phil.Trans.Roy.Soc.Lond. A 332, 487-509.

- Keiding, N. (1991). Age-specific incidence and prevalence: a statistical perspective (with discussion). J.Roy.Statist.Soc. A 154, 371-412.

- Keiding, N. (1992). Independent delayed entry (with discussion). Survival Analysis: State of the Art (eds. J.P. Klein & P.K. Goel). Kluwer, Dordrecht, 309-326.

- Keiding, N. (2000). Mortality measurement in the 1870s: diagrams, stereograms, and the basic differential equation. http://www.demogr.mpg.de/Papers/workshops/ws000828.htm

- Keiding, N., Bayer, T. & Watt-Boolsen, S. (1987). Confirmatory analysis of survival data using left truncation of the life times of primary survivors. Statist. in Medicine 6, 939-944.

- Keiding, N., Begtrup, K., Scheike, T.H. & Hasibeder, G. (1996). Estimation from current-status data in continuous time. Lifetime Data Analysis 2, 119-129.

- Keiding, N. & Gill, R.D. (1990). Random truncation models and Markov processes. Ann.Statist. 18, 582-602.

- Keiding, N., Holst, C. & Green, A. (1989). Retrospective estimation of diabetes incidence from information in a current prevalent population and historical mortality. Amer.J.Epid. 130, 588-600.

- Keiding, N., Kvist, K., Hartvig, H., Tvede, M. & Juul, S. (2002). Estimating time to pregnancy from current durations in a cross-sectional sample. Biostatistics 3, 565-578.

- Lund, J. (2000). Sampling bias in population studies - how to use the Lexis diagram. Scand.J. Statist. 27, 589-604.

- Ogata, Y., Katsura, K., Keiding, N., Holst, C. & Green, A. (2000). Empirical Bayes age-period-cohort analysis of retrospective incidence data. Scand.J.Statist. 27, 415-432.

- Parner, E.T. & Keiding, N. (2001). Misspecified proportional hazard models and confirmatory analysis of survival data. Biometrika 88, 459-468.

### Microarray seminar

BMMS organized a seminar on Monday May 8, 2006. The theme was identification of tissue-specific alternative splicing from genome-wide microarray data. This is current research by Anja von Heydebreck, a well-known researcher in statistical methods for microarray studies.

### Meetings in the Infectious Disease Modeling Group 2006

BMMS, together with the Norwegian Institute of Public Health, has recently started a research group on infectious disease modeling. The field is highly relevant these days, given all the attention to avian flu, but other infectious diseases will also be of interest. The short-term goal is to establish a joint venue for researchers with a mathematical or statistical background working in the field.

November 22:

Program:

14:00: Influenza and mortality in Norway - initial work. Jon Michael Gran.

14:45: Asgeir Lande presented three articles on Influenza and mortality.

Discussion

October 23:

Program:

14:00: Network models. Birgitte Freiesleben de Blasio.

Discussion

September 21:

Program:

14:00: Knut Erik Emberland presented the article "Strategies for mitigating an influenza pandemic" by Ferguson et al.

14:45: Influenza and mortality in Norway. Bjørn Iversen.

Discussion

August 17:

Program:

14:00: Estimating R0 for seasonal influenza in Norway 1998-2005. Jon Michael Gran and Asgeir Lande.

Discussion.
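The notes do not say how R0 was estimated. One common approach is to fit the exponential growth rate r of early epidemic counts by log-linear regression. That rate is then converted to a reproduction number with the Wallinga and Lipsitch (2007) relation. For a gamma-distributed generation interval with shape a and scale theta, the relation gives R = (1 + r·theta)^a.

This is a sketch, not the method used in the meeting. The generation-interval parameters are hypothetical. In a partly immune population, as with seasonal influenza, the result is strictly an effective reproduction number.

```python
import numpy as np


def growth_rate(days, cases):
    """Exponential growth rate r from a log-linear least-squares fit."""
    slope, _intercept = np.polyfit(np.asarray(days, dtype=float),
                                   np.log(np.asarray(cases, dtype=float)), 1)
    return slope


def r_from_growth_gamma(r, shape, scale):
    """Reproduction number for a gamma generation interval (Wallinga & Lipsitch, 2007)."""
    return (1.0 + r * scale) ** shape
```

With shape 1 (an exponential generation interval), the relation reduces to the familiar SIR result R = 1 + r·(mean generation interval).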

June 20:

Program:

14.00-14.30: Birgitte Freiesleben de Blasio presented the article "Mitigation strategies for pandemic influenza in the United States" by Germann et al.

14.30-15.00: Jon Michael Gran presented the modelling done in the UK Influenza Pandemic Contingency Plan from October 2005.

15.00-15.30: Discussion

May 4:

Program:

14.00-14.20: "Influenza: the overall picture", Bjørn Iversen.

14.20-14.40: Odd Aalen presented the article "Strategies for containing an emerging influenza pandemic in Southeast Asia" by Ferguson et al.

14.40-15.00: Hein Stigum presented the article "Containing Pandemic Influenza at the Source" by Longini et al.

15.00-15.30: Discussion

March 31:

Program:

09.00-09.20: Influenza introduction by Bjørn Iversen.

09.20-09.40: Simple modeling of infectious disease in Norway. Hein Stigum.

09.40-10.00: Odd Aalen presented the article "Transmissibility of 1918 pandemic influenza" by C. E. Mills et al.

10.00-10.30: Discussion

### Modern Statistical Methods in Epidemiology

The Centre for Biostatistical Modelling in the Medical Sciences (BMMS) and the statistical research group at the Centre of Advanced Study (CAS) organized a meeting on Monday March 27, 2006. The theme was "Modern Statistical Methods in Epidemiology", introduced by experienced national and international researchers in the field. Several of them are guests at CAS.

See program.

Slides:

Robin Henderson, University of Newcastle: Analysing longitudinal binary data: a case study on prevalence and incidence of infant diarrhoea in Brazil.

Vanessa Didelez, University College London: Mendelian randomisation for causal inference.

Odd O. Aalen, University of Oslo: Some remarks on Cox regression and causality.

Sir David Cox, University of Oxford: A problem in veterinary epidemiology, and its statistical implications.

Arnoldo Frigessi, University of Oslo: New statistical methodologies for genomics.

### Seminars autumn 2005

Nov 17:

Ellen Amundsen talked about

Competing risks of death in a cohort of drug abusers.

Abstract: The aim is to learn about various ways of analysing competing risks data with Cox regression models. We first investigate the effect of age and duration of abuse on fatal overdose and other deaths, controlling for various other possibly influential factors. Second, we investigate the role of imprisonment and recent release from prison with respect to overdose death and violent death. The length of the high-risk period after release is explored.

The analysis is based on a sample of 501 drug abusers admitted to treatment for drug addiction at the State Clinic for Drug Addicts from 1981 to 1991. A structured interview at admission included demographic data as well as data on the client's drug abuse and history of imprisonment. The sample was linked to the National Death Register and the National Register of Convictions using the 11-digit personal identification number. Dates of death, causes of death (ICD-coded) and emigrations, as well as entries to and releases from prison, were thus updated to 31 December 2003. The sample can therefore be seen as a cohort of persons enrolled from treatment in 1981 to 1991 and followed until 2003 for time to death or emigration and for history of imprisonment.

There are three candidate time scales for analysing time to death: time since inclusion in the study, age, and calendar time. In addition, time since release from prison is relevant. Cox regression models are applied using the approach of Lunn & McNeil (1995). STATA is used for the analyses.

References:

- Lunn, M., McNeil, D. (1995): Applying Cox Regression to Competing Risks, Biometrics, 51(2): 524-532.

Nov 3:

Geir Aamodt talked about

Cluster detection.

Abstract: Cluster detection is an important part of spatial epidemiology. It can help identify environmental factors associated with disease and help epidemiologists investigate disease aetiology. In this article we study three methods suitable for detecting local spatial clusters:
1) a spatial scan statistic (SaTScan);
2) a generalized additive model (GAM) based on smoothing techniques;
3) a model originally developed for Bayesian image restoration (BYM), which is also applicable to estimating relative risk in disease mapping and to estimating effects in ecological studies.

To compare the three models, six different clusters were generated, with different relative risks between the increased-risk area and the normal-risk area. A simulation study was conducted, and sensitivity and specificity were computed for each case. The results depend on the relative risk. In general, all models are suitable for identifying clusters with relative risk larger than 2.5, but clusters with lower relative risks are difficult to detect. The GAM model showed the highest sensitivity, but its relatively low specificity means that it overestimates the cluster area. Both the BYM model and SaTScan work well. The BYM model performs somewhat better than SaTScan for large relative risks, whereas SaTScan performs better for smaller relative risks.
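A spatial scan statistic of the SaTScan type considers circular zones around each region centroid. Each zone grows until it reaches a population cap. The statistic keeps the zone with the largest Poisson log-likelihood ratio (Kulldorff's statistic).

The sketch below computes only the maximum and the zone. SaTScan assesses significance by comparing this maximum with Monte Carlo replicates, which is omitted here. The 50% population cap is a common default, assumed here.

```python
import math


def poisson_llr(c, e, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only zones with elevated risk count as clusters
    inside = c * math.log(c / e)
    if c == total_cases:
        return inside  # no cases outside the zone: outside term is 0
    outside = (total_cases - c) * math.log((total_cases - c) / (total_cases - e))
    return inside + outside


def circular_scan(coords, cases, population, max_pop_frac=0.5):
    """Return (best_llr, region_indices) over circular zones centred on each region.

    Zones grow by adding regions in order of distance from the centre until
    they would exceed max_pop_frac of the total population.
    """
    total_cases = sum(cases)
    total_pop = sum(population)
    best_llr, best_zone = 0.0, []
    for centre in coords:
        order = sorted(range(len(coords)), key=lambda j: math.dist(centre, coords[j]))
        zone, c, pop = [], 0, 0
        for j in order:
            if pop + population[j] > max_pop_frac * total_pop:
                break
            zone.append(j)
            c += cases[j]
            pop += population[j]
            e = total_cases * pop / total_pop  # expected cases under constant risk
            llr = poisson_llr(c, e, total_cases)
            if llr > best_llr:
                best_llr, best_zone = llr, sorted(zone)
    return best_llr, best_zone
```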

Sept 22:

Hans C. van Houwelingen, Department of Medical Statistics and Bioinformatics, Leiden University Medical Centre, The Netherlands, talked about

Dealing with time-varying effects in clinical trials with survival outcome.

Abstract: Time-varying effects of treatment and other predictors can complicate the analysis of a clinical trial with survival as outcome. Both the primary analysis, aimed at assessing the treatment effect, and the secondary analysis, aimed at developing a predictive model for survival, demand a more careful analysis and a more precise definition of the goals of the study.

Two forthcoming papers in Statistics in Medicine by my group discuss different aspects of the same data set from a Dutch clinical trial on gastric cancer surgery. The results of these two papers will be presented and used to illustrate the complications of analysing such data.

References:

- H. Putter, M. Sasako, H. H. Hartgrink, C. J. H. van de Velde, J. C. van Houwelingen, Long-term survival with non-proportional hazards: results from the Dutch Gastric Cancer Trial, Statistics in Medicine, Volume 24, Issue 18, Date: 30 September 2005, Pages: 2807-2821.

- Hans C. van Houwelingen, Cornelis J. H. van de Velde, Theo Stijnen, Interim analysis on survival data: its potential bias and how to repair it, Statistics in Medicine, Volume 24, Issue 18, Date: 30 September 2005, Pages: 2823-2835.

### Centre for Biostatistical Modelling in the Medical Sciences meeting

The meeting was Monday, December 5, 2005, at Det Norske Vitenskapsakademi, Drammensveien 78.

Program and list of participants.

### Introduction to infectious disease modelling

The course took place November 24-25, 2005, and was arranged for researchers at the Faculty of Medicine. Christophe Fraser, Imperial College, London, was the main lecturer.

### Workshop on statistical analysis of complex event history data

The workshop took place at the Norwegian Academy of Science and Letters in Oslo, Norway, from August 31st to September 2nd, 2005.

Program and further information.

### Seminars spring 2005

May 12:

Marijke Veenstra from Rikshospitalet talked about

April 7:

Ivar Aursnes and Berit Natvig talked about

Bayesian statistical methods for analysing the effects and side effects of some drugs.

March 10:

Stein Emil Vollset, UiB, talked about

Smoking and early death among women and men: 25-year follow-up of three county surveys.

April 21:

Ingrid Hobækk Hoff talked about

Modelling air pollution as a function of traffic volume and meteorology.

### Modeling of Infectious Diseases: New Trends and Developments.

February 10-11, 2005, at Domus Medica, the Institute of Basic Medical Sciences, University of Oslo.

### Seminars autumn 2004

Dec 16:

Bettina Kulle talked about

Application of genome-wide SNP arrays for detection of simulated susceptibility loci.

Abstract: The prospect of SNP-based genome-wide association analysis has been extensively discussed, but practical experience remains limited. We performed an association study using a recently developed array of 11,555 SNPs distributed throughout the human genome. A total of 104 DNA samples were hybridized to these chips, with an average call rate of 97% (range 85.3%-98.6%). The resulting genome-wide scans were used to distinguish between carriers and non-carriers of 37 test variants, which served as surrogates for monogenic disease traits. The test variants were not contained on the chip and had been determined by other methods.

Without adjustment for multiple testing, the procedure detected 24% of the test variants, but the positive predictive value was low (2%). Adjustment for multiple testing eliminated most false-positive associations, but the share of true positive associations decreased to 10-12%. We also simulated fine-mapping of susceptibility loci by restricting testing to the immediate neighbourhood of the test variants (+/- 5 Mb). This increased the proportion of correctly identified test variants to 22-27%. Simulating bigenic inheritance reduced the sensitivity to 1%, and reducing allelic penetrance had a similarly adverse effect.

In summary, we demonstrate the feasibility and considerable specificity of SNP array-based association studies for detecting variants underlying monogenic, highly penetrant traits. The outcome is affected by the allele frequencies of the chip SNPs, by the ratio between simulated "cases" and "controls", and by the degree of linkage disequilibrium. A major improvement is expected from raising the density of the SNP array.
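The low positive predictive value without adjustment follows largely from the number of tests. The back-of-envelope arithmetic below assumes a nominal 5% level per SNP, which the abstract does not state. By linearity of expectation, the false-positive count holds whether or not the tests are independent.

```python
n_snps = 11555
alpha = 0.05  # assumed nominal per-test level (not stated in the abstract)

# Expected number of false-positive SNPs per test variant if no SNP is truly associated.
expected_false_positives = n_snps * alpha       # about 578
# Bonferroni per-SNP threshold keeping the family-wise error rate at alpha.
bonferroni_threshold = alpha / n_snps           # about 4.3e-6
```

With hundreds of expected false positives per scan, even a correctly detected variant is a small fraction of all hits. The Bonferroni threshold removes most false positives, but it also costs power, as the drop from 24% to 10-12% illustrates.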

**Dec 2:**

**Nils Lid Hjort** talked about

*Focused selection criteria and model averaging for Cox's hazard regression model.*

(Joint work with Gerda Claeskens, Katholieke Universiteit Leuven)

Abstract: The talk is concerned with variable selection methods for the proportional hazards regression model. Including too many covariates causes extra variability and inflated confidence intervals for regression parameters, so strategies for discarding the less informative ones are needed. Our framework has p covariates designated as 'protected', while variables from a further set of q covariates are examined for possible inclusion or exclusion. In addition to deriving results for the AIC method, defined via the partial likelihood, we develop a focussed information criterion (FIC) that, for a given parameter of interest, finds the best subset of covariates. Thus the FIC might find that the best model for predicting median survival time is different from the best model for estimating survival probabilities, and that the best overall model for analysing survival for men is not the same as the best overall model for analysing survival for women. We also develop methodology for model averaging, where the final estimate of a quantity is a weighted average of estimates computed for a range of submodels. Our methods are illustrated in simulations and for a survival study of Danish skin cancer patients.
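The model-averaging step described above, where the final estimate is a weighted average of submodel estimates, can be sketched in a few lines of Python. This minimal sketch assumes smoothed AIC weights, w_k proportional to exp(-AIC_k/2) with AIC on the smaller-is-better scale. That is one common choice and not necessarily the weighting developed in the talk, where the weights may instead be built around the FIC. The AIC values and median survival estimates in the example are invented for illustration.

```python
import math


def aic_weights(aic_values):
    """Smoothed AIC weights: w_k = exp(-AIC_k / 2) / sum_j exp(-AIC_j / 2).

    Uses AIC on the smaller-is-better scale; subtracting the minimum first
    avoids overflow/underflow without changing the weights.
    """
    best = min(aic_values)
    raw = [math.exp(-(a - best) / 2.0) for a in aic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, aic_values):
    """Model-averaged estimate: sum_k w_k * estimate_k."""
    weights = aic_weights(aic_values)
    return sum(w * est for w, est in zip(weights, estimates))


# Hypothetical numbers: three Cox submodels, each giving an estimate of
# median survival time (years), with their partial-likelihood AIC values.
aics = [210.4, 208.1, 212.9]
medians = [4.8, 5.3, 4.1]
print([round(w, 3) for w in aic_weights(aics)])
print(round(model_average(medians, aics), 2))
```

Because the weights sum to one, the averaged estimate always lies between the smallest and largest submodel estimates.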

Key words: Akaike's information criterion, covariate selection, Cox regression, focussed information criteria, median survival time, model averaging

Link to Hjort's Cox-model article: http://www.econ.kuleuven.ac.be/public/ndbaf45/publicationsGC.html

**Nov 11:**

**Daniel Gianola** talked about

*Inferring Fixed Effects in Mixed Linear Models from Integrated Likelihoods.*

Abstract: A new method is presented for likelihood-based inference about fixed effects in mixed linear models, with the variance components treated as nuisance parameters. The method uses uniform integration of the likelihood; the implementation employs the EM algorithm to eliminate all nuisance parameters, viewing random effects and variance components as missing data. In a simulation of a grazing trial, the procedure was compared with four widely used estimators of fixed effects in mixed models and was found to be competitive. An analysis of body weight in freshwater crayfish was conducted to illustrate the feasibility of the methodology in a real situation. The method is a useful non-Bayesian alternative to maximum likelihood and estimated generalized least squares, as it accounts for uncertainty about the nuisance variances.

**Nov 4:**

**Anne-Lise Børresen-Dale** and **Therese Sørlie** talked about

*Identification of genome-wide expression profiles related to tumor type, stage, aggressiveness and therapy response.*

Abstract: We are using DNA microarrays to study patterns of gene expression in clinical breast cancer samples. Breast cancer is a heterogeneous disease: the tumors show a diverse range of biological characteristics, and patients show different responses to treatment and different clinical outcomes. DNA microarray technology allows us to study the expression of many genes in concert and to explore complex interplays between genes that would otherwise go unrecognized by standard methods. By using this technology in combination with statistical and bioinformatic tools, we have been able to identify specific patterns of gene expression that provide distinctive molecular portraits of breast tumors. Different sets of co-expressed genes have given clues to specific pathways that may be involved in these cancers, as well as to the cellular composition of the tumors. By hierarchical clustering of such gene expression data, we have identified five subtypes of breast tumors characterized by specific expression patterns. These subtypes were significantly associated with different outcomes for the patients. Systematic investigation of genetic variation (both inherited and somatically altered genes), gene expression patterns and genome-wide copy number alterations in tumors, and of their correlation with specific phenotypic features, will provide the basis for an improved molecular taxonomy. When hundreds of tumors have been systematically characterized, a better tumor classification is likely to emerge, and statistically significant relationships with different clinical parameters may be uncovered. Recognizing the expression "motifs" that represent important clinical phenotypes, such as resistance or sensitivity to specific therapies, invasiveness, or metastatic potential, is an important challenge in our ongoing studies.

**Oct 7:**

**Petter Mostad** talked about

*Using bioinformatics to find gene batteries in mammalian genomes.*

Abstract: It is an important challenge to understand how the regulation and differential expression of a few tens of thousands of genes can result in the complex differentiation of cell types in a growing complex organism. A suggested paradigm, originally proposed as early as 1934, is that genes are organized into gene batteries, i.e., functionally related genes activated by the same regulatory mechanism. The genes in a gene battery would then have similar expression patterns across tissue types, developmental stages, and environmental conditions, and would have similar regulatory motifs in the DNA sequence close to the genes.

Computational methods based on gene expression data and analysis of regulatory motifs have previously yielded plausible gene batteries in simple organisms like yeast. I will present an approach where human and mouse gene expression profiles are combined with regulatory motif analysis, using simple statistical tools, to detect gene batteries in these organisms.

This is joint work with Sven Nelander, Erik Larsson and Per Lidahl at Göteborg University, and Erik Kristiansson and Olle Nerman at Chalmers University.

### The 4th Lysebu-meeting of Norevent (2004): *Causality - a statistical viewpoint*

The Lysebu meeting took place on September 7-8, 2004 (see program). Transparencies from several of the talks at the meeting (including two tutorials) are available here.

### Nyere statistiske metoder i epidemiologi

Course on recent statistical methods in epidemiology at the Cancer Registry of Norway, May 18, 2004.

Program

### Course on Multivariate Survival Analysis

March 29-31, 2004, at Domus Medica, University of Oslo, with lecturer **Philip Hougaard**, Copenhagen.

Program

### Seminars spring 2004

**June 17:**

**Arne Kolstad** and **Oddbjørn Haga** talked about

*Use of event history data at the National Insurance Administration (Rikstrygdeverket).*

Abstract: Among other things, our role in the research department at RTV is to provide forecasts for planning and budget estimates, and to assess the effects of planned and implemented measures and rule changes. We will present two approaches:

1) A planning model for old-age pensions and health-related long-term benefits in the National Insurance Scheme (Folketrygden).

2) Evaluation of measures: the effect of a) active sick leave and b) stricter vocational rehabilitation requirements in connection with applications for disability pension.

Oddbjørn Haga will present the planning model, which uses microsimulation based on piecewise constant hazard rates. Arne Kolstad will present the evaluations, which use nested case-control designs, count models, panel models and Cox regression.
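Microsimulation with piecewise constant hazards, as in the planning model above, needs a way to draw each person's event time. The sketch below does this by inverting the cumulative hazard. It is minimal, and the function names, interval breaks and rates are hypothetical illustrations, not taken from the RTV model.

```python
import math
import random


def sample_time(breaks, rates, u):
    """Draw one event time from a piecewise-constant hazard by inversion.

    breaks: increasing interval start points with breaks[0] == 0.
    rates:  hazard on [breaks[i], breaks[i+1]); the last rate applies
            from breaks[-1] onwards. len(rates) == len(breaks).
    u:      uniform number in (0, 1); the event occurs when the cumulative
            hazard reaches -log(u).
    Returns math.inf if the cumulative hazard never reaches that level.
    """
    target = -math.log(u)
    cum = 0.0
    for i, rate in enumerate(rates):
        start = breaks[i]
        end = breaks[i + 1] if i + 1 < len(breaks) else math.inf
        if rate <= 0:
            continue  # no events possible in a zero-hazard interval
        gain = rate * (end - start)  # cumulative hazard over the interval
        if cum + gain >= target:
            return start + (target - cum) / rate
        cum += gain
    return math.inf


def simulate_cohort(n, breaks, rates, seed=1):
    """Event times for n independent individuals (e.g. time to benefit entry)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so math.log never receives 0
    return [sample_time(breaks, rates, 1.0 - rng.random()) for _ in range(n)]
```

For example, `simulate_cohort(1000, [0.0, 5.0, 10.0], [0.01, 0.03, 0.08])` gives 1,000 simulated times with a hazard that steps up after 5 and 10 time units. In a full microsimulation, competing transitions and covariate-specific rates would be layered on top of this draw.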

**June 10:**

**Egil Ferkingstad** talked about

*Multiple hypothesis testing: False discovery rates and estimation of the proportion of true null hypotheses. Application to DNA microarray data.*

Abstract: Estimating the proportion of true null hypotheses, p0, is of interest in situations where a large number of hypothesis tests are performed. Several such situations have arisen in applications, and the motivation for this work has been to estimate the proportion of genes that are not differentially expressed in DNA microarray experiments. Besides being an interesting quantity in its own right, the estimate of the proportion of true null hypotheses is an important input to False Discovery Rate methodology. I will briefly present the ideas behind False Discovery Rates and then review various methods for estimating p0. All of these methods are based on the observed p-values. We have proposed new estimators based on estimating the p-value density, including an estimator that relies on the assumption that the p-value density is convex and decreasing. The behaviour of the various estimators is examined through simulation studies with varying degrees of dependence between the p-values.
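As a point of reference for the estimators discussed in the talk, the sketch below uses Storey's simple tail-count estimator of p0 (with tuning value lambda = 0.5) and plugs the estimate into the Benjamini-Hochberg step-up procedure. It does not implement the density-based or convex-decreasing estimators proposed by the speaker, and the example p-values are made up.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    True-null p-values are uniform on (0, 1), so about pi0 * m * (1 - lam)
    of them should exceed lam, while p-values from false nulls tend to be
    small. Capped at 1.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def bh_reject(pvalues, q=0.05, pi0=1.0):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    With pi0 < 1 (e.g. from estimate_pi0) this is the adaptive version,
    which compares p_(k) with k * q / (m * pi0). Returns rejected indices.
    """
    m = len(pvalues)
    pi0 = max(pi0, 1.0 / m)  # guard against pi0 == 0
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / (m * pi0):
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


# Made-up p-values, e.g. one per gene
pvals = [0.001, 0.002, 0.01, 0.02, 0.6, 0.9]
rejected = bh_reject(pvals, q=0.05, pi0=estimate_pi0(pvals))
print(estimate_pi0(pvals), rejected)
```

A smaller estimate of p0 loosens the thresholds, so the adaptive procedure rejects at least as many hypotheses as plain Benjamini-Hochberg. This is why a good estimate of p0 matters in practice.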

**April 29:**

**Odd O. Aalen** talked about

*Causal inference and counting processes: Understanding the counterfactual approach.*

Abstract: Counting process theory constitutes the mathematical foundation of survival and event history analysis. The recent development of counterfactual approaches to causal inference (Robins and others) has been largely independent of this theory, yet it is closely related to established approaches in counting process theory. I shall elaborate on this connection. In fact, all approaches for handling censoring (e.g. the Kaplan-Meier curve) are really counterfactual in the language of causal inference. The famous G-computation formula of Robins can be seen as an extension of the Kaplan-Meier curve via more general transition probability estimates for Markov chains.

This is work in progress, and I will present preliminary results.
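Since the abstract takes the Kaplan-Meier curve as its basic example of handling censoring, here is a minimal product-limit sketch in Python for reference. It covers only the Kaplan-Meier estimator itself, not the G-computation extension discussed in the talk, and the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times:  observed times (event or censoring).
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Censored individuals still count as at risk at time t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

Censored individuals contribute to the risk sets up to their censoring time and are then removed. This is the sense in which the estimator describes the counterfactual survival that would have been seen without censoring.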

**April 15:**

**Tron Anders Moger** talked about

*Analysis of infant mortality in sibships using frailty models.*

Abstract: Several studies show an increased risk of further infant deaths in families where one such death has already occurred. One may therefore suppose that there is a large degree of heterogeneity in risk between families, with some families at high risk while most have a very low risk of infant death. This is classical frailty thinking. We therefore wish to analyse data on infant mortality in sibships using various frailty models. Important points will be to examine how well the different models fit the data and to estimate the correlation in mortality.

Through contact with the Medical Birth Registry (Medisinsk Fødselsregister), we have gained access to data on infants linked into sibships. Deaths within the first year of life are recorded. One may suppose that a large proportion of these deaths are due to frailties of a genetic or environmental nature. These data will be analysed using various frailty models.

Frailty is usually modelled with a multiplicative model, in which the hazard for each individual is the product of a frailty variable, specific to each individual, and an underlying baseline hazard common to all individuals. When analysing family data, one must also take into account that related individuals may be correlated with respect to disease, mortality, etc. The usual way to handle this is to use a shared frailty model, in which related individuals share a common value of the frailty variable. This creates correlation between relatives. Common distributions for the frailty variable are the gamma and the PVF distributions. The latter is a class of distributions that includes, among others, the gamma, inverse Gaussian and stable distributions as special cases. An important weakness of shared frailty models is that related individuals have the same risk. This may be inappropriate, since siblings share only half of their genes, so there will also be individual variation. A better model may be to randomize a scale parameter in the PVF distributions by means of a new PVF distribution, thereby allowing for both shared and individual frailty. This gives a substantially better fit to the data.
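For the simplest case mentioned above, a shared gamma frailty with mean 1 and variance theta, the population-level survival functions follow from the Laplace transform of the gamma distribution. For one individual, S(t) = (1 + theta H0(t))^(-1/theta). For two siblings sharing the frailty, S(t1, t2) = (1 + theta (H0(t1) + H0(t2)))^(-1/theta). Here H0 is the baseline cumulative hazard. The sketch below evaluates these standard formulas and checks one of them by simulation. It does not cover the wider PVF family or the model with a randomized scale parameter proposed in the talk.

```python
import math
import random


def marginal_survival(H0, theta):
    """Population survival E[exp(-Z * H0)] for a gamma frailty Z with
    mean 1 and variance theta > 0; H0 is the baseline cumulative hazard."""
    return (1.0 + theta * H0) ** (-1.0 / theta)


def joint_survival(H0_a, H0_b, theta):
    """Probability that two siblings sharing the same frailty Z both survive
    past their respective times: E[exp(-Z * (H0_a + H0_b))]."""
    return marginal_survival(H0_a + H0_b, theta)


def kendall_tau(theta):
    """Kendall's tau between the two siblings' lifetimes under this model."""
    return theta / (theta + 2.0)


def monte_carlo_marginal(H0, theta, n=100_000, seed=1):
    """Check marginal_survival by simulating Z ~ Gamma(shape=1/theta,
    scale=theta), which has mean 1 and variance theta."""
    rng = random.Random(seed)
    total = sum(math.exp(-rng.gammavariate(1.0 / theta, theta) * H0)
                for _ in range(n))
    return total / n


print(marginal_survival(1.0, 1.0), monte_carlo_marginal(1.0, 1.0))
```

Because siblings share Z, the joint survival probability exceeds the product of the marginal probabilities. This excess is the source of the within-family correlation in mortality that the talk aims to estimate.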

**March 25:**

**Torbjørn Wisløff** talked about

Random variations of NNT.

Abstract: The reciprocal of the absolute risk reduction (ARR) is called the number needed to treat (NNT) and is usually interpreted as the number of patients who must be treated to observe one fewer adverse event. Even if the ARR (and hence the NNT) were known with certainty, the number of adverse events would still vary randomly. The aim of this analysis was to explore the distribution of unsuccessful outcomes when varying certain parameters.

We simulated therapies using binomial distributions, with the ARR, the number of patients n and the baseline risk of adverse events (BR) as parameters. We estimated the observed number of events in the intervention group compared with the control group and displayed the distributions for various parameter values. We also investigated the probability of observing no effect, or a negative effect, of treatment given a known underlying effect.
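A simulation in the spirit of the one described can be sketched as below. It makes three assumptions: equal arm sizes n, independent binomial outcomes with risks BR and BR - ARR, and a definition of "no or a negative effect" as at least as many adverse events in the treated arm as in the control arm. The talk's exact set-up may differ.

```python
import random


def nnt(arr):
    """Number needed to treat: the reciprocal of the absolute risk reduction."""
    return 1.0 / arr


def simulate_trial(n, baseline_risk, arr, rng):
    """One two-arm trial with n patients per arm.

    Control events ~ Binomial(n, BR); treated events ~ Binomial(n, BR - ARR),
    each drawn as a sum of n Bernoulli trials.
    """
    if not 0.0 <= arr <= baseline_risk <= 1.0:
        raise ValueError("need 0 <= ARR <= BR <= 1")
    control = sum(rng.random() < baseline_risk for _ in range(n))
    treated = sum(rng.random() < baseline_risk - arr for _ in range(n))
    return control, treated


def prob_no_benefit(n, baseline_risk, arr, n_sim=10_000, seed=1):
    """Share of simulated trials in which the treated arm has at least as
    many adverse events as the control arm (no or a negative observed effect)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        control, treated = simulate_trial(n, baseline_risk, arr, rng)
        hits += int(treated >= control)
    return hits / n_sim
```

With BR = 0.20 and ARR = 0.05 (NNT = 20), `prob_no_benefit(100, 0.20, 0.05)` estimates how often a trial with 100 patients per arm would show no apparent benefit, even though the true effect is known.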

**March 15:**

**Egil Ferkingstad** talked about

*Causal inference.*

Abstract: Traditionally, statisticians have been very sceptical of any talk of causality. Nevertheless, in statistical practice it is often causal relationships that one is really interested in. Insight into causal relationships is valuable, among other things, for understanding mechanisms and predicting the effects of interventions. In this talk I will briefly present different approaches to causal inference.

**March 11:**

**Ivar Heuch** talked about

*Statistical methods for evolutionary trees: Are statisticians able to keep up with the need for new data-analytic methods?*

Abstract: Molecular data sets are now available that can shed light on essential questions about the relationships between systematic groups of different rank in biology. The data are based on DNA base sequences or protein amino acid sequences. Ideally, the corresponding statistical methods should lead to reliable estimates of evolutionary trees describing the relationships between the groups in question. The methods actually used rest on different underlying lines of reasoning, which by no means always agree with the usual philosophy of statistical models. Nevertheless, bootstrapping, for example, is a widely used technique. The talk will address the challenges that statisticians face with such unusual problems.

**February 12:**

**Hans Julius Skaug** talked about

*A language and a program for fitting nonlinear random effects models by maximum likelihood.*

Abstract: Nonlinear hierarchical models have gained widespread use in statistics. Examples include random effects models, survival analysis with frailties, and measurement error models. The statistician's toolbox for fitting such models consists of several specialized software packages. In a fully Bayesian world, where priors are placed on all parameters, the program BUGS provides a very flexible framework for formulating and fitting hierarchical models. I will present a system for fitting hierarchical models by maximum likelihood. The system is similar to BUGS with respect to flexibility, but it uses a different computational technology. While BUGS uses MCMC (Markov chain Monte Carlo), our system uses the Laplace approximation to integrate the random effects out of the joint likelihood. The underlying computational engine is a technique from computer science called 'automatic differentiation'. I will show some examples involving real data sets and contrast the results with those of other software packages, including BUGS. This is joint work with David Fournier (Canada).
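The core idea, replacing the integral over a random effect by a Laplace approximation, can be illustrated for a single scalar random effect. This is a minimal sketch and not the system described in the talk. It uses finite differences where that system uses automatic differentiation, and it handles only one cluster of a hypothetical random-intercept Poisson model. The result is compared against brute-force numerical integration.

```python
import math


def laplace_integral(g, u0=0.0, h=1e-4, tol=1e-10, max_iter=100):
    """Laplace approximation of the integral of exp(g(u)) over the real line.

    Finds the mode u_hat of g by Newton's method, then returns
        exp(g(u_hat)) * sqrt(2 * pi / -g''(u_hat)).
    Derivatives use central finite differences (not automatic differentiation).
    """
    def d1(u):
        return (g(u + h) - g(u - h)) / (2 * h)

    def d2(u):
        return (g(u + h) - 2 * g(u) + g(u - h)) / (h * h)

    u = u0
    for _ in range(max_iter):
        step = d1(u) / d2(u)
        u -= step
        if abs(step) < tol:
            break
    curvature = -d2(u)
    if curvature <= 0:
        raise ValueError("g is not concave at the located mode")
    return math.exp(g(u)) * math.sqrt(2 * math.pi / curvature)


def poisson_cluster_logjoint(u, counts, beta, sigma):
    """log[ prod_j Poisson(y_j | exp(beta + u)) * Normal(u | 0, sigma^2) ]:
    one cluster of a random-intercept Poisson model."""
    eta = beta + u
    loglik = sum(y * eta - math.exp(eta) - math.lgamma(y + 1) for y in counts)
    logprior = -0.5 * (u / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return loglik + logprior


def trapezoid_integral(g, lo=-10.0, hi=10.0, n=20_000):
    """Brute-force reference value for comparison."""
    width = (hi - lo) / n
    total = 0.5 * (math.exp(g(lo)) + math.exp(g(hi)))
    total += sum(math.exp(g(lo + k * width)) for k in range(1, n))
    return total * width


def g(u):
    # Hypothetical cluster: counts 3, 5, 4; beta = log 4; sigma = 0.5
    return poisson_cluster_logjoint(u, [3, 5, 4], math.log(4.0), 0.5)


print(laplace_integral(g), trapezoid_integral(g))
```

The marginal likelihood of the fixed parameters is the product of such integrals over clusters. Maximizing it requires derivatives of the approximation with respect to beta and sigma, and this is where automatic differentiation pays off in a real system.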

**February 4:**

**Solve Sæbø** talked about

*A Genetic and Spatial Bayesian Analysis of Mastitis Resistance.*

Abstract: A nationwide health card recording system for dairy cattle was introduced in Norway in 1975 (the Norwegian Cattle Health Services). The database holds information on mastitis occurrences for individual cows. A reduction in mastitis frequency across the population is desired, and risk factors are investigated for this purpose. In this paper, a Bayesian proportional hazards model is used to model the time to first veterinary treatment of clinical mastitis, including both genetic and environmental covariates. Sire effects are modelled as shared random components, and veterinary district is included as an environmental effect with prior spatial smoothing. A non-informative smoothing prior is assumed for the baseline hazard, and Markov chain Monte Carlo (MCMC) methods are used for inference. We propose a new measure of quality for sires in terms of their posterior probability of being among, say, the 10% best sires. This probability is an easily interpretable measure that can be used directly to rank sires. Estimating these complex probabilities is straightforward in an MCMC setting. The results indicate considerable differences between sires with regard to their daughters' disease resistance. A regional effect was also discovered, with the lowest risk of disease in the south-eastern parts of Norway.
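Given MCMC output, the proposed ranking measure is a simple count: in each posterior sample, rank the sires and record which ones fall in the top 10%. The sketch below assumes that each sample stores one sire effect per sire on the log-hazard scale, so lower values mean later treatment and better resistance. The data layout and the function name are hypothetical.

```python
def prob_top_fraction(draws, fraction=0.10, lower_is_better=True):
    """Posterior probability that each sire is among the best `fraction`.

    draws: MCMC samples; each sample is a list with one effect per sire
           (e.g. a sire's log-hazard effect, where lower means daughters are
           treated for mastitis later, i.e. better resistance).
    Returns, for each sire, the share of samples in which it ranks among
    the best k = floor(fraction * n_sires) sires (at least 1).
    """
    n_sires = len(draws[0])
    k = max(1, int(fraction * n_sires + 1e-9))  # epsilon guards float error
    counts = [0] * n_sires
    for sample in draws:
        ranked = sorted(range(n_sires), key=lambda s: sample[s],
                        reverse=not lower_is_better)
        for s in ranked[:k]:
            counts[s] += 1
    return [c / len(draws) for c in counts]
```

Because every sample places exactly k sires in the top group, the probabilities sum to k. The measure therefore also reflects posterior uncertainty: a sire with few daughters may have a high mean effect but still a modest probability of being among the best.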

### Overlevelse, kliniske tidsdata og mikromatriser (Survival, clinical time-to-event data and microarrays)

Thursday, January 22, 2004, 09.00-15.30, in Lille Auditorium, Domus Medica.

The purpose of the meeting is to provide a platform for future collaboration between our group of statisticians and the medical and biological research groups in the Oslo area that work with microarray data and clinical time-to-event data. More detailed information about the background and purpose of the meeting is given below.

At the meeting, we will present the "state of the art" in statistical methods for analysing the relationship between microarray data and clinical time-to-event data, while some of the participants from the medical and biological research groups will present their research questions and research activity in the field.

**Program:**

09.00-09.15: Opening.

09.15-09.55: Magne Aldrin, Ørnulf Borgan and Arnoldo Frigessi:

Microarray data as predictors in survival analysis: Background, state of the art and statistical challenges. Part I.

09.55-10.25: Coffee break.

10.25-11.25: Magne Aldrin, Ørnulf Borgan and Arnoldo Frigessi:

Microarray data as predictors in survival analysis: Background, state of the art and statistical challenges. Part II.

11.30-12.30: Lunch.

12.30-13.00: Eivind Hovig: Associations between gene expressions in breast cancer and patient survival.

13.00-14.00: Therese Sørlie and Anne-Lise Børresen-Dale:

Challenges of combining large scale genomic data with clinical outcome in breast cancer.

14.00-14.15: Break.

14.15-14.45: Ola Myklebost:

Global analysis of genetic perturbations important for the properties of connective tissue tumours.

14.45-15.30: Discussion and closing.

The meeting is supported by Norevent.

Kind regards,

Magne Aldrin, Norwegian Computing Center.

Ørnulf Borgan, Department of Mathematics, UiO.

Thore Egeland, Rikshospitalet.

Arnoldo Frigessi, Section of Medical Statistics, UiO.

Ingrid Glad, Department of Mathematics, UiO.

Marit Holden, Norwegian Computing Center.

Knut Liestøl, Department of Informatics, UiO.

Ole Christian Lingjærde, Department of Informatics, UiO.

More about the background and purpose of the meeting

Microarrays can provide information on the expression of thousands of genes. One will often want to link this information to survival or to other clinical time-to-event data, such as the time from cancer diagnosis to metastasis. An important question is then whether gene expression data can give us better survival prognoses, and thus a better basis for treatment decisions, than traditional clinical variables alone.

It is by no means easy to analyse the relationship between gene expression data and clinical time-to-event data. This is mainly due to two factors:

(i) Standard regression methods are designed for situations where there are many more individuals than explanatory variables. Here it is the other way around. We typically have information on thousands of genes, but on at most a few hundred patients.

(ii) Clinical time-to-event data are usually incompletely observed. For several patients, for example, we will only know that they have not yet developed metastases.

Handling these problems correctly is not a matter of "statistical details"; it is absolutely crucial when one wants to find genes that can help predict survival.

Within the framework of traditional linear models, several methods have been developed to handle problem (i). There is also a well-developed statistical methodology for the analysis of time-to-event data that handles problem (ii). By combining methods from the two fields, it is possible to analyse the relationship between gene expression data and clinical time-to-event data satisfactorily.

Our group of statisticians has a broad background and includes both people with knowledge of statistical genomics and people with research expertise in problem areas (i) and (ii). We are therefore well equipped to carry out concrete analyses of the relationship between gene expression data and clinical time-to-event data. We wish to establish collaboration with leading medical and biological research groups that work with such data. The purpose of the meeting is to take the first steps towards such collaboration.

### Seminars autumn 2003

**Nov 27:**

**Birgitte Freiesleben de Blasio** talked about

*Structure and dynamics of sexual networks.*

Abstract: The structure and dynamics of sexual contact networks are of fundamental interest for understanding the transmission dynamics of sexually transmitted diseases. One important and robust result from surveys of sexual behaviour concerns the number of sex partners individuals have during a given time, their so-called "degree". The degree distribution is found to be highly skewed, approximately obeying a power law over part of its range. Given this observation, particular interest is devoted to the statistical properties of the tail of the degree distribution, i.e. the small number of individuals having an unusually large number of sex partners.

In my talk, I will present degree distribution data from Norwegian surveys and address the problems of specifying the best model for the empirical data and the limitations of inference from such data. Further, I will try to compare sexual networks with other types of social networks. I will also include results from graph-theoretical models that seek to explain how skewed degree distributions may arise. Finally, I will discuss possible implications of this network structure for intervention strategies.
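One standard tool for the tail of a skewed distribution is the Hill estimator of a power-law exponent, computed from the k largest observations. The sketch below illustrates the kind of tail inference involved and is not the method used in the talk. For integer-valued degree data with many ties, it is only a rough guide, and the choice of k is left to the user.

```python
import math
import random


def hill_estimator(values, k):
    """Hill estimate of the tail exponent alpha, assuming
    P(X > x) is approximately C * x**(-alpha) for large x.

    alpha_hat = k / sum_{i=1..k} log(X_(i) / X_(k+1)),
    with X_(1) >= X_(2) >= ... the ordered observations.
    """
    xs = sorted(values, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < number of observations")
    threshold = xs[k]
    if threshold <= 0:
        raise ValueError("the (k+1)-th largest value must be positive")
    total = sum(math.log(x / threshold) for x in xs[:k])
    if total == 0:
        raise ValueError("the k largest values all equal the threshold")
    return k / total


# Illustration on simulated Pareto data with alpha = 2 (not survey data)
rng = random.Random(3)
sample = [rng.paretovariate(2.0) for _ in range(20_000)]
print(round(hill_estimator(sample, k=2000), 2))
```

In practice, the estimate is usually plotted against k, and a region where it is stable is sought. The sensitivity of that choice is one of the limitations of inference from such data that the abstract alludes to.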


**Nov 13:**

**Hans van Houwelingen,** Dept. of Medical Statistics and Bio-informatics, Leiden University Medical Center, talked about

*Cox-regression with micro-array data.*

Abstract: It is hoped that gene-expression information collected from micro-array data can be used in modeling (event-free) survival for cancer patients. The high dimension p of the set of explanatory variables compared with the number of patients n leads to seemingly new problems in statistical modeling with *p >> n*.

However, such problems are not really new. Ridge estimators and penalized likelihood were introduced in similar settings (with *p≈n*) a long time ago [1]. Cross-validation is used to select the optimal weight of the penalty. In Cox regression with censored data and partial likelihood, cross-validation is less straightforward. The method of [2] is one possible approach. We will discuss cross-validated penalized partial likelihood with an application to breast cancer survival data [3].
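The p >> n algebra behind ridge estimation can be shown for the linear (least-squares) case, which is simpler than the logistic [1] and Cox versions discussed here. By the identity (X'X + lambda I)^(-1) X'y = X'(XX' + lambda I)^(-1) y, only an n x n system has to be solved. This is why ridge estimation remains cheap with thousands of genes and a few hundred patients. The sketch below is dependency-free and does not show the penalized partial likelihood or the cross-validation of the penalty weight.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_dual(X, y, lam):
    """Ridge estimate beta = X^T (X X^T + lam I)^{-1} y for lam > 0.

    X is n x p (rows = patients, columns = genes). This equals
    (X^T X + lam I)^{-1} X^T y but only needs an n x n solve.
    """
    n, p = len(X), len(X[0])
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    alpha = solve(gram, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]
```

With three patients and six genes, for example, `ridge_dual(X, y, 0.5)` solves only a 3 x 3 system but returns six coefficients. Choosing the penalty weight (0.5 here is arbitrary) is the role cross-validation plays in the talk.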

The Bayesian point of view gives an alternative interpretation of penalized likelihood [4]. We will discuss how the empirical Bayesian approach can be used to obtain a global test for the presence of any effect of the large number of explanatory variables on the survival outcome in the spirit of [5] using results of [6]. Finally, the Bayesian approach allows for a smooth switch between different penalty function models.

References:

[1] le Cessie S, van Houwelingen JC, Ridge Estimators in Logistic Regression, Appl. Statist. 41, 191-201, 1992.

[2] Verweij PJM, van Houwelingen HC, Cross-validation in survival analysis, Statistics in Medicine, 12, 2305-2314, 1993.

[3] van de Vijver MJ, He YD, van 't Veer LJ, et al., A gene-expression signature as a predictor of survival in breast cancer, New England Journal of Medicine, 347, 1999-2009, 2002.

[4] van Houwelingen JC, Shrinkage and penalized likelihood as methods to improve predictive accuracy, Statistica Neerlandica, 55, 17-34, 2001.

[5] Goeman JJ, van de Geer SA, de Kort F, van Houwelingen HC, A global test for groups of genes: testing association with a clinical outcome, Bioinformatics, in press, 2004.

[6] Verweij PJM, van Houwelingen HC, Stijnen Th, A goodness-of-fit test for Cox's proportional hazards model based on martingale residuals, Biometrics, 54, 1517-1526, 1998.

**Oct 23:**

**Joseph Sexton** talked about

*Analysis of an outbreak of infectious gastrointestinal illness at a summer camp.*

Abstract: This talk describes an analysis of data from an outbreak of an infectious gastrointestinal illness at a summer camp. The camp lasted 10 days, and by the end of the 'epidemic' about 65% of the roughly 300 participants had experienced symptoms. The disease spread from person to person as well as from one or more "environmental sources", and a key question is which sources were involved (for example, contaminated drinking water).

In general, the analysis of data on the spread of infectious diseases has much in common with survival analysis. One is interested in an event (here infection), and the time until it occurs is a natural quantity to model. Two aspects of infection data often distinguish them from 'more conventional' survival data:

(i) The time of infection is often not observed directly, but only indirectly, for example via the time of symptom onset.

(ii) There is considerable dependence between individuals when the infection spreads from person to person. As a result, unobserved quantities, such as the number of infectious individuals a given individual is in contact with, may enter into that individual's hazard of infection.

The analysis of the outbreak attempts to take both of these points into account.
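Point (ii) can be made concrete with a discrete-time, chain-binomial style simulation. In it, each susceptible camper's daily hazard is an environmental term alpha plus beta times the number of currently infectious campers. This is a hypothetical sketch of the model structure, not the analysis from the talk. The parameters and the fixed infectious period are assumptions, the simulation produces infection days directly rather than the symptom onsets actually observed under point (i), and it does not fit anything.

```python
import math
import random


def daily_infection_prob(alpha, beta, n_infectious):
    """P(infection during one day) for a susceptible camper.

    Hazard = alpha (environmental source) + beta * number infectious
    (person-to-person spread under homogeneous mixing).
    """
    return 1.0 - math.exp(-(alpha + beta * n_infectious))


def simulate_outbreak(n, days, alpha, beta, infectious_days=2, seed=1):
    """Chain-binomial style simulation; returns the day of infection for
    each person (None if never infected)."""
    rng = random.Random(seed)
    infected_on = [None] * n
    for day in range(days):
        # Someone infected on day d is infectious on days d+1 .. d+infectious_days
        n_inf = sum(1 for d in infected_on
                    if d is not None and day - infectious_days <= d < day)
        p = daily_infection_prob(alpha, beta, n_inf)
        for i in range(n):
            if infected_on[i] is None and rng.random() < p:
                infected_on[i] = day
    return infected_on
```

For instance, `simulate_outbreak(300, 10, alpha=0.02, beta=0.004)` mimics a 10-day camp with about 300 participants, using made-up rates. Adding a random incubation delay to each infection day would turn the output into the kind of onset data described in point (i).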

**Oct 9:**

**Axel Gandy**, University of Ulm, Germany, talked about

*Software Reliability and a Goodness of Fit test for Aalen's Additive Risk Model.*

Abstract: Aalen's nonparametric additive risk model is well known in the field of biostatistics. It assumes that a multi-dimensional counting process has intensity

$$\lambda(t) = Y(t)\,\alpha(t),$$

where $Y(t)$ are observable covariates and $\alpha(t)$ are unknown regression functions. The model has not been used in the field of software reliability (where failures of software are observed), mainly because of the lack of suitable large datasets. This talk presents a large dataset from open-source software in which bug reports constitute the failures, and analyzes it with the Aalen model. One covariate used in the analysis is dynamic, i.e. it depends on the past of the process. To assess the goodness of fit of the model, we suggest formal tests based on an idea similar to martingale residuals. We give the necessary asymptotic results and discuss the performance of these tests in simulation studies. We also apply the tests to datasets from outside software reliability.
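Consider the simplest case of the model above: an intercept plus one fixed covariate z. Then each row of Y(t) is R_i(t)(1, z_i), where R_i(t) indicates that individual i is at risk. Aalen's least-squares estimator of the cumulative regression functions adds (Y'Y)^(-1) Y' dN at each event time. The sketch below, with hypothetical names, implements only that special case. It does not include dynamic covariates or the goodness-of-fit tests from the talk.

```python
def aalen_cumulative(times, events, z):
    """Aalen least-squares estimates of the cumulative regression functions
    B0(t) (baseline) and B1(t) (effect of covariate z) in the model
        hazard_i(t) = R_i(t) * (alpha0(t) + alpha1(t) * z_i),
    where R_i(t) is the at-risk indicator.

    times, events: observed times and event indicators (1 = event).
    z: a single fixed covariate per individual.
    Returns a list of (t, B0(t), B1(t)) at each event time; estimation
    stops once the at-risk design matrix becomes singular.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    b0 = b1 = 0.0
    path = []
    for i in order:
        if not events[i]:
            continue
        t = times[i]
        risk = [j for j in range(len(times)) if times[j] >= t]
        # Entries of Y^T Y for the rows (1, z_j) of individuals at risk
        s0 = len(risk)
        s1 = sum(z[j] for j in risk)
        s2 = sum(z[j] ** 2 for j in risk)
        det = s0 * s2 - s1 * s1
        if abs(det) < 1e-12:
            break
        # (Y^T Y)^{-1} Y^T dN, where dN has a single 1 for individual i;
        # tied event times share a risk set, so their increments simply add
        b0 += (s2 - s1 * z[i]) / det
        b1 += (s0 * z[i] - s1) / det
        path.append((t, b0, b1))
    return path


print(aalen_cumulative([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 0, 1], [0, 1, 0, 1, 0, 1]))
```

With a binary z, the estimator reduces to familiar quantities. B0 is the Nelson-Aalen estimate for group 0, and B1 is the difference between the two groups' Nelson-Aalen estimates. This gives a simple check of the implementation.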

**Sept 18:**

**Solve Sæbø**, research fellow at the Section of Statistics, Agricultural University of Norway (Norges Landbrukshøyskole), talked about:

*Modelling time to culling (slaughter) of cows as first passage times of time-accelerated Wiener processes.*

Abstract: The risk that a cow is culled will increase if it is prone to disease. A model based on first passage times of a Wiener process, in which time is accelerated for each disease episode, may be one way of approaching this problem. As is well known, modelling time-dependent covariates is not entirely straightforward with regard to interpretation, etc. The culling decision is also based on many other factors, which must be taken into account in an analysis.

**Sept 4:**

**Odd O. Aalen** talked about

*Causality and statistics.*

Abstract: Statistics as a discipline has a paradoxical relationship with causality. On the one hand, it is part of every statistician's early training to avoid talking about causality if possible, and above all never to confuse statistical association with causality. On the other hand, when statistics is applied in, for example, medicine, one discovers that it is precisely the possibility of revealing causality that makes statistics interesting to physicians. So we deal with causality all the time, but it is taboo to talk about it.

In recent years, various schools of thought have emerged that say this is nonsense: of course statisticians should talk about causality. A number of different approaches have appeared: graphical models, "counterfactual" causality and Granger causality. The last is a fundamental concept from econometrics that is now also entering the medical world. It was introduced by Granger in 1969, and independently by the Norwegian Tore Schweder in 1970 under the name local independence.

Causality of course also has strong philosophical undertones. How causal is the world we live in? Quantum physicists say that it is at any rate not causal at the quantum level, where pure chance reigns. And there is also an age-old conflict between causality and free will.

All in all, a difficult but unavoidable concept.

### Norevent Research Kitchen (November 5-14)

The "Norevent Research Kitchen", organized by Nils Lid Hjort with participants Ian McKeague, Ingrid Van Keilegom and Gerda Claeskens, took place on November 5-14, 2003. The following themes were discussed:

- Empirical likelihood for survival analysis models;

- The partly parametric, partly nonparametric additive hazard regression model;

- Goodness of fit;

- Model selection and model averaging.

### The 3rd Lysebu-meeting of Norevent

Norevent held its 3rd Lysebu-meeting on Monday the 8th of September. The meeting had 50 participants. The meeting, which included a nice lunch, was free of charge for all the participants.

Organizers: Odd Aalen, Ørnulf Borgan, Harald Fekjær, Arnoldo Frigessi, Tron Anders Moger and Ida Scheel.

Topic:

*Statistical challenges from genetics.*

Program:

(The complete program can be downloaded here)

08.30-09.00: Registration.

Chairman: *Odd Aalen.*

09.00-09.25: Arnoldo Frigessi: *Some issues in statistical modelling of cDNA microarrays.*

09.25-09.50: Marit Holden: *Estimation of absolute mRNA concentrations from cDNA microarrays.*

09.50-10.15: Mette Langaas: *Statistical analysis of DNA microarray data: experimental design, linear mixed effects models and multiple testing.*

10.15-10.40: Coffee/tea.

Chairman: *Ørnulf Borgan.*

10.40-11.25: Peter Donnelly: *Statistical inference in molecular population genetics.*

11.25-12.10: Juni Palmgren: *Statistics and mapping of genes for complex traits.* Abstract.

12.10-13.10: Lunch.

13.10-13.35: Ole Christian Lingjærde: *Handling many covariates in proportional hazard regression applied to microarray survival data.*

Chairman: *Arnoldo Frigessi*

13.35-14.00: Hege Edvardsen: *Challenges of genetic diversity and the use of high throughput genotyping in genetic epidemiology.* Abstract.

14.00-14.25: Thore Egeland: *Statistical methods in forensic genetics: New challenges.*

14.25-14.50: Coffee/Tea.

14.50-15.15: Rolv Terje Lie: *Estimation of genetic effects and gene-environment interaction from case-parent triad data.*

15.15-15.40: Håkon Gjessing: *Log-linear models for case-parent triad data with multiple alleles and haplotype information.*

15.40-16.00: Summing up. Discussion.

### Seminars spring 2003

**June 12:**

**Ellen Amundsen**, Norwegian Institute for Alcohol and Drug Research (Statens Institutt for Rusmiddelforskning), talked about

*Recruitment to, extent of and cessation of drug abuse over time.*

Abstract: It is difficult to give a quantitative description of recruitment to, extent of and cessation of drug abuse over time. Ordinary sample surveys give very poor measures, so data must be obtained from a number of other sources, some of which have large biases and considerable uncertainty. There are, however, individual studies and estimates that can contribute elements to an overall description. Developing a stochastic model that captures the whole process may therefore help to create a better overall picture than the individual studies can provide on their own.

The challenge is to construct a statistical model that describes recruitment to, duration of and cessation of drug abuse in a sufficiently realistic way. The model is based on states describing that persons try out illegal drugs, that they become dependent, that they are in treatment, and that the abuse may cease for periods or end in various ways. Imprisonment is very common among people with drug problems, so such a state can also be included. The time spent in, and transitions between, the states will be partly known from existing studies and data sources. Unknown quantities can partly be estimated within the model; otherwise they must be assumed known. The effects of these assumptions must be studied through sensitivity analyses. Another useful approach is to study which types of new data would reduce the uncertainty in the estimates the most. A starting point for model construction is a Markov model, although we do not yet know whether its conditions are fulfilled. The time axis is intended to start in 1990, when a study of the extent of serious drug abuse was conducted.
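A Markov model of the kind suggested can be projected forward by repeatedly multiplying a state distribution by a yearly transition matrix. The sketch below names its states after the abstract. However, every transition probability, the recruitment figure and the starting counts are invented for illustration and have no empirical meaning.

```python
# States named after the abstract; the numbers below are invented.
STATES = ["trying", "dependent", "treatment", "prison", "ceased"]

# Hypothetical yearly transition probabilities P[i][j] = P(state j next year
# | state i this year). Each row sums to 1. NOT estimates from any study.
P = [
    [0.70, 0.15, 0.00, 0.00, 0.15],  # trying out illegal drugs
    [0.00, 0.60, 0.20, 0.10, 0.10],  # dependent
    [0.00, 0.30, 0.50, 0.05, 0.15],  # in treatment
    [0.00, 0.40, 0.10, 0.40, 0.10],  # in prison
    [0.00, 0.05, 0.00, 0.00, 0.95],  # ceased (relapse possible)
]


def step(dist, P):
    """One year of a discrete-time Markov chain: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def project(dist, P, years, recruits=None):
    """Project expected counts per state; optionally add `recruits` new
    persons to the 'trying' state each year. Returns one list per year."""
    history = [list(dist)]
    for _ in range(years):
        dist = step(dist, P)
        if recruits is not None:
            dist[0] += recruits
        history.append(dist)
    return history


# Hypothetical start in 1990 and ten years of projection
start = [500.0, 3000.0, 800.0, 400.0, 0.0]
for year, counts in zip(range(1990, 2001), project(start, P, 10, recruits=300.0)):
    print(year, [round(c) for c in counts])
```

A sensitivity analysis, as the abstract proposes, amounts to re-running `project` with perturbed rows of `P` and comparing the projected counts.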

De norske estimater for omfang av alvorlig rusmisbruk (heroinbruk med sprøyte) blir i dag beregnet ut i fra overdosedødsfall ved en såkalt multiplikatormetode. Andre forslag i europeisk regi har vært "back calculation" basert på tid fra start av misbruk til opptak i behandling, multiple indicator method (bruk av andre indikatorer som samvarierer geografisk med alvorlig narkotikabruk), andre multiplikatormetoder eller capture-recapture teknikker.
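The multiplier method referred to above can be illustrated with a minimal sketch, assuming its simplest form: the number of injecting heroin users is estimated as the observed number of overdose deaths divided by an assumed annual overdose mortality rate among users. The function name, the death count and the rates are hypothetical, and looping over several rates is only a crude stand-in for the sensitivity analyses the abstract calls for.

```python
def multiplier_estimate(observed_deaths: float, annual_mortality_rate: float) -> float:
    """Multiplier-method prevalence estimate: deaths divided by the assumed mortality rate.

    observed_deaths       -- overdose deaths recorded in one year (the "benchmark")
    annual_mortality_rate -- assumed yearly overdose death rate among users
                             (the multiplier is its inverse)
    """
    if annual_mortality_rate <= 0:
        raise ValueError("annual_mortality_rate must be positive")
    return observed_deaths / annual_mortality_rate


deaths = 150  # hypothetical number of overdose deaths in one year
# Crude sensitivity analysis: the assumed mortality rate is the most uncertain input.
for rate in (0.01, 0.02, 0.03):
    print(f"assumed rate {rate:.0%}: about {multiplier_estimate(deaths, rate):,.0f} users")
```

Halving the assumed mortality rate doubles the estimate, which is why external information on that rate matters so much in this approach.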

**May 20:**

Gunnar Andersson, Max Planck Institute for Demographic Research (Rostock, Germany), talked about

Childbearing after migration: The fertility behavior of foreign-born women in Sweden.

Abstract: The present study investigates patterns of childbearing among foreign-born women in Sweden during recent decades. Event-history techniques are applied to longitudinal population-register data on childbearing and migration of 446,000 foreign-born women who had ever lived in Sweden before the end of 1999. Period trends in parity-specific fertility appear to be quite similar for Swedish- and foreign-born women, but there are important differences in the levels of childbearing intensities between women from different countries. Most immigrant groups tend to display higher levels of childbearing shortly after immigration. We conclude that migration and family building are in many cases interrelated processes, and that it is always important to account for time since migration when the fertility of immigrants is studied. In our next step, we investigate in more depth various determinants of entry into motherhood for different groups of foreign-born women, focusing on the impact of their labour-market attachment. The effects of having an earned income are evident: higher income levels increase the probability of becoming a mother for all observed nationalities. The effects of the various states of participation and non-participation in the labour force do not seem to vary greatly between immigrants and natives. Among all subgroups, we find a higher propensity to begin childbearing among those who are established in the labour market. Contrary to popular belief, the effects of receiving welfare are clearly negative for immigrants but not for natives. The similarity in patterns across widely different national groups supports the notion that institutional factors affecting all subgroups in society are crucial in influencing childbearing behaviour within a country.


**May 8:**

Aparna Huzurbazar, associate professor at the Department of Mathematics and Statistics, University of New Mexico, talked about

Flowgraph Models for Multistate Time to Event Data.

Abstract: Flowgraph models are useful in a variety of survival analysis problems. They have been used to model the progression of diseases such as cancer and AIDS, and of degenerative diseases such as diabetic retinopathy, kidney failure and dementia. They have also been used in systems engineering for modelling cellular telephone networks. They are especially useful for analyzing time-to-event data and for constructing the corresponding Bayes predictive distributions, survivor functions and hazard functions. Flowgraph models are general multistate stochastic models that, when combined with saddlepoint approximations, allow a wide variety of parametric time-to-event modelling. They analyze semi-Markov processes using data on outcomes, probabilities of outcomes, and waiting times for outcomes to occur. They are useful for constructing likelihoods for incomplete data, and in situations where data are unrecognizably incomplete. Recently, methodology has been developed that puts flowgraph models into the counting process framework. I will discuss inference based on flowgraph models using real data applications, the relationship of flowgraphs to counting processes and, time permitting, current applications arising from my work at RAND.

**April 3:**

David Spiegelhalter, from the Medical Research Council Biostatistics Unit, Cambridge, is one of the leading names in medical statistics and perhaps the foremost Bayesian in the field, known among other things as the founder of BUGS. He gave two talks on Thursday 3 April on the following topics:

Talk 1: Bayesian approaches to evaluating health care interventions.

Abstract: Bayesian methods are starting to find practical applications in biostatistics, and this talk will focus on real examples that illustrate some of their potential benefits. First, the formal expression of 'scepticism' by a prior distribution can be useful both in interpreting the published results of studies and in monitoring a clinical trial by a Data Monitoring Committee. Second, hierarchical models allow flexible extensions of standard random-effects modelling that permit, for example, prediction of future trial results and investigation of the relationship between treatment effect and baseline risk. Finally, 'generalised evidence synthesis' can be used to combine results from related studies that inform different aspects of a model of interest and, if appropriate, to feed the results into a cost-effectiveness analysis.

Talk 2: Computation for Bayesian analysis in biostatistics.

Abstract: Although basic Bayesian analysis can be carried out algebraically, most realistic analyses require Markov chain Monte Carlo (MCMC) methods. We illustrate how the WinBUGS software can be used to implement the examples discussed in the first talk.

**March 20:**

Florin Vaida, Dept. of Biostatistics, Harvard, talked about

Cox Proportional Hazards Models with Random Effects.

Abstract: We consider inference in the Cox proportional hazards model for clustered data where some of the terms in the log relative risk are random, designating various patterns of variability across clusters. This generalizes the usual frailty model [Nielsen et al., 1992] by allowing a multivariate random effect with an arbitrary design matrix, which in turn provides a wider scope for interpreting the random effects, consistent with linear mixed models [Laird and Ware, 1982] and generalized linear mixed models [McCulloch, 1997].

In contrast to the additive frailty model [Petersen, 1998], the random effects here act additively on the log relative risk, and hence multiplicatively on the hazard function. The distribution of the random effects is generally assumed to be multivariate normal, but other specifications are also considered. Inference is based on the EM algorithm, where the M-step separates the estimation of the regression parameter and baseline hazard from the estimation of the variance components. The E-step involves computing posterior expectations of functions of the random effects; we show how these may be computed by numerical integration or MCMC methods. The variances of the parameters are estimated by Louis' formula [Louis, 1982], and predictors of the individual random effects are obtained as a by-product of the estimation.

The inference procedure is illustrated on data from a lung cancer trial conducted by the Eastern Cooperative Oncology Group. Issues of model diagnostics, model selection and interpretation will also be addressed.

References:

- Laird N and Ware J (1982). "Random effects models for longitudinal data". Biometrics 38, 963-974.

- Louis TA (1982). "Finding the observed information matrix when using the EM algorithm". Journal of the Royal Statistical Society, Series B 44(2), 190-200.

- McCulloch CE (1997). "Maximum likelihood algorithms for generalized linear mixed models". Journal of the American Statistical Association 92(437), 162-170.

- Nielsen G, Gill RD, Andersen PK and Soerensen TIA (1992). "A counting process approach to maximum likelihood estimation in frailty models". Scandinavian Journal of Statistics 19, 25-44.

- Petersen JH (1998). "An additive frailty model for correlated life times". Biometrics 54, 646-661.

**Feb 28:**

Junbai Wang, postdoctoral fellow in bioinformatics at Radiumhospitalet, talked about

Tumor classification and gene prediction by SOM component planes.

Abstract: DNA microarray technology, and its application in biological and medical research, is developing continuously. DNA microarray experiments have been shown to be a potentially useful tool in drug target discovery, tumor sample classification and the study of gene regulatory networks or pathways. Here we present one application of microarray technology: tumor sample classification and target gene discovery. The presentation is based on a published lymphoma data set (Alizadeh et al., 2000), in which microarray expression profiles were obtained from 42 diffuse large B-cell lymphoma (DLBCL) samples, 9 follicular lymphoma (FL) samples and 11 chronic lymphocytic leukaemia (CLL) tumor samples. Self-organizing maps (SOM) and K-means clustering were used to classify the tumor samples and to study the gene expression structure of each sample. The DLBCL samples were regrouped according to the gene expression structures shown by the SOM component planes. Patients' survival times in each subgroup were analyzed with Kaplan-Meier plots and the log-rank test. The study revealed potentially interesting genes that may be highly correlated with patient survival.
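The subgroup survival comparison described above rests on the Kaplan-Meier estimator, S(t) = product over event times t_j <= t of (1 - d_j / n_j), where d_j is the number of deaths at t_j and n_j the number still at risk just before t_j. A minimal, dependency-free sketch follows. The function name and the six follow-up times are hypothetical, and patients censored at an event time are counted as at risk at that time (the usual convention). The log-rank test is not shown.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (t, S(t)) pairs at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # includes those censored at t
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times (months) and death indicators for one subgroup.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.833), (3, 0.667), (5, 0.444), (9, 0.0)]
```

Running the same function on each SOM-derived subgroup gives the curves that would be plotted side by side.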

**Feb 13:**

Johan Fosen talked about

Using a dynamic covariate to model frailty.

Abstract: We assume that an individual can experience many events, and let the counting process $N_i(t)$ be the number of events that have occurred for individual $i$ by time $t$. The hazard rate for each individual is assumed to be a linear function of two stochastically independent variables whose values are fixed at the start of the process. One variable is known, while the other is unknown and can be regarded as frailty. Simulated data from this situation will be modelled with Aalen's additive hazards model, where, in addition to the known variable, we will use a dynamic covariate representing the number of events that have already occurred. Among the questions to be addressed at the meeting: how can the dynamic covariate be adjusted so that it represents only the unknown variable, and how do different variance estimators perform? (This is part of a project in collaboration with Odd Aalen, Harald Fekjær and Ørnulf Borgan.)
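Data of the kind described in this abstract could be generated as in the sketch below, under assumptions the abstract does not state: the known covariate x and the unobserved frailty z are independent Uniform(0, 1) draws, the hazard b0 + bx·x + bz·z is constant over time (so each individual's events form a homogeneous Poisson process), and follow-up ends at a fixed time tau. All names and parameter values are hypothetical. The dynamic covariate is taken as the number of events strictly before t, N_i(t-), so that it is known just before each time point.

```python
import random


def simulate_recurrent_events(n, tau, b0, bx, bz, seed=1):
    """Simulate recurrent event times on [0, tau] for n individuals.

    Individual i has constant hazard b0 + bx * x_i + bz * z_i, where x_i is the
    observed covariate and z_i the unobserved frailty (hypothetical Uniform(0, 1) draws).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, z = rng.random(), rng.random()
        rate = b0 + bx * x + bz * z  # additive (linear) hazard; positive because b0 > 0
        t, event_times = 0.0, []
        while True:
            t += rng.expovariate(rate)  # exponential gaps give a homogeneous Poisson process
            if t > tau:
                break
            event_times.append(t)
        data.append({"x": x, "z": z, "events": event_times})
    return data


def dynamic_covariate(event_times, t):
    """N_i(t-): number of events the individual had strictly before time t."""
    return sum(1 for s in event_times if s < t)


sample = simulate_recurrent_events(n=500, tau=5.0, b0=0.2, bx=0.5, bz=1.0)
total = sum(len(d["events"]) for d in sample)
print(f"{total} events among {len(sample)} individuals")
```

Because bz > 0, individuals with a high frailty z tend to accumulate more events, which is why the past event count carries information about the unobserved variable.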

**Jan 23:**

Harald Fekjær talked about

How can screening data tell us something about the growth rate of breast cancer tumours?

### Course in advanced survival and event history analysis

2-4 December 2002, Room 1128, Domus Medica, University of Oslo.

Teachers and organizers: Prof. Ørnulf Borgan, Assoc. Prof. Sven Ove Samuelsen, Researcher Harald Fekjær, Prof. Odd Aalen (chairman)

Norevent organized a course that gave an introduction to modern statistical methods in survival and event history analysis. Some basic material was introduced, including Kaplan-Meier survival curves and Cox regression analysis. The following topics were then discussed:

- Time-dependent covariate effects

- Frailty (individual heterogeneity)

- Analysis of data with repeated events

- Interval censoring

- Design aspects (case/cohort, case/control)

- Modeling event histories

- The additive model

The statistical program package S-PLUS was used throughout the course, and a short introduction to the free "S-clone" R was given.

There was no textbook for the course. Two books of interest as background material are listed below.

- W. N. Venables and B. D. Ripley: Modern Applied Statistics with S, Fourth Edition, Springer, 2002. (http://www.stats.ox.ac.uk/pub/MASS4/)

- T. M. Therneau and P. M. Grambsch: Modeling Survival Data: Extending the Cox Model, Springer, 2000.

The full program for the course can be found here.

**Exercises**

**Monday:**

The exercise on basic survival analysis can be found here. The solution to the exercise can be found here. (The data are available here for use at home.)

The exercise on Cox regression can be found here. The solution to the exercise can be found here. (The data are available here for use at home.)

**Tuesday:**

The exercise on time-dependent effects in Cox regression can be found here. The solution to the exercise can be found here. (The data are available here for use at home.)

The exercise on clustered survival data and repeated events can be found here. (The retinopathy data are available here, and the bladder data are available here, for use at home.)

**Wednesday:**

The exercise on case-cohort designs can be found here. (The data are available here for use at home.)

### Seminars autumn 2002:

From 26 September 2002, slides from the talks may be available in PDF format.

**Dec 19:**

Øystein Kravdal, Økonomisk institutt, UiO, and Kreftregisteret, talked about

Abstract: Øystein Kravdal will present an analysis of how socioeconomic resources, and having a family and children, affect cancer survival in Norway. The focus is on relative survival: mortality among people with a cancer diagnosis is compared with mortality among those without such a diagnosis, the so-called background (normal) mortality. For some cancer types it is important to take into account that background mortality depends on sociodemographic variables in addition to age and sex. This is easily done in a mixed additive-multiplicative mortality rate model estimated on data in which some individuals have a cancer diagnosis and some do not. The analysis is based on register and census data for the entire Norwegian population. The presentation will place far more emphasis on results and interpretation than on methodological aspects.
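The comparison with background mortality can be sketched in its simplest additive form, assuming piecewise-constant rates within yearly intervals: the excess rate is the observed death rate among the patients minus the expected (background) rate, and relative survival is exp(-cumulative excess hazard). This is a deliberate simplification, not the mixed additive-multiplicative model of the talk; the function names, counts and rates below are hypothetical.

```python
import math


def excess_rates(deaths, person_years, expected_rates):
    """Observed mortality rate minus background rate in each follow-up interval."""
    return [d / py - lam for d, py, lam in zip(deaths, person_years, expected_rates)]


def relative_survival(excess, interval_length=1.0):
    """Cumulative relative survival exp(-sum of excess hazard * interval length)."""
    cumulative, curve = 0.0, []
    for e in excess:
        cumulative += e * interval_length
        curve.append(math.exp(-cumulative))
    return curve


# Hypothetical data for three one-year intervals after diagnosis.
deaths = [30, 20, 15]
person_years = [1000.0, 900.0, 850.0]
background = [0.010, 0.011, 0.012]  # expected rates, e.g. from population tables by age and sex

excess = excess_rates(deaths, person_years, background)
for year, (e, r) in enumerate(zip(excess, relative_survival(excess)), start=1):
    print(f"year {year}: excess rate {e:.4f}, relative survival {r:.3f}")
```

Letting the background rates depend on sociodemographic variables, as the abstract suggests, would only change the `background` input, not the arithmetic.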

**Nov 14:**

**Oct 17:**

Sven Ove Samuelsen, Section C, Matematisk Institutt, talked about

Attributable fractions for censored survival data: some proposed definitions.

**Abstract:** Attributable fractions are an important concept in epidemiology and in clinical trials. The concept has mainly been developed for case-control studies, cross-sectional studies and cohort studies in which all individuals have the same length of follow-up. In this seminar I will discuss some possible definitions when follow-up time varies, looking in particular at right-censored survival data. The connections and differences between the definitions are discussed through data examples, simulations and some purely mathematical considerations. These are preliminary results from a collaboration with Geir Egil Eide at Kompetansesenter for klinisk forskning, Haukeland University Hospital.
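For orientation, the classical definition that this talk starts from (a single binary exposure, equal follow-up for everyone) can be written in two equivalent textbook forms: Levin's formula in terms of the population exposure prevalence p and the relative risk RR, and the case-based form in terms of the exposed share among cases. The sketch below assumes a known RR. The function names and numbers are hypothetical, and the censored-data definitions discussed in the talk are not shown.

```python
def levin_af(prevalence_exposed, relative_risk):
    """Population attributable fraction: AF = p (RR - 1) / (1 + p (RR - 1))."""
    excess = prevalence_exposed * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def case_based_af(prevalence_exposed_among_cases, relative_risk):
    """Equivalent case-based form: AF = p_c (RR - 1) / RR, p_c = exposed share among cases."""
    return prevalence_exposed_among_cases * (relative_risk - 1.0) / relative_risk


p, rr = 0.5, 2.0  # hypothetical: half the population exposed, risk doubled
p_cases = p * rr / (p * rr + (1.0 - p))  # implied exposed share among cases
print(levin_af(p, rr), case_based_af(p_cases, rr))  # both equal 1/3
```

The two forms agree only because p_cases is derived from p and RR under equal follow-up; once follow-up varies and data are censored, that link breaks, which is the situation the talk addresses.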

Sept 26:

Rosalba Rosato, Department of Statistics, University of Florence, talked about

Talk I: An attempt to apply the additive regression model to patients with heart failure for the evaluation of a community-based management strategy.

Abstract: Heart failure is characterized by a very poor prognosis (median survival of 2 years), owing to severe symptoms, and consequently by a very high number of hospitalisations after diagnosis and high costs of in-hospital care. Recently, with a view to cutting hospitalisation costs and improving the quality of life of these terminally ill patients, the possibility of home care with the help of a specialized nurse has been explored.

Thirty-five patients were followed up for 2 years. During the first year they were followed at the Heart Failure Clinic of Trieste (Italy); after that, they were switched to home care. The underlying hypothesis was that hospital readmission probabilities would not worsen during the second period compared with the first. Using the additive regression model, we estimated the transition probabilities to hospitalisation before and after the introduction of home care, according to some patient covariates.

Slides for the talk in PDF format.

Talk II: Doctoral thesis project.

Abstract: Using the additive regression model to evaluate risk factors for recurrent hospitalisations in a cohort of diabetic patients. We have a cohort of 3934 type 2 diabetic patients followed for a maximum of 4.5 years. At the start of the study, some clinical information was recorded for each patient. The aim of the study is to estimate the probability of one or more hospitalisations, classified into two groups: admissions related to diabetes and other admissions. Hospital discharge records for the period from 1 January 1996 to 30 June 2000 were searched for each patient in the sample. Of the 3934 patients, 2138 (54%) had at least one admission during the period considered; there were 5532 discharges in total, around 2.6 admissions per admitted patient, with a range from 1 to 26. Using the additive regression model, we will try to estimate the time-dependent effects of the covariates on hospitalisation or death.

Slides for the talk in PDF format.

Aug 29:

Christian Nicolay Brinch, NR, talked about

Event history analysis in econometrics and nonparametric identifiability of models with unobserved heterogeneity (frailty).

Abstract: I will present parts of the work from my thesis, but I want to spend some time motivating it by presenting the central questions in the econometric tradition of event history analysis. The econometric tradition differs somewhat from the rest of the statistical literature here, both in which theoretical topics are emphasized and in how empirical analyses are carried out. Among other things, I will discuss why proofs of nonparametric identifiability of models have come to play an important role in the econometric literature, and review some main results in this area before presenting my own. My own results can be summarized, briefly and imprecisely, as follows: if covariates that vary over time within spells are available, it is not necessary to assume proportional hazards in order to achieve nonparametric identifiability of "Mixed Hazards" models (models in which frailty enters as a factor in the hazard rate).

### Workshop: Methods in Infectious Disease Epidemiology

11 September. Organized by GLOBINF in collaboration with Norevent.

GLOBINF is a thematic research area at the Faculty of Medicine focusing on the prevention of important global infectious diseases.

### The 2nd Lysebu-meeting of Norevent, May 13th 2002

Norevent held its 2nd Lysebu meeting on 13 May 2002 (the first was held in January 2001). Like previous Norevent meetings, it was held in an informal style. There were 45 participants. The meeting, including a good lunch, was free for all participants. It was organized by Norevent's working committee (Odd Aalen, Ørnulf Borgan, Harald Fekjær and Tron Anders Moger).

Program:

This time we had two foreign guests, Terry Therneau and Per Kragh Andersen:

*Terry Therneau*, Mayo Clinic, USA

Therneau is well known as the main developer of the survival package in S-Plus. He is also co-author of the well-known book *Modeling Survival Data*, which was published quite recently. Therneau gave two lectures, on the following topics:

- Experiences with a correlated frailty Cox model.

- Connections between the Cox model and Aalen's additive hazard regression.

*Per Kragh Andersen*, Department of Biostatistics, University of Copenhagen, is known as an expert in event history analysis and epidemiological methods. He gave a lecture on the following topic:

- Event history analysis via generalized models for pseudo-observations.

In addition, there was a session with local talks:

*Jan Terje Kvaløy:* Tests for the proportional hazards assumption based on the score process.

*Ørnulf Borgan:* Estimation of covariate-dependent Markov transition probabilities from nested case-control data.

*Ane Seierstad and Harald Fekjær:* Analysis of divorce patterns among Norwegian homosexuals.

*Jan F. Nygård:* The use of screening data to estimate the progression rate of pre-cancerous lesions for the uterine cervix.

### Seminars spring 2002:

**June 6:**

Hans Steinsland, research fellow at the Centre for International Health, University of Bergen, talked about

Molecular epidemiological studies of childhood diarrheal disease - with special focus on infections with enterotoxigenic *Escherichia coli*.

**Abstract**: Diarrhea is a major cause of childhood deaths in developing countries, killing more than 2 million children yearly, and a major cause of child ill health and malnutrition. To contribute to the understanding of the epidemiology of infections that cause diarrhea, we carried out a large epidemiological study of childhood diarrheal diseases in Guinea-Bissau, West Africa. A cohort of 200 children was followed from birth up to two years of age, with weekly stool specimen collection and detailed microbiological examination. With emphasis on one of the most important diarrheal pathogens, enterotoxigenic *Escherichia coli* (ETEC), we present some of the most important findings from the study so far and some of the methodology used to analyze the data. In particular, we describe in detail the method we used to assess the potential protective immunity of ETEC vaccines by estimating the protection induced by natural ETEC infections.

Ellen Amundsen, Sirus and Section of Medical Statistics; Odd Aalen, Section of Medical Statistics; and Hein Stigum, Norwegian Institute of Public Health, talked about

Some notes on growth rates for infectious diseases and "reproductive rates" (R, R0).

Ellen Amundsen: Estimation of local R.

Hein Stigum: Comparison of the reproduction ratio (R) and local R from a simulation model.

Odd Aalen: R0, local growth rates and eigenvalues of matrices.

John-Arne Røttingen: Discussion.

About 10 minutes per contribution.
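The link between R0 and matrix eigenvalues in the session above can be illustrated with the standard multi-type formulation, which we assume here: if K[i][j] is the expected number of new type-i infections caused by one type-j infective (the next-generation matrix), then R0 is the dominant eigenvalue (spectral radius) of K. The sketch below uses plain power iteration; the function name and the two-group matrix are hypothetical and not taken from any of the talks.

```python
def spectral_radius(K, max_iter=1000, tol=1e-12):
    """Dominant eigenvalue of a non-negative, primitive matrix K by power iteration."""
    n = len(K)
    v = [1.0] * n
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(w)  # v is scaled so max(v) == 1, hence max(K v) -> eigenvalue
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


# Hypothetical next-generation matrix for two groups (e.g. high and low contact rates).
K = [[1.5, 0.5],
     [1.0, 0.8]]
print(round(spectral_radius(K), 3))  # R0 > 1: the infection can spread
```

For a 2x2 matrix the result can be checked against the closed form (trace + sqrt(trace^2 - 4 det)) / 2; power iteration is used because it carries over unchanged to larger numbers of groups.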

April 4:

Solve Sæbø talked about

Survival and genetics.

Abstract: Genetic information has for many years been used in linear models in animal breeding research. Prediction of breeding values for livestock has been an important basis for selecting breeding animals with respect to genetic "progress" in certain traits, e.g. milk yield in cows and meat quality in pigs. Genetic information in the form of pedigree (relationship) information can be included in the analysis to give better predictions of breeding values. In recent years such genetic information has also been included in survival analysis models, primarily in proportional hazards models.

I will begin by describing the genetic aspect of the analyses. I will then turn to the estimation/prediction methods that have been used. This will be illustrated with a data example in which both a Cox-type model and a first-passage-time model for Wiener processes have been used.
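A first-passage-time model of the kind mentioned above treats an individual's underlying condition as a Wiener process X(t) = c + mu·t + W(t) that starts at a level c > 0 with drift mu (negative values point towards the barrier); the event occurs when X first hits zero. The sketch below shows the resulting survival function, assuming unit diffusion variance and the standard closed form, which is the inverse Gaussian survival function when the drift is negative. The function names are hypothetical, and how c and mu would be linked to covariates or genetic effects is not shown.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def first_passage_survival(t, c, mu):
    """P(T > t), where T is the first time X(s) = c + mu * s + W(s) hits 0.

    W is a standard Wiener process, c > 0 the initial distance to the barrier and
    mu the drift. For mu < 0 this is the inverse Gaussian survival function; for
    mu > 0 a fraction 1 - exp(-2 c mu) of individuals never reach the barrier.
    """
    if t <= 0:
        return 1.0
    s = math.sqrt(t)
    return normal_cdf((c + mu * t) / s) - math.exp(-2.0 * c * mu) * normal_cdf((mu * t - c) / s)


for t in (0.5, 1.0, 2.0, 5.0):
    print(t, round(first_passage_survival(t, c=1.0, mu=-0.5), 4))
```

The "never reach the barrier" fraction for positive drift is one reason such models are attractive when not everyone is expected to experience the event.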

Tadek Bednarski, Institute of Mathematics of the University of Zielona Gora and of the Polish Academy of Sciences, talked about

Robust inference for the Cox model.

14:15-15:45 in B81 NHA. The talk was held jointly with the seminar series in medical statistics.

Jan 17:

The first Norevent meeting of 2002 took place on Thursday 17 January, 14:15-16:00, in the section's meeting room here at the preclinical building. The program was:

Tron A. Moger talked about


### Seminars autumn 2001:

Dec 13:

Johan Fosen talked about

Analysis of sleep laboratory data.

Abstract: Starting from a sleep laboratory data set, I will look at how the transition between wakefulness and sleep can be modelled using, among other things, dynamic covariates such as "the number of times one has already been awake the same night". Aalen's additive hazards model with built-in ridge regression is used. When presenting the analysis and results, I will try to discuss how the results may be affected by the choice of ridge factor and smoothing parameter. This work is at an early stage.

Nov 8:

Eva Skovlund talked about

Oct 18:

Ellen Amundsen talked about

Odd O. Aalen talked about

Analysis of multivariate survival data.

Abstract: The standard methods in survival analysis assume that there is at most one "event" per individual. What if there are several events? This situation is common, and the standard way of handling it is through frailty theory. I will show that such data can often be analysed by additive regression, which can give a more detailed and flexible picture than the frailty approach. This is "work in progress" involving several members of Norevent (Johan, Harald and Ørnulf).

### Workshop: New developments in event history analysis

In November 2001 Norevent organized a one-day pre-conference course and a three-day workshop in event history analysis and frailty models, open to researchers and Ph.D. students from the Nordic countries. The workshop was a great success: there were 29 participants in the pre-conference course and 59 in the workshop. Around half of the participants were from Norway. The invited lecturers at the workshop were:

- Robin Henderson, Department of Mathematics and Statistics, Lancaster University, England.
- Nils Lid Hjort, Department of Mathematics, University of Oslo, Norway.
- Philip Hougaard, Novo Nordisk, Copenhagen, Denmark.
- Judith Lok, Leiden University Medical Center, The Netherlands.
- Glen A. Satten, Centers for Disease Control and Prevention, Atlanta, USA.
- Thomas Scheike, Department of Mathematical Sciences, University of Aalborg, Denmark.

Many of the regular members of Norevent also contributed with lectures.