
Statistical models for high-dimensional and functional data

The research group aims to develop statistical methods for analyzing data with complex structure: high-dimensional data and/or intrinsically smooth (functional) data.

Figure: Various examples of high-dimensional data and analyses.

About the group

The rapid evolution of technological devices able to measure increasingly complex and high-dimensional data (large omics databases, high-frequency time series, 3D images generated by medical scanners) poses unprecedented challenges to data analysis. Proper data understanding and modeling are key to reaching meaningful conclusions, aimed at improving and personalizing clinical treatment. Accounting for such complexity requires the development of novel statistical models and computational methods, fueling a fascinating and fast-growing area of research in statistics.
Methods for high-dimensional and functional data have flourished over the last decades, and we aim to contribute to this development. We work on:

  • scalable Bayesian inference for molecular signature discovery
  • data integration for cancer genomics
  • rank-based statistical models for preference learning (a sketch follows this list)
  • sparse models for functional data clustering

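To give a flavor of the rank-based direction, the sketch below fits a Mallows-type model to a small set of rankings: a Metropolis sampler explores candidate consensus rankings under a Kendall-distance likelihood. This is a minimal toy example under simplifying assumptions (fixed dispersion alpha, a random-swap proposal), not the group's actual methodology; all function names are illustrative.

    import numpy as np

    def kendall_distance(r1, r2):
        """Count item pairs on whose relative order the two rankings disagree."""
        n = len(r1)
        return sum((r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
                   for i in range(n) for j in range(i + 1, n))

    def consensus_ranking(rankings, alpha=0.5, n_iter=5000, seed=1):
        """Metropolis search for the consensus ranking rho in a Mallows model.

        The likelihood is proportional to exp(-alpha * sum_j d(R_j, rho));
        its normalizing constant depends only on alpha, so it cancels in the
        acceptance ratio. Proposals swap the ranks of two random items.
        """
        rng = np.random.default_rng(seed)
        n_items = rankings.shape[1]
        rho = np.arange(n_items)

        def log_lik(r):
            return -alpha * sum(kendall_distance(R, r) for R in rankings)

        cur = log_lik(rho)
        best, best_val = rho.copy(), cur
        for _ in range(n_iter):
            prop = rho.copy()
            i, j = rng.choice(n_items, size=2, replace=False)
            prop[i], prop[j] = prop[j], prop[i]
            new = log_lik(prop)
            if np.log(rng.random()) < new - cur:   # Metropolis acceptance
                rho, cur = prop, new
                if cur > best_val:
                    best, best_val = rho.copy(), cur
        return best

    # Three assessors ranking five items (entry k = rank assigned to item k).
    rankings = np.array([[0, 1, 2, 3, 4],
                         [1, 0, 2, 3, 4],
                         [0, 2, 1, 3, 4]])
    print(consensus_ranking(rankings))   # best consensus ranking found

In a full Bayesian treatment the dispersion alpha would be estimated jointly with rho, in which case the normalizing constant no longer cancels and must be handled explicitly; the sketch fixes alpha only to keep the example short.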
Challenges

The frontiers of research in statistics are nowadays defined by the challenges that data complexity poses to the statistician: every day we push the boundaries of what we can measure, and how, beyond anything previously possible. However, data are a powerful source of information only when modeled properly, so that the analysis can separate structural effects from noise and incompleteness. Statistical methods provide the basic tools to extract knowledge from data, to quantify the associated uncertainty, and to predict or make decisions while accounting for that uncertainty.
In doing so, we face a number of challenges:

  • data incompleteness and fragmentation: missing and fragmented data are a reality, especially with complex measurement tools, so propagating the associated uncertainty (for example via Bayesian methods; a first sketch follows this list) is key to reaching meaningful and reliable conclusions;
  • complex, high-dimensional, or smooth data-generating processes: recognizing that the process generating the data has intrinsic structure or regularity improves understanding, so we exploit this by drawing on advanced mathematical tools (function spaces, network structures) to build better models;
  • massive data dimension: since massive amounts of information are often redundant, we need to develop sparse statistical models that capture and summarize the relevant insights from the data (a second sketch after this list combines this point with the previous one);
  • heterogeneity of the data sources: data collected from several platforms or devices (as in genomics) offer the opportunity to increase power through joint analysis, and data fusion approaches must account for the resulting heterogeneity in measurement quality and scale across sources.
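To make the first point concrete, here is a minimal sketch of Bayesian data augmentation under a deliberately simplified model (univariate normal data, known variance, flat prior on the mean); the function name and all parameters are illustrative, not an existing API. The sampler alternates between imputing the missing values and updating the mean, so imputation variability flows into the posterior instead of being frozen at a single filled-in value, as a single-imputation plug-in would do.

    import numpy as np

    def gibbs_impute_mean(y, sigma=1.0, n_iter=3000, seed=0):
        """Data-augmentation Gibbs sampler for the mean of normal data with
        missing entries (NaN). Flat prior on mu; sigma assumed known."""
        rng = np.random.default_rng(seed)
        y = np.asarray(y, dtype=float)
        miss = np.isnan(y)
        y_complete = y.copy()
        y_complete[miss] = np.nanmean(y)      # crude starting imputation
        n = y.size
        mu_draws = np.empty(n_iter)
        for it in range(n_iter):
            # mu | completed data ~ N(mean(y_complete), sigma^2 / n)
            mu = rng.normal(y_complete.mean(), sigma / np.sqrt(n))
            # missing values | mu ~ N(mu, sigma^2), redrawn every sweep
            y_complete[miss] = rng.normal(mu, sigma, size=miss.sum())
            mu_draws[it] = mu
        return mu_draws[n_iter // 10:]        # drop a short burn-in

    y = np.array([1.2, 0.8, np.nan, 1.5, np.nan, 0.9])
    draws = gibbs_impute_mean(y)
    print(np.percentile(draws, [2.5, 97.5]))  # interval reflecting missingness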
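The second and third points combine naturally in functional data clustering: a smooth curve observed on a fine grid can be summarized by a handful of basis coefficients, and clustering then operates in that low-dimensional coefficient space. The sketch below is again an assumption-laden toy (Fourier basis, ridge-stabilized least squares, off-the-shelf k-means), not a description of the group's models.

    import numpy as np
    from sklearn.cluster import KMeans

    def fourier_basis(t, n_basis):
        """Evaluate a Fourier basis (constant plus sine/cosine pairs) on t in [0, 1]."""
        cols = [np.ones_like(t)]
        for k in range(1, n_basis // 2 + 1):
            cols.append(np.sin(2 * np.pi * k * t))
            cols.append(np.cos(2 * np.pi * k * t))
        return np.column_stack(cols)[:, :n_basis]

    def curves_to_coefficients(Y, t, n_basis=7, ridge=1e-3):
        """Project each observed curve (a row of Y) onto the basis by
        ridge-stabilized least squares; the coefficient vectors are a
        low-dimensional summary of the smooth curves."""
        B = fourier_basis(t, n_basis)                  # (n_grid, n_basis)
        G = B.T @ B + ridge * np.eye(n_basis)
        return np.linalg.solve(G, B.T @ Y.T).T         # (n_curves, n_basis)

    # Two groups of noisy curves with different smooth mean functions.
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 100)
    group1 = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal((20, t.size))
    group2 = np.cos(2 * np.pi * t) + 0.2 * rng.standard_normal((20, t.size))
    Y = np.vstack([group1, group2])

    coef = curves_to_coefficients(Y, t)                # 40 curves -> 40 x 7 matrix
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coef)
    print(labels)                                      # recovers the two groups

Sparsity here comes from truncating the basis; penalized approaches (for example an l1 penalty on the coefficients) would instead select basis functions adaptively.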

Cooperation

The group cooperates with both national and international researchers.


Contact

Dept. of Biostatistics
Domus Medica, Gaustad
Sognsvannsveien 9
0372 Oslo

Group Leader

Valeria Vitelli