Statistical models for high-dimensional and functional data
The research group develops statistical methods for analyzing data with complex structure: high-dimensional and/or intrinsically smooth (functional) data.
Various examples of high-dimensional data and analyses.
About the group
The rapid evolution of technological devices able to measure increasingly complex and high-dimensional data (large omics databases, high-frequency time series, 3D images generated by medical scanners) poses unprecedented challenges to data analysis. Proper understanding and modeling of the data are key to reaching meaningful conclusions, targeted at improving and personalizing clinical treatment. Accounting for such complexity requires the development of novel statistical models and computational methods, fueling a fascinating and fast-growing area of research in statistics.
Methods for high-dimensional and functional data have flourished over the last decades, and we aim to contribute to this development. We work on:
- scalable Bayesian inference for molecular signature discovery
- data integration for cancer genomics
- rank-based statistical models for preference learning
- sparse models for functional data clustering
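As a flavor of the last topic, the following is a minimal sketch of one common functional data clustering idea: project each discretely observed curve onto a small basis of smooth functions, then cluster the resulting coefficients instead of the raw observations. The simulated curves, the Fourier basis, and the crude two-group assignment rule are purely illustrative assumptions, not the group's actual methodology.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)

# Two groups of noisy curves with different underlying mean functions
# (illustrative simulated data).
curves = np.array(
    [np.sin(2 * np.pi * t) + rng.normal(0, 0.3, t.size) for _ in range(10)]
    + [np.cos(2 * np.pi * t) + rng.normal(0, 0.3, t.size) for _ in range(10)]
)

# Fourier basis: project each 100-point curve onto 5 smooth basis functions,
# reducing dimension while exploiting the curves' intrinsic smoothness.
# Column order: [1, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), cos(4*pi*t)].
basis = np.column_stack(
    [np.ones_like(t)]
    + [f(2 * np.pi * k * t) for k in (1, 2) for f in (np.sin, np.cos)]
)
coefs, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)  # shape (5, 20)

# A crude assignment on two coefficients already recovers the two groups:
# sin-dominated curves get label 0, cos-dominated curves get label 1.
labels = (np.abs(coefs[2]) > np.abs(coefs[1])).astype(int)
print(labels)
```

In practice the basis would be chosen to suit the data (for example B-splines) and a proper clustering algorithm would be run on the coefficients; the point of the sketch is only that smoothness lets 100-dimensional observations be summarized by a handful of interpretable coefficients.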
The frontiers of research in statistics are today defined by the challenges that data complexity poses to the statistician: every day we push the boundary of what we can measure, and how, at a scale unprecedented in human history. However, data are a powerful source of information only when modeled properly, so that the analysis can separate structural effects from noise and incompleteness. Statistical methods provide the basic tools to extract knowledge from data, to quantify the associated uncertainty, and to make predictions or decisions that account for such uncertainty.
In doing so, we face a number of challenges:
- data incompleteness and fragmentation: missing and fragmented data are a reality, especially with complex measurement tools; propagating the associated uncertainty (for example via Bayesian methods) is therefore key to reaching meaningful and reliable conclusions;
- complex, high-dimensional, or smooth data-generating processes: recognizing that the data-generating process has some intrinsic structure or regularity can aid understanding, and we exploit this by bringing in advanced mathematical tools (function spaces, network structures) to achieve better modeling;
- massive database dimension: since massive amounts of information are often redundant, we develop sparse statistical models capable of capturing and summarizing the relevant insights in the data;
- heterogeneity of the data sources: data collected from several platforms or devices (as in genomics) offer the opportunity to increase power by analyzing them jointly, but data fusion approaches must account for the resulting heterogeneity in measurement quality and scale across sources.
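The sparsity point above can be illustrated with a toy lasso fitted by coordinate descent on simulated data; the data, the penalty value, and the function name are illustrative assumptions, not the group's actual models. Out of 200 candidate features only 3 carry signal, and the l1 penalty shrinks most irrelevant coefficients exactly to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 200                      # many more candidate features than observations
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]    # only 3 features actually matter
y = X @ beta_true + rng.normal(0, 0.5, n)

def lasso_cd(X, y, lam, n_iter=100):
    """Plain coordinate-descent lasso: minimizes
    (1/2n)||y - X b||^2 + lam * ||b||_1 via soft-thresholding."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]       # partial residual excluding j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

b_hat = lasso_cd(X, y, lam=0.2)
print(np.nonzero(b_hat)[0])          # indices of the selected features
```

The soft-thresholding step sets a coefficient exactly to zero whenever its marginal correlation with the partial residual falls below the penalty, which is how a sparse model summarizes redundant high-dimensional information in a few interpretable terms.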
The group cooperates with both national and international researchers:
- Several ongoing collaborations within OCBE (IMB, UiO) and at the statistics section of the Mathematics Department (UiO).
- Preference learning collaborations: Antonio D’Ambrosio, Department of Economics and Statistics, University of Naples Federico II (Italy); Cristina Mollica and Luca Tardella, Department of Statistical Sciences, Sapienza University of Rome (Italy).
- Integrative Genomics collaborations: Paul Kirk and Sylvia Richardson, MRC Biostatistics Unit, Cambridge (UK).
- Functional Data Analysis (FDA) collaborations: Piercesare Secchi, Simone Vantini, Laura Sangalli at the statistics section of MOX, Politecnico di Milano (Italy).