Data Fusion based on Coupled Matrix and Tensor Factorizations
Speaker: Evrim Acar, Senior Research Scientist, Simula Research Laboratory, Norway
When the goal is to discover the underlying patterns in a complex system such as the human metabolism or brain, the complexity of the problem requires the collection and analysis of data from multiple sources. Therefore, data fusion, i.e., joint analysis of data sets from multiple sources, is a topic of interest in many fields. For instance, in metabolomics, analytical platforms such as Liquid Chromatography-Mass Spectrometry and Nuclear Magnetic Resonance Spectroscopy are used for chemical profiling of biological samples. Different platforms detect different chemical compounds with different levels of sensitivity, and fusing their measurements has the potential to provide a more complete picture of the metabolome related to a specific condition. Similarly, neuroimaging modalities such as functional Magnetic Resonance Imaging (fMRI) and electroencephalography (EEG) provide information about brain function at complementary spatiotemporal resolutions, and their joint analysis is expected to provide a better understanding of brain activity. However, data fusion remains a challenging task, since there is a lack of data mining tools that can jointly analyze data sets that are both incomplete (i.e., with missing entries) and heterogeneous (i.e., in the form of higher-order tensors and matrices) while capturing the underlying shared and unshared patterns. We formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem and discuss its extension to structure-revealing data fusion, i.e., fusion models that can identify shared and unshared factors in coupled data sets. Numerical experiments on real coupled data sets demonstrate that, while traditional methods based on matrix factorizations are limited in their ability to jointly analyze heterogeneous data sets, the structure-revealing CMTF model can successfully capture the underlying patterns by exploiting the low-rank structure of higher-order tensors.
We will show the broad applicability of CMTF-based fusion models with applications in metabolomics, neuroscience and recommender systems.
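
As a rough illustration of what a CMTF formulation can look like, the sketch below jointly factorizes a third-order tensor X (samples x features x time, say) and a matrix Y that share their first mode. It minimizes ||X - [[A, B, C]]||^2 + ||Y - A V^T||^2, where the factor matrix A is common to both data sets. Everything here is an assumption made for illustration: the function names (khatri_rao, cmtf_loss, cmtf_als), the synthetic data, the fully observed (no missing entries) setting, and the use of alternating least squares, which was chosen for brevity and is not necessarily the optimization method used in the talk. The structure-revealing extension is not covered.

```python
import numpy as np

def khatri_rao(U, W):
    """Column-wise Kronecker product; row u * W.shape[0] + w holds U[u] * W[w]."""
    rank = U.shape[1]
    return (U[:, None, :] * W[None, :, :]).reshape(-1, rank)

def cmtf_loss(X, Y, A, B, C, V):
    """Squared error of the CP model of X plus that of the matrix model of Y."""
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.sum((X - X_hat) ** 2) + np.sum((Y - A @ V.T) ** 2)

def cmtf_als(X, Y, rank, n_iter=200, seed=0):
    """Illustrative alternating least squares for fully observed coupled data."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    M = Y.shape[1]
    A, B, C, V = (rng.standard_normal((n, rank)) for n in (I, J, K, M))

    # Mode-n unfoldings whose column order matches khatri_rao above.
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)

    history = [cmtf_loss(X, Y, A, B, C, V)]
    for _ in range(n_iter):
        # A is shared, so its update stacks the tensor and matrix problems.
        design = np.vstack([khatri_rao(B, C), V])
        target = np.hstack([X1, Y])
        A = np.linalg.lstsq(design, target.T, rcond=None)[0].T
        # B and C only appear in the tensor term.
        B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
        # V only appears in the matrix term.
        V = np.linalg.lstsq(A, Y, rcond=None)[0].T
        history.append(cmtf_loss(X, Y, A, B, C, V))
    return A, B, C, V, history

# Synthetic coupled data with a shared rank-2 factor in the first mode.
rng = np.random.default_rng(1)
A0, B0, C0, V0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4, 4))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
Y = A0 @ V0.T

A, B, C, V, history = cmtf_als(X, Y, rank=2)
print("initial loss:", history[0], "final loss:", history[-1])
```

Because each factor update is an exact least-squares solve with the other factors held fixed, the objective cannot increase from one iteration to the next, which makes the loss history a simple sanity check. Handling missing entries would require weighting the residuals by an observation mask, which this sketch omits.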