Sparse clustering of functional data

Speaker: Valeria Vitelli, postdoc, Department of Biostatistics, University of Oslo.


When faced with a clustering problem, we do not expect the true underlying groups to differ in all the features considered. Most likely, only a limited number of variables are relevant for detecting differences among groups. The problem becomes harder still when the number of features p is larger than the sample size N. For this reason, many statistical methods, called "sparse", have been proposed that cluster data while jointly selecting the features most relevant for clustering. The aim of our work is to consider the same problem in a functional data context, where sparsity means selecting the subsets of the domain on which the clusters differ the most. First, the sparse multivariate clustering problem, formulated via a hard thresholding strategy, is proven to have a unique solution. Then, the problem of clustering functional data while jointly selecting relevant data features is analytically defined as the constrained optimization of a functional over the set of possible cluster partitions and over a set of admissible weighting functions, which are responsible for feature selection. An algorithm implementing this approach is also proposed. Improvements over standard functional clustering techniques are discussed in the light of simulation studies. Finally, an application to the Berkeley Growth Study is presented.
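
The hard-thresholding idea can be illustrated with a minimal Python sketch. It assumes that the curves are observed on a common grid of points (so a weighting function reduces to one weight per grid point), that k-means is the clustering step, and that hard thresholding keeps weight 1 on the m grid points with the largest between-cluster sum of squares and weight 0 elsewhere. All function names and the parameter m are hypothetical; this is an illustration of the general idea, not the algorithm presented in the talk.

```python
# Hypothetical sketch: hard-thresholding sparse k-means on curves sampled on a common grid.
# Not the algorithm from the talk; names and modelling choices are illustrative assumptions.


def weighted_sq_dist(x, c, w):
    """Weighted squared distance; w holds one weight per grid point (0 = ignored)."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def mean_curve(curves):
    """Pointwise mean of a non-empty list of equally sampled curves."""
    n = len(curves)
    return [sum(vals) / n for vals in zip(*curves)]


def kmeans_weighted(X, k, w, n_iter=20):
    """Lloyd's k-means under the weighted distance, with farthest-first initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        centers.append(max(X, key=lambda x: min(weighted_sq_dist(x, c, w) for c in centers)))
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda g: weighted_sq_dist(x, centers[g], w)) for x in X]
        if new == labels:
            break
        labels = new
        for g in range(k):
            members = [x for x, lab in zip(X, labels) if lab == g]
            if members:  # keep the old center if a cluster empties
                centers[g] = mean_curve(members)
    return labels


def bcss_per_point(X, labels, k):
    """Between-cluster sum of squares at each grid point (total SS minus within SS)."""
    scores = []
    for j in range(len(X[0])):
        col = [x[j] for x in X]
        mu = sum(col) / len(col)
        total = sum((v - mu) ** 2 for v in col)
        within = 0.0
        for g in range(k):
            vals = [v for v, lab in zip(col, labels) if lab == g]
            if vals:
                m_g = sum(vals) / len(vals)
                within += sum((v - m_g) ** 2 for v in vals)
        scores.append(total - within)
    return scores


def hard_threshold_weights(scores, m):
    """Weight 1 on the m grid points with the largest scores, 0 elsewhere."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]
    w = [0.0] * len(scores)
    for j in top:
        w[j] = 1.0
    return w


def sparse_functional_kmeans(X, k, m, n_outer=10):
    """Alternate clustering and hard-thresholded selection of m grid points."""
    w = [1.0] * len(X[0])  # start from the whole domain
    labels = kmeans_weighted(X, k, w)
    for _ in range(n_outer):
        w = hard_threshold_weights(bcss_per_point(X, labels, k), m)
        new_labels = kmeans_weighted(X, k, w)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, w
```

On twelve toy curves that differ between two groups only on the last four of ten grid points, calling sparse_functional_kmeans(X, k=2, m=4) should separate the two groups and place weight 1 on exactly those four points. Hard thresholding gives equal weight to every selected point, so the selected set plays the role of the "subset of the domain" on which the clusters differ.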

Published Sep. 23, 2013 9:54 AM - Last modified Dec. 5, 2013 1:11 PM