2. Dimension reduction for extremes
Faculty: P. Naveau
Participants: R. Wills, V. Otieno, M. Castella Sanchez, T. Andry Arivelo, M. Bador
Climate science relies on spatial statistics to predict future changes, detect variability on long time scales, and model unobserved regions and periods (broadly termed climate reconstruction, which uses a variety of often ad hoc imputation techniques). The volume of data involved is so large that handling it becomes a statistical problem in its own right. Indeed, for very large datasets, estimating parametric models, predicting at unobserved sites, and quantifying the associated uncertainty may not be computationally feasible. Consequently, one of the main objectives of statistical climatology is to extract the relevant information hidden in complex spatio-temporal climatological datasets. To identify spatial patterns, the best-known statistical techniques in climate studies are based on the concept of variance, such as the k-means clustering algorithm or Empirical Orthogonal Function (EOF) analysis, which decomposes estimated variance-covariance matrices. This makes sense for applications that aim to identify patterns in mean behavior. In particular, it is ideally suited to variables that follow a mixture of normal distributions, because a Gaussian random vector is fully characterized by its mean vector and covariance matrix. Extremes, however, are governed by the tails of the distribution, which means and covariances do not adequately describe. A possible avenue to bridge this methodological gap is to take advantage of multivariate extreme value theory (EVT) and adapt it to the context of dimension reduction. Dimension reduction is challenging here, since multivariate EVT is by nature non-parametric (unlike Gaussian modeling through a correlation matrix), and most applications of non-parametric multivariate EVT have dealt with very low dimensions (fewer than 10).
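To make the variance-based viewpoint concrete, the EOF analysis mentioned above can be sketched as an eigendecomposition of the estimated covariance matrix of a space-time field. This is only a minimal illustration, not the project's method: the function name `eof_analysis`, the synthetic field, and all parameter values are assumptions introduced for the example.

```python
import numpy as np


def eof_analysis(field, n_modes=2):
    """Illustrative EOF analysis (hypothetical helper, not from the project).

    field: array of shape (n_times, n_sites), one row per time step.
    Returns the leading spatial patterns (EOFs), their time series
    (principal components), and the fraction of variance each explains.
    """
    # Remove the temporal mean at each site to work with anomalies.
    anomalies = field - field.mean(axis=0)
    # Estimated variance-covariance matrix between sites.
    cov = np.cov(anomalies, rowvar=False)
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; sort descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eofs = eigvecs[:, :n_modes].T        # shape (n_modes, n_sites)
    pcs = anomalies @ eofs.T             # shape (n_times, n_modes)
    explained = eigvals[:n_modes] / eigvals.sum()
    return eofs, pcs, explained


# Synthetic example (assumed data): one dominant spatial pattern plus noise.
rng = np.random.default_rng(0)
n_times, n_sites = 500, 20
pattern = np.sin(np.linspace(0.0, np.pi, n_sites))
amplitude = rng.normal(scale=3.0, size=n_times)
noise = rng.normal(scale=0.5, size=(n_times, n_sites))
field = np.outer(amplitude, pattern) + noise

eofs, pcs, explained = eof_analysis(field, n_modes=2)
print("explained variance fractions:", explained)
```

Because the decomposition depends on the data only through the covariance matrix, it summarizes average co-variability between sites; it carries no specific information about joint behavior in the tails, which is the gap the project addresses.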