15-17 May 2024 Montpellier (France)

Abstracts

Alexis Boulin (Université Côte d'Azur)

Estimating regularly varying random vectors with discrete exponent measure via model-based clustering

This study introduces a novel estimation method for the entries and structure of a matrix A in the linear factor model X = AZ + E. Here X in R^d is an observable vector, Z in R^K is a vector of independent regularly varying random variables, and E in R^d is light-tailed independent noise. X is hence regularly varying, and its exponent measure is discrete and completely characterized by the matrix A. Each row of the matrix A is assumed to be both scaled and sparse. Additionally, the value of K is not known a priori. We address the problem of identifying the matrix A from its matrix of pairwise extremal correlations. In the presence of pure variables, which are elements of X linked, through A, to a single latent factor, the matrix A can be reconstructed from the extremal correlation matrix. Our proofs of identifiability are constructive and pave the way for an estimation procedure that determines the number of factors K and the matrix A from n weakly dependent observations of X.
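To make the model X = AZ + E concrete, the following is a minimal simulation sketch and not the estimation method of the talk. Everything specific in it is an assumption made for illustration: the Pareto law for Z, the Gaussian noise, the hypothetical loading matrix A, the choice of rows summing to one as the scaling, and the function name simulate_linear_factor_model.

```python
import numpy as np

def simulate_linear_factor_model(A, n, alpha=2.0, noise_scale=0.1, seed=0):
    """Draw n observations of X = A Z + E (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    d, K = A.shape
    # Z: independent Pareto(alpha) factors, hence regularly varying (assumed law).
    Z = rng.pareto(alpha, size=(n, K)) + 1.0
    # E: light-tailed independent noise, taken Gaussian here (assumed law).
    E = noise_scale * rng.standard_normal((n, d))
    return Z @ A.T + E

# Hypothetical sparse loading matrix with d = 4 and K = 2; rows sum to one.
A = np.array([
    [1.0, 0.0],  # pure variable: loads on factor 1 only
    [0.0, 1.0],  # pure variable: loads on factor 2 only
    [0.5, 0.5],
    [0.3, 0.7],
])
X = simulate_linear_factor_model(A, n=1000)
```

The first two rows of A play the role of pure variables, each linked to a single latent factor.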

**********

Rémi Boutin (Sorbonne Université)

The Deep Latent Position Topic Model for Clustering and Representation of Networks with Textual Edges

Slides Rémi Boutin

Digital interactions in which users share textual content published by others are naturally represented by a network where the individuals are associated with the nodes and the exchanged texts with the edges. To understand such heterogeneous and complex data structures, it is essential both to cluster the nodes into homogeneous groups and to render a comprehensible visualisation of the data. To address both issues, we introduce Deep-LPTM, a model-based clustering strategy relying on a variational graph auto-encoder approach as well as a probabilistic model to characterise the topics of discussion. Deep-LPTM builds a joint representation of the nodes and of the edges in two embedding spaces. The parameters are inferred using a variational inference algorithm. We also introduce IC2L, a model selection criterion specifically designed to choose models with relevant clustering and visualisation properties. An extensive benchmark study on synthetic data is provided. In particular, we find that Deep-LPTM recovers the partitions of the nodes better than the state-of-the-art ETSBM and STBM. Finally, the emails of the Enron company are analysed and visualisations of the results are presented, with meaningful highlights of the graph structure.

**********

Gloria Buriticá (University of Geneva)

Block methods for extremal index inference of heavy-tailed time series

Slides Gloria Buriticá

Extreme quantile inference is essential for risk assessment in hydrological applications where reported values are used to design prevention plans against natural hazards such as floods or landslides. Daily precipitation records frequently reach unprecedented levels, and this suggests modeling them using heavy-tailed distributions. Due to temporal dependencies, the extremes of heavy-tailed time series can cluster and appear as several extreme observations over short periods. This can affect inference methods tailored for independent data. In the presence of temporal dependencies, we often need to correct extreme quantile predictions by estimating the so-called extremal index.

In a nutshell, cluster statistics summarize the clustering behavior of short extreme periods; for example, we can obtain the extremal index in this way. In this talk, we study block methods of cluster statistics inference. We present the asymptotic properties of block estimators for cluster inference based on consecutive observations with extremal ℓ^p-norms, for p > 0. The case p = α, where α > 0 is the tail index of the heavy-tailed series, presents advantageous properties; thus, we analyze the asymptotic properties of block estimators when approximating α using Hill's estimator. We illustrate our results with simulations and calculate the extremal index for daily autumn precipitation records from weather stations in France.
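As a reminder of how the tail index α can be approximated, here is a minimal sketch of the classical Hill estimator. It is not the block estimator studied in the talk. The function name hill_tail_index, the choice of k, and the i.i.d. Pareto test sample are assumptions made for illustration; with dependent time series the choice of k is more delicate.

```python
import numpy as np

def hill_tail_index(x, k):
    """Hill estimate of the tail index alpha from the k largest values of x."""
    x = np.sort(np.asarray(x, dtype=float))
    if not 0 < k < x.size:
        raise ValueError("k must satisfy 0 < k < len(x)")
    threshold = x[-k - 1]  # (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    # The mean log-excess of the k largest values over the threshold estimates 1/alpha.
    gamma_hat = np.mean(np.log(x[-k:] / threshold))
    return 1.0 / gamma_hat

# Illustrative i.i.d. sample from an exact Pareto law with alpha = 2 (assumed data).
rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=20_000) + 1.0
alpha_hat = hill_tail_index(sample, k=2_000)
```

On this sample alpha_hat should lie close to the true value 2; in practice the estimate is usually inspected over a range of k.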

**********

Lucas Butsch (Karlsruhe Institute of Technology)

Information criteria for the number of directions of extremes in high-dimensional data

In multivariate extreme value analysis, estimating the extremal dependence structure is a challenging task, especially in the context of high-dimensional data. Therefore, a common approach is to reduce the dimensionality by considering only the directions in which extreme values occur. Typically, the underlying models are assumed to be multivariate regularly varying, which under mild assumptions is equivalent to being sparse regularly varying, a notion recently introduced by Meyer and Wintenberger (2021). Sparse regular variation has the advantage of capturing the sparsity structure in which extreme events occur better than multivariate regular variation. Therefore, in this talk, we use the concept of sparse regular variation to present different information criteria for the number of directions in which extreme events occur, such as a Bayesian information criterion (BIC), a mean-squared error based information criterion (MUSIC) and a quasi-Akaike information criterion (QAIC) based on the Gaussian likelihood function. We show that the AIC of Meyer and Wintenberger (2023) and the MUSIC are inconsistent information criteria, whereas the BIC and the QAIC are consistent. Finally, the performance of the different information criteria is compared in a simulation study.

**********

Valérie Chavez-Demoulin (University of Lausanne)

Causal Discovery in Multivariate Extremes

Slides Valérie Chavez-Demoulin

Causal asymmetry is the result of the principle that an event is a cause only if its absence would not have been a cause. From there, causal discovery becomes a matter of comparing a well-defined score in both directions. Motivated by studying causal effects at extreme levels of a random vector, we propose to construct a model-agnostic causal score relying solely on the assumption of the existence of a max-domain of attraction. Based on a representation of a Generalized Pareto random vector, we construct the causal score as the Wasserstein distance between the margins and a well-specified random variable. The proposed methodology is illustrated on a hydrologically simulated dataset of different characteristics of catchments in Switzerland: discharge, precipitation and snow melt. Joint work with Linda Mhalla and Philippe Naveau.

**********

Manuel Hentschel (University of Geneva)

Hüsler-Reiss Graphical Models for Multivariate Extremes

Slides Manuel Hentschel

Graphical models in extremes have emerged as a diverse and quickly expanding research area in extremal dependence modeling. They allow for parsimonious statistical methodology and are particularly suited for enforcing sparsity in high-dimensional problems. We discuss the parametric class of Hüsler-Reiss graphical models on undirected graphs, which shares many useful properties with Gaussian graphical models. We describe model properties, methods for statistical inference on known graph structures, and structure learning algorithms when the graph is unknown. We illustrate different methods in an application to flight delay data at US airports.

**********

Giulia Marchello (Inria)

Unsupervised statistical learning with latent block models on dynamic discrete data

Slides Giulia Marchello

In our interconnected world, we constantly generate vast and diverse data, necessitating automated methods for detection, synthesis, and understanding. Over the years, statistical learning has evolved to address these data challenges, with unsupervised learning being pivotal in uncovering patterns without predefined labels. In particular, clustering is a valuable technique for summarizing high-dimensional data by grouping similar observations based on shared characteristics. Extending this approach to simultaneously group observations and features, known as co-clustering, reveals intricate relationships among data.

Within this framework, we explore two dynamic co-clustering models specifically designed for count data. These models integrate systems of ordinary differential equations, enabling them to capture sudden shifts in group membership and data sparsity as time progresses; the second model can also operate in real time on data streams. These innovations have practical implications in the field of pharmacovigilance, a critical domain whose main activity is the continuous monitoring and assessment of the safety of medical products.

**********

Anas Mourahib (UCLouvain)

Multivariate generalized Pareto distributions along extreme directions

Slides Anas Mourahib

When modeling a vector of risk variables, extreme scenarios are often of special interest. The peaks-over-thresholds method hinges on the notion that, asymptotically, the excesses over a vector of high thresholds follow a multivariate generalized Pareto distribution. However, existing literature has primarily concentrated on the setting where all risk variables are always large simultaneously. In reality, this assumption is often not met, especially in high dimensions.

In response to this limitation, we study scenarios where distinct groups of risk variables may exhibit joint extremes while others do not. These discernible groups are derived from the angular measure inherent in the corresponding max-stable distribution, whence the term extreme direction. We explore such extreme directions within the framework of multivariate generalized Pareto distributions, with a focus on their probability density functions in relation to an appropriate dominating measure.

Furthermore, we provide a stochastic construction that allows any prespecified set of risk groups to constitute the distribution’s extreme directions. This construction takes the form of a smoothed max-linear model and accommodates the full spectrum of conceivable max-stable dependence structures. Additionally, we introduce a generic simulation algorithm tailored for multivariate generalized Pareto distributions, offering specific implementations for extensions of the logistic and Hüsler–Reiss families capable of carrying arbitrary extreme directions.

**********

Alexander Reisach (Université Paris Cité)

Sortability in Structural Causal Models

Slides Alexander Reisach

Causal graphical models encode causal relationships between variables, typically based on a directed acyclic graph (DAG). Structural causal models expand upon this by expressing the causal relationships as explicit functions, which allows for computing the effect of interventions and much more. We show that, in many common parametrizations using linear functions and additive noise, effects accumulate along the causal order. This property enables inferring the causal order from data simply by sorting by a suitable criterion. We introduce the concept of sortability to capture the magnitude of the phenomenon, and show that the accumulating effects can lead to unrealistic values and deterministic relationships. We discuss implications for existing results and future research based on the same model class.
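The accumulation of effects along the causal order can be illustrated with a small simulation. This is a sketch under stated assumptions, not the analysis of the talk: the chain structure, the edge weights of 1.5, the unit-variance Gaussian noise, the use of marginal variance as the sorting criterion, and the function name simulate_chain are all chosen here for illustration.

```python
import numpy as np

def simulate_chain(weights, n, seed=0):
    """Sample a linear chain SCM X_0 -> X_1 -> ... with unit-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, len(weights) + 1))
    X[:, 0] = rng.standard_normal(n)
    for j, w in enumerate(weights, start=1):
        # Each variable is a linear function of its parent plus additive noise.
        X[:, j] = w * X[:, j - 1] + rng.standard_normal(n)
    return X

# Hypothetical chain with four variables and edge weights above one in magnitude.
X = simulate_chain(weights=[1.5, 1.5, 1.5], n=5_000)
variances = X.var(axis=0)
# Sorting variables by increasing sample variance (the assumed criterion).
order_by_variance = np.argsort(variances)
```

With these weights the theoretical variances are 1, 3.25, 8.3125 and about 19.7, so sorting by variance returns the causal order of the chain. With weights below one in magnitude the variances need not increase along the chain, and this simple criterion can fail.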

**********

Pierre Ribereau (Université Claude Bernard Lyon 1)

Regionalization of the extremal dependence structure using spectral clustering

The influence of an extreme event depends on the geographical features of the region where the event occurs. To understand the behavior of an extreme event, we require a statistical model capable of capturing the extremes and their spatial dependence. Max-stable processes are widely used in studying extreme events. However, assuming a fixed extremal dependence for a max-stable process may not be reasonable, depending on the topology of the region under study. In environmental extreme events, different types of extremal dependencies can appear across the spatial domain. In this study, we present an adapted spectral clustering algorithm for max-stable processes. This algorithm combines spectral clustering with extremal concurrence probability to cluster locations into k regions, each with a similar extremal dependence. Additionally, we propose an approach to model the entire region based on the clustered regions. To validate the proposed methodology, we tested it in two simulation cases using a non-stationary max-stable mixture model. The accuracy of the results encouraged us to apply it to two datasets: rainfall data on the east coast of Australia and rainfall over France.
