Géné-Pi

Mathematics of generative models

Overview

Claire Boyer, Professor, Université Paris-Saclay

The Géné-Pi project aims to develop a unified theoretical framework to better understand and improve Transformer-type deep learning architectures and diffusion models. The goal is to increase their reliability, efficiency, and applicability in various contexts, particularly in self-supervised learning, data generation, and physics-constrained modeling.

Keywords: Transformer-based models, attention layers, diffusion-based generative models

Missions

Our research

Understanding the role of optimization and its statistical impacts on deep models

Analyze how optimization trajectories (gradient descent, hyperparameter selection) induce implicit biases that influence the generalization and robustness of models.
Combine statistical theory and optimization tools to jointly study optimization errors and statistical errors on simplified but representative models.
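
As a toy illustration of implicit bias (a minimal sketch, not taken from the project), consider gradient descent on an underdetermined least-squares problem. Many parameter vectors fit the data exactly, yet the starting point decides which one is reached: from zero initialization, gradient descent converges to the minimum-norm interpolator. The matrix `A`, the targets `b`, the step size, and all function names below are assumptions chosen for the example.

```python
# Gradient descent on 0.5 * ||A w - b||^2 with 2 equations and 4 unknowns.
# Illustrative toy setup: A, b, step and n_steps are arbitrary choices.

def matvec(A, w):
    """Matrix-vector product A w for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]

def gradient(A, b, w):
    """Gradient of 0.5 * ||A w - b||^2, that is A^T (A w - b)."""
    residual = [p - t for p, t in zip(matvec(A, w), b)]
    return [sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(w))]

def gradient_descent(A, b, w0, step=0.1, n_steps=500):
    """Plain gradient descent with a constant step size."""
    w = list(w0)
    for _ in range(n_steps):
        g = gradient(A, b, w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w

# The rows of A are orthogonal, so the minimum-norm interpolator is
# w* = sum_i (b_i / ||a_i||^2) a_i = [1, 1, 2, -2].
A = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, -1.0]]
b = [2.0, 4.0]
w_min_norm = [1.0, 1.0, 2.0, -2.0]

w_zero = gradient_descent(A, b, w0=[0.0, 0.0, 0.0, 0.0])   # zero initialization
w_other = gradient_descent(A, b, w0=[3.0, 0.0, 0.0, 0.0])  # non-zero initialization
```

Both runs fit the data exactly, but only the zero-initialized one returns the minimum-norm solution. The other run keeps the component of its starting point that lies in the null space of `A`, which is a simple instance of the optimization trajectory selecting among equally good minimizers.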

Elucidating the internal mechanisms of Transformers

Understand how Transformers learn to extract and structure information, and identify situations where attention mechanisms fail (head entanglement).
Study controlled statistical tasks (multi-location regression, clustering, self-supervision) and analyze local minima and learning dynamics in order to propose algorithmic and architectural corrections.

Linking Transformers and classical statistical dimension reduction methods

Show how certain Transformer architectures learn representations similar to classical methods (PCA, PLS), while offering greater flexibility.
Adopt a continuous view of attention layers as operators acting on distributions, and analyze their learning by gradient descent in Gaussian and semi-Gaussian frameworks.
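
One common way to formalize the continuous view of attention is sketched below. It is given as an assumption for illustration, and the project's own definition may differ. The symbols $Q$, $K$, $V$ (query, key, and value matrices), $\mu$ (a probability distribution over tokens in $\mathbb{R}^d$), and $\Gamma_\mu$ are notation introduced here, and the usual $1/\sqrt{d}$ scaling is omitted.

```latex
% Attention as a map that depends on the token distribution \mu
\[
  \Gamma_\mu(x)
  \;=\;
  \frac{\displaystyle\int \exp\bigl(\langle Qx,\, Ky\rangle\bigr)\, Vy \,\mathrm{d}\mu(y)}
       {\displaystyle\int \exp\bigl(\langle Qx,\, Ky\rangle\bigr)\,\mathrm{d}\mu(y)} .
\]
% For an empirical distribution over n tokens, this reduces to softmax attention
\[
  \mu = \frac{1}{n}\sum_{j=1}^{n}\delta_{x_j}
  \quad\Longrightarrow\quad
  \Gamma_\mu(x_i)
  \;=\;
  \sum_{j=1}^{n}
  \frac{\exp\bigl(\langle Qx_i,\, Kx_j\rangle\bigr)}
       {\sum_{k=1}^{n}\exp\bigl(\langle Qx_i,\, Kx_k\rangle\bigr)}\, Vx_j .
\]
```

In this notation an attention layer transports the distribution $\mu$ to its image under $\Gamma_\mu$, so the number of tokens no longer appears explicitly and $\mu$ can be taken to be, for example, a Gaussian.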

Deconstructing the “black box” of score matching in diffusion models

Explain why and how diffusion models effectively learn complex distributions without excessively memorizing data.
Study the role of implicit regularization induced by optimization, analyze scaling laws and their link to the stability and generalization of the learned score.
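
To make the score matching objective concrete, here is a minimal sketch of denoising score matching in one dimension, under assumptions chosen for the example: Gaussian data with mean `m` and standard deviation `s`, a single noise level `sigma`, and a linear score model `a * y + c` fitted by least squares. In this setting the noisy data follow a Gaussian with variance `s**2 + sigma**2`, whose score is `-(y - m) / (s**2 + sigma**2)`, so the fitted coefficients can be compared with the exact ones. The function name and all parameter values are hypothetical.

```python
import random

def fit_linear_score(m=1.0, s=2.0, sigma=1.0, n=200_000, seed=0):
    """Fit score(y) = a * y + c by denoising score matching.

    Each clean sample x is perturbed into y = x + sigma * eps, and the
    regression target is -(y - x) / sigma**2. The density is never used.
    """
    rng = random.Random(seed)
    ys, targets = [], []
    for _ in range(n):
        x = rng.gauss(m, s)              # clean sample
        y = x + sigma * rng.gauss(0.0, 1.0)  # noisy sample
        ys.append(y)
        targets.append(-(y - x) / sigma**2)

    # Ordinary least squares for a one-dimensional linear model.
    y_bar = sum(ys) / n
    t_bar = sum(targets) / n
    cov = sum((y - y_bar) * (t - t_bar) for y, t in zip(ys, targets)) / n
    var = sum((y - y_bar) ** 2 for y in ys) / n
    a = cov / var
    c = t_bar - a * y_bar
    return a, c

a_hat, c_hat = fit_linear_score()

# Exact score of the noisy marginal N(m, s^2 + sigma^2) for the defaults above.
a_true = -1.0 / (2.0**2 + 1.0**2)
c_true = 1.0 / (2.0**2 + 1.0**2)
```

With enough samples, `a_hat` and `c_hat` approach `a_true` and `c_true`. The regression only sees pairs of clean and noisy points, yet it recovers the score of the smoothed distribution rather than a function that memorizes the individual samples. This example does not address what happens with flexible neural score models and finite data.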

Improving sampling and integrating physical constraints into generative models

Make diffusion models more efficient, interpretable, and suitable for non-Euclidean, discrete, or physically governed data.
Explore alternatives to isotropic Gaussian noise, develop diffusions compatible with discrete structures, and integrate constraints from PDEs via kernel-based theoretical frameworks.
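
As an illustration of noising compatible with discrete data, the sketch below uses a uniform transition kernel on `K` categories, one choice found in the discrete diffusion literature and not necessarily the one studied in the project. At each step the state is kept with probability `1 - beta` and otherwise resampled uniformly. The function names and the values of `K`, `beta`, and `t` are assumptions for the example.

```python
def uniform_kernel(K, beta):
    """One-step transition matrix: keep the state with probability
    1 - beta, otherwise resample uniformly among the K categories."""
    return [[(1.0 - beta) * (i == j) + beta / K for j in range(K)]
            for i in range(K)]

def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def t_step_kernel(K, beta, t):
    """t-step transition matrix obtained by repeated multiplication."""
    out = [[float(i == j) for j in range(K)] for i in range(K)]
    Q = uniform_kernel(K, beta)
    for _ in range(t):
        out = matmul(out, Q)
    return out

def closed_form(K, beta, t):
    """Closed form of the t-step kernel: the identity part decays as
    (1 - beta)**t and the remaining mass is spread uniformly."""
    keep = (1.0 - beta) ** t
    return [[keep * (i == j) + (1.0 - keep) / K for j in range(K)]
            for i in range(K)]

K, beta = 4, 0.1
Q_10 = t_step_kernel(K, beta, t=10)      # moderate noise
Q_10_exact = closed_form(K, beta, t=10)  # same matrix, without the loop
Q_500 = t_step_kernel(K, beta, t=500)    # close to the uniform distribution
```

The closed form lets a noisy sample at step `t` be drawn directly from the clean one, which plays the same role as the Gaussian marginal in continuous diffusions. For large `t` every row approaches the uniform distribution over the `K` categories.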

Consortium

Université Paris-Saclay, Inria, Sorbonne Université

Other projects

MacLeOD

Machine learning on geometries and distributions

MadLearning

Deep Learning Mathematics: From Theory to Applications

MAGICALL

Mathematics of generative models: an interdisciplinary analysis of loss function landscapes

PERSNET

PERsistent Structures in Neural NETworks

PRODIGE-AI

PRObability, ranDom matrIx theory, Geometry and gEneralization for generative-AI

TENSOR4ML

TENSOR methods FOR mastering modern Machine Learning

THEOREM

Theory for more efficient generative models
