## Approximating high dimensional functions – Abstracts

**Isotonic regression in general dimensions**

Richard Samworth (University of Cambridge, UK)

We study the least squares regression function estimator over the class of real-valued functions on $[0,1]^d$ that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order $n^{-\min\{2/(d+2),\,1/d\}}$ in the empirical $L_2$ loss, up to poly-logarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on $k$ hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of $(k/n)^{\min(1,2/d)}$, again up to poly-logarithmic factors. Previous results are confined to the case $d \leq 2$. Finally, we establish corresponding bounds (which are new even in the case $d = 2$) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to poly-logarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate.

This is joint work with Qiyang Han, Tengyao Wang and Sabyasachi Chatterjee.
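The general-dimensional estimator above has no simple closed form, but in the $d = 1$ case the isotonic least squares projection reduces to the classical pool-adjacent-violators algorithm. A minimal sketch of that one-dimensional special case, purely for illustration (this is not the talk's lattice-design setting):

```python
import numpy as np

def isotonic_ls(y):
    """Pool-adjacent-violators: least-squares projection of the sequence y
    onto the cone of non-decreasing sequences (the d = 1 case)."""
    blocks = []  # each block is [sum of values, count]; its fit is the mean
    for v in y:
        blocks.append([float(v), 1])
        # merge adjacent blocks while their means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            t, c = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
    out = []
    for t, c in blocks:
        out.extend([t / c] * c)
    return np.array(out)
```

For example, `isotonic_ls([3, 1, 2, 4])` pools the first three observations into a single block with mean 2.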

**Concentration of tempered posteriors and of their variational approximations**

Pierre Alquier (CREST, ENSAE, Université Paris Saclay, France)

While Bayesian methods are extremely popular in statistics and machine learning, their application to massive datasets is often challenging, when possible at all. Indeed, classical MCMC algorithms are prohibitively slow when both the model dimension and the sample size are large. Variational Bayesian (VB) methods aim at approximating the posterior by a distribution in a tractable family. Thus, MCMC is replaced by an optimization algorithm which is orders of magnitude faster. VB methods have been applied in such computationally demanding applications as collaborative filtering, image and video processing, and natural language and text processing. However, despite very good results in practice, the theoretical properties of these approximations are usually not known. In this talk, we propose a general approach to prove the concentration of variational approximations of fractional posteriors. We apply our theory to two examples: matrix completion, and Gaussian VB.
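A fractional (tempered) posterior raises the likelihood to a power $\alpha \in (0,1]$ before combining it with the prior. A minimal conjugate illustration, assuming a Beta prior and Bernoulli data so the tempered posterior stays in closed form (the function name and setup are illustrative, not from the talk):

```python
import numpy as np

def tempered_beta_posterior(successes, n, alpha, a0=1.0, b0=1.0):
    """Tempered posterior for a Bernoulli parameter under a Beta(a0, b0) prior.

    The likelihood is raised to the power alpha in (0, 1], so by conjugacy
    the tempered posterior is Beta(a0 + alpha*s, b0 + alpha*(n - s)).
    Returns its mean and variance."""
    a = a0 + alpha * successes
    b = b0 + alpha * (n - successes)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var
```

With `alpha=1` this is the usual posterior; smaller `alpha` yields a flatter posterior with larger variance, which is the trade-off the concentration results quantify.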

**Multilevel weighted least squares polynomial approximation**

Sören Wolfers (KAUST, Saudi Arabia)

Weighted least squares polynomial approximation uses random samples to determine projections of functions onto spaces of polynomials. It has been shown that, using an optimal distribution of sample locations, the number of samples required to achieve quasi-optimal approximation in a given polynomial subspace scales, up to a logarithmic factor, linearly in the dimension of this space. However, in many applications, the computation of samples includes a numerical discretization error. Thus, obtaining polynomial approximations with a single-level method can become prohibitively expensive, as it requires a sufficiently large number of samples, each computed with a sufficiently small discretization error. As a solution to this problem, we propose a multilevel method that utilizes samples computed with different accuracies and is able to match the accuracy of single-level approximations with reduced computational cost. We derive complexity bounds under certain assumptions about polynomial approximability and sample work. Furthermore, we propose an adaptive algorithm for situations where such assumptions cannot be verified a priori. Finally, we provide an efficient algorithm for sampling from optimal distributions and an analysis of computationally favorable alternative distributions. Numerical experiments underscore the practical applicability of our method.
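A toy numpy sketch of the telescoping idea behind multilevel fitting, assuming a hypothetical model `f_level` that can be evaluated at different discretization accuracies: the coarsest level is fitted with many cheap samples, and each level difference (which is small) with progressively fewer samples. This illustrates only the telescoping structure; the actual method uses optimally weighted samples.

```python
import numpy as np

def f_level(x, level):
    """Hypothetical forward model: the exact response exp(x) polluted by
    a discretization error that halves with each refinement level."""
    return np.exp(x) + 2.0 ** (-level) * 0.05 * np.cos(3 * x)

def multilevel_fit(max_level, degree, n0, rng):
    """Telescoping multilevel least squares: fit level 0 with n0 samples,
    then each difference f_l - f_{l-1} with half as many samples."""
    x = rng.uniform(-1, 1, n0)
    coef = np.polynomial.polynomial.polyfit(x, f_level(x, 0), degree)
    n = n0
    for level in range(1, max_level + 1):
        n = max(n // 2, 4 * (degree + 1))  # never drop below a stable sample size
        x = rng.uniform(-1, 1, n)
        diff = f_level(x, level) - f_level(x, level - 1)
        coef = coef + np.polynomial.polynomial.polyfit(x, diff, degree)
    return coef
```

The summed coefficients approximate the finest-level response while most samples were taken on the cheap coarse levels.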

**Tensor train algorithms for stochastic PDE problems**

Sergey Dolgov (University of Bath, UK)

Surrogate modelling is becoming a popular technique to reduce the computational burden of forward and inverse uncertainty quantification problems. In this talk we use the Tensor Train (TT) decomposition for approximating the forward solution map of the stochastic diffusion equation, as well as the posterior density function in the Bayesian inverse problem. The TT decomposition is based on the separation of variables, hence the multivariate integration factorises into a set of one-dimensional quadratures. For sufficiently smooth functions, the storage cost of the TT decomposition grows much slower with the accuracy compared to the Monte Carlo rate. The TT decomposition of a multivariate function can be constructed from adaptively chosen fibres of samples along each variable (the so-called TT cross interpolation), with the number of function evaluations proportional to the (small) number of unknowns in the TT representation. In turn, the TT approximation of the probability density function allows an efficient computation of the Rosenblatt transform, and hence a fast method for proposing almost uncorrelated MCMC samples. We show that for smooth PDE coefficients the TT approach can be faster than Quasi Monte Carlo and adaptive Metropolis techniques.
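The TT decomposition itself can be computed for an explicitly given tensor by a sequence of truncated SVDs (TT-SVD). A minimal sketch, for illustration only (the talk's TT cross interpolation instead builds the cores from adaptively chosen fibres, without ever forming the full tensor):

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """TT-SVD: sequential truncated SVDs turn a d-way array into a list
    of 3-way TT cores G_k of shape (r_{k-1}, n_k, r_k)."""
    shape = tensor.shape
    cores, r = [], 1
    mat = tensor.reshape(r * shape[0], -1)
    for k in range(len(shape) - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = max(1, int(np.sum(s > eps * s[0])))  # truncate small singular values
        cores.append(U[:, :rank].reshape(r, shape[k], rank))
        r = rank
        mat = (s[:rank, None] * Vt[:rank]).reshape(r * shape[k + 1], -1)
    cores.append(mat.reshape(r, shape[-1], 1))
    return cores

def tt_full(cores):
    """Contract the TT cores back into the full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=(-1, 0))
    return out.reshape([c.shape[1] for c in cores])
```

For a sum of two rank-one tensors, the detected TT ranks are at most 2, so the storage is linear rather than exponential in the number of variables.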

**Recovery of ridge functions in the uniform norm**

Sebastian Mayer (Universität Bonn and Fraunhofer SCAI, Germany)

A ridge function is a multivariate function of the form $f(x) = g(a^T x)$, where $a$ is a $d$-dimensional vector and $g$ is univariate. Ridge functions have been used since the 1980s under the name single-index model in semi-parametric statistics to address the issues of high-dimensional non-parametric regression. Recently, there has also been interest in the recovery of such ridge functions in the uniform norm from a finite number of function samples. There are results showing that sufficiently smooth ridge functions can be efficiently recovered using only a polynomially large number of samples. However, these results impose additional assumptions on the ridge functions besides regularity. In this talk, I will show that these additional assumptions are essential to overcome the curse of dimensionality. Surprisingly, the tractability of recovering ridge functions in the uniform norm turns out to be very sensitive to the specific setup, ranging from polynomial tractability to the curse of dimensionality.
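One standard recovery strategy, sketched here purely for illustration (it is not the talk's algorithm): estimate the direction $a$ from a finite-difference gradient at the origin, since $\nabla f(0) = g'(0)\,a$. This already exhibits the kind of extra assumption at stake: it needs $g'(0) \neq 0$.

```python
import numpy as np

def recover_ridge_direction(f, d, h=1e-6):
    """Estimate the direction a of a ridge function f(x) = g(a^T x) from
    central finite differences at 0; assumes g'(0) != 0. Since a is only
    identifiable up to sign, fix the sign of the largest component."""
    grad = np.array([(f(h * e) - f(-h * e)) / (2 * h) for e in np.eye(d)])
    a_hat = grad / np.linalg.norm(grad)
    if a_hat[np.argmax(np.abs(a_hat))] < 0:
        a_hat = -a_hat
    return a_hat
```

Once the direction is known, $g$ can be learned from one-dimensional samples along it; the cost of the gradient step is only $2d$ function evaluations.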

**Ridge functions, their sums, and sparse additive functions**

Jan Vybiral (Czech Technical University, Prague)

We discuss the curse of dimensionality for the approximation of smooth multivariate functions, and several recently studied structural assumptions which allow one to avoid it. These include ridge functions and their sums. Finally, we review some recent results on approximation of the so-called sparse additive model.

**Approximation of Generalized Ridge Functions in High Dimensions**

Sandra Keiper (Technische Universität Berlin, Germany)

In this talk we will study the approximation of generalized ridge functions, namely of functions which are constant along submanifolds of $R^N$. We introduce the notion of linear-sleeve functions, whose function values depend only on the distance to some unknown linear subspace $L$. We propose two efficient algorithms to approximate linear-sleeve functions $f(x) = g(\mathrm{dist}(x,L)^2)$ when both the linear subspace $L \subset R^N$ and the function $g \in C^s[0,1]$ are unknown. We will provide error bounds for both algorithms, as well as an extensive numerical comparison of the two. We further propose an approach for applying these algorithms to capture general sleeve functions, which are constant along lower-dimensional submanifolds.
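A minimal sketch of the objects involved, assuming (unlike the talk, where $L$ is unknown) that $L$ is given by an orthonormal basis: the distance from $x$ to $L$ is the norm of the residual of the orthogonal projection onto $L$, and a linear-sleeve function composes $g$ with its square.

```python
import numpy as np

def dist_to_subspace(x, Q):
    """Distance from x to the subspace spanned by the orthonormal
    columns of Q: norm of the orthogonal-projection residual."""
    return np.linalg.norm(x - Q @ (Q.T @ x))

def sleeve_function(g, Q):
    """Linear-sleeve function f(x) = g(dist(x, L)^2) for L = range(Q)."""
    return lambda x: g(dist_to_subspace(x, Q) ** 2)
```

For instance, with $L$ the $xy$-plane in $R^3$, the distance from $(1,2,3)$ to $L$ is $3$, so a sleeve function with $g(t) = t$ returns $9$ there.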

**Score estimation with infinite-dimensional exponential families**

Arthur Gretton / Dougal Sutherland (UCL, UK)

The infinite-dimensional exponential family is a rich generalization of the standard exponential family, going beyond finite sufficient statistic functions to allow for very complex models. In particular, we study the kernel exponential family, where the natural parameter lies in a reproducing kernel Hilbert space. Computing the normalization constant in this class of models is difficult, but efficient estimation is possible via score matching. This approach, however, has cubic computational complexity in both the number of sampled points and their dimension. We thus propose estimation with a low-rank, Nyström-like approximation. The new solution retains essentially the same convergence rate as the full-rank solution, with substantially less computational effort and storage. We demonstrate the applicability of the method both to density estimation and to approximating Hamiltonian Monte Carlo when gradients are available.
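The low-rank idea can be illustrated by the classical Nyström approximation of a kernel matrix, $K \approx K_{nm} K_{mm}^{-1} K_{mn}$, built from $m \ll n$ landmark points. This sketch shows only that generic construction, not the talk's score-matching estimator:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nystrom_approx(X, m, sigma=1.0, reg=1e-8, rng=None):
    """Low-rank Nystrom approximation K ~ K_nm K_mm^{-1} K_mn using
    m randomly chosen landmark rows of X; reg stabilizes the solve."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(len(X), size=m, replace=False)
    K_nm = gaussian_kernel(X, X[idx], sigma)
    K_mm = gaussian_kernel(X[idx], X[idx], sigma) + reg * np.eye(m)
    return K_nm @ np.linalg.solve(K_mm, K_nm.T)
```

Forming and factorizing only the $m \times m$ block reduces the cost from cubic in $n$ to roughly $O(nm^2)$, which is the source of the computational savings.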

**Optimal sampling in weighted least-squares methods: Application to high-dimensional approximation**

Albert Cohen (Université Pierre et Marie Curie, France)

Least squares methods are in common use when one needs to approximate a function, based on its noiseless or noisy observation at n scattered points, by a simpler function chosen in an m-dimensional space with m less than n. Depending on the context, these points may be randomly drawn according to some distribution, or deterministically selected by the user.

This talk will survey some recent results on the stability and approximation properties of weighted least squares methods, in relation with the spatial distribution of the sampling. One of our main findings is that an optimal random sampling strategy can be defined by means of the Christoffel function associated with the space where the approximation is searched. Here, optimal means that near-best approximation properties are achieved at the price of a number of samples n larger than m only by a logarithmic factor.

One principal motivation is the application to high-dimensional approximation problems, such as those arising from PDEs with random input parameters. Motivated by this setting, we also discuss how the optimal sampling can be practically generated and incrementally updated when the approximation spaces are adaptively selected.

This is a joint work with Giovanni Migliorati.
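A small numpy sketch of the Christoffel-weighted scheme in one concrete case, assuming the orthonormal Legendre basis on $[-1,1]$: the inverse Christoffel function is $k_m(x) = \sum_j \varphi_j(x)^2$, samples are drawn from the density proportional to $k_m$ (here by crude rejection sampling), and each sample receives weight $m/k_m(x)$. The function names are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def orthonormal_legendre(x, m):
    """Rows: the first m Legendre polynomials at x, normalized to be
    orthonormal w.r.t. the uniform probability measure dx/2 on [-1, 1]."""
    V = legendre.legvander(x, m - 1)          # columns P_0 .. P_{m-1}
    return V * np.sqrt(2 * np.arange(m) + 1)  # phi_j = sqrt(2j+1) P_j

def christoffel_weighted_ls(f, m, n, rng):
    """Draw n samples from the optimal density proportional to
    k_m(x) = sum_j phi_j(x)^2 by rejection (k_m <= m^2 on [-1, 1]),
    then solve the least-squares problem with weights m / k_m."""
    samples = []
    while len(samples) < n:
        x = rng.uniform(-1, 1)
        k = (orthonormal_legendre(np.array([x]), m) ** 2).sum()
        if rng.uniform(0, m ** 2) < k:
            samples.append(x)
    x = np.array(samples)
    Phi = orthonormal_legendre(x, m)
    w = m / (Phi ** 2).sum(axis=1)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * Phi, sw * f(x), rcond=None)
    return coef
```

When the target already lies in the polynomial space, for example $f(x) = x^2$ with $m = 4$, the weighted fit recovers its exact orthonormal-basis coefficients.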