Selected Presentations

M. Croci, G. N. Wells, Reduced- and mixed-precision finite element kernels, FEniCS 2024, Simula Research Laboratory, Oslo, Norway (2024). Download.

In this talk we investigate the effect of rounding errors in the computation of finite element (FE) kernels and we develop strategies for reduced- and mixed-precision acceleration using half-precision floating point formats. Possible applications of this work include: reduced- and mixed-precision preconditioning, linear and nonlinear solvers, and timestepping methods.

The popularity of artificial intelligence has driven the design of computer architectures that are increasingly fast at computing with fewer bits. For instance, 16-bit half-precision computations are over 32 times more efficient than double-precision computations on NVIDIA H100 GPU tensor cores. However, reducing the precision leads to a significant increase in rounding errors, and extreme care must be taken during algorithmic design.

With the objective of exploiting these new chip capabilities in FE methods, we focus on FE cell kernels for matrix assembly. By analysing the algorithm used by FFCx, we show that the total rounding error comprises two terms:

  1. A geometry term depending on the condition number of the Jacobian of the map to the reference cell.
  2. A quadrature term growing with the number of quadrature nodes.

Based on this analysis, we propose a strategy for obtaining machine-accurate results at negligible efficiency loss via a mixed-precision treatment, sketched below.
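
As a rough illustration of the proposed strategy, consider the following sketch of a Laplace stiffness cell kernel on an affine triangle (hypothetical Python/NumPy code, not actual FFCx output; the function name and array layout are ours): the geometric factors, whose error is governed by the conditioning of the Jacobian, stay in double precision; the per-quadrature-point products are performed in half precision; and the quadrature sum, whose error grows with the number of points, is accumulated back in double precision.

    import numpy as np

    def laplace_kernel_mixed(coords, dphi_ref, qw):
        # coords: (3, 2) cell vertex coordinates
        # dphi_ref: (nq, ndofs, 2) reference basis gradients at quadrature points
        # qw: (nq,) quadrature weights on the reference cell
        J = (coords[1:] - coords[0]).T            # fp64 geometry: Jacobian of the map
        Jinv = np.linalg.inv(J)
        detJ = abs(np.linalg.det(J))
        nq, ndofs, _ = dphi_ref.shape
        A = np.zeros((ndofs, ndofs))              # fp64 quadrature accumulator
        for q in range(nq):
            G = (dphi_ref[q] @ Jinv).astype(np.float16)        # fp16 point work
            A += (qw[q] * detJ) * (G @ G.T).astype(np.float64) # fp64 accumulation
        return A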

While this is currently work in progress, early-stage numerical experiments on Intel Sapphire Rapids CPUs show appreciable speedups compared to their double-precision equivalents and support our theoretical findings. Additionally, we show how the latest Sapphire Rapids AMX-BF16 registers and instructions offer great promise for further increasing accuracy and speed (although compiler and/or register improvements would be needed).

M. Croci, Multilevel Monte Carlo methods for the approximation of failure probability regions, Computational Mathematics and Applications Seminar, University of Oxford, UK (2024). Download.

In this talk, we consider the problem of approximating failure regions. More specifically, given a costly computational model with random parameters and a failure condition, our objective is to determine the parameter region in which the failure condition is unlikely to be satisfied. In mathematical terms, this problem can be cast as approximating the level set of a probability density function. We solve this problem by dividing it into two subproblems: 1) the design of an efficient Monte Carlo strategy for probability estimation; 2) the construction of an efficient algorithm for level-set approximation. Following this structure, this talk comprises two parts:

In the first part, we present a new multi-output multilevel best linear unbiased estimator (MLBLUE) for approximating expectations. The advantage of this estimator lies in its convenience and optimality: given any set of computational models with known covariance structure, MLBLUE automatically constructs a provably optimal estimator for any (finite) number of quantities of interest. Nevertheless, the optimality of MLBLUE is tied to its optimal set-up, which requires the solution of a nonlinear optimization problem. We show how the latter can be reformulated as a semi-definite program and thus solved reliably and efficiently.
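
To give a flavour of the construction, here is the classical Gauss-Markov view of a BLUE (a simplified, textbook formulation rather than the full multi-output multilevel one): stacking all model evaluations into a vector $y = H\mu + \varepsilon$ with known covariance $\Sigma = \operatorname{Cov}(\varepsilon)$, the best linear unbiased estimator and its variance are

$$\hat{\mu} = (H^{\top}\Sigma^{-1}H)^{-1}H^{\top}\Sigma^{-1}y, \qquad \operatorname{Cov}(\hat{\mu}) = (H^{\top}\Sigma^{-1}H)^{-1}.$$

The MLBLUE set-up problem then selects how many samples of each model group to take so as to minimise this variance under a computational budget; it is this outer optimization problem that we recast as a semi-definite program.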

In the second part, we construct an adaptive level-set approximation algorithm for smooth functions corrupted by noise in $\mathbb{R}^d$. This algorithm only requires point-value data and is thus compatible with Monte Carlo estimators. The algorithm combines a criterion for level-set adaptivity with an a posteriori error estimator. Under suitable assumptions, we prove that our algorithm correctly captures the target level set at the same cost complexity as that of uniformly approximating a $(d-1)$-dimensional function.
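
As a caricature of how the adaptivity criterion and the error estimator interact, consider this minimal hypothetical 1D example (ours, not the algorithm from the talk): a cell is discarded as soon as a noise-aware error band around its estimated midpoint value excludes the level, and bisected otherwise.

    import numpy as np

    rng = np.random.default_rng(1)

    def f(x):                                    # example target; we seek {f = 0}
        return x**2 - 0.5

    def adapt_level_set(level=0.0, hmin=1.0/128, L=2.5, sigma=0.1):
        cells, active = [(-1.0, 1.0)], []
        while cells:
            a, b = cells.pop()
            h = b - a
            m = max(16, int(h**-2))              # finer cells get more averaging
            fm = np.mean(f(0.5*(a + b)) + sigma*rng.standard_normal(m))
            band = 3*sigma/np.sqrt(m) + 0.5*L*h  # statistical + interpolation error
            if abs(fm - level) > band:
                continue                         # cell provably away from the level set
            elif h <= hmin:
                active.append((a, b))            # cell resolved at target resolution
            else:
                mid = 0.5*(a + b)
                cells += [(a, mid), (mid, b)]
        return active                            # cells covering {x : f(x) = 0}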

M. Croci, Mixed-precision explicit Runge-Kutta methods, IMA Leslie Fox Prize Event, Glasgow (2023). Download.

Motivated by the advent of machine learning, hardware-supported reduced-precision computing has made a return over the last few years. Computations with fewer digits are faster and more memory- and energy-efficient, but a careful implementation and rounding error analysis are required to ensure that sensible results can still be obtained. In this talk we focus on mixed-precision explicit Runge-Kutta (RK) methods. Mixed-precision algorithms combine low- and high-precision computations in order to benefit from the performance gains of reduced precision while retaining good accuracy. We show that a naive mixed-precision implementation of RK schemes harms convergence and leads to error stagnation, and we present a more accurate alternative. We introduce new RK-Chebyshev schemes that only use $q\in\{1,2\}$ high-precision function evaluations to achieve a limiting convergence order of $O(\Delta t^{q})$, leaving the remaining evaluations in low precision. These methods are essentially as cheap as their fully low-precision equivalents and they are as accurate and (almost) as stable as their high-precision counterparts.
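
The flavour of the idea can be conveyed by a toy midpoint-rule step (hypothetical NumPy code with names of our choosing; the schemes in the talk are stabilised RK-Chebyshev methods with many more stages): function evaluations are performed in half precision, except for $q$ evaluations per step kept in double precision, and all stage combinations are accumulated in double precision.

    import numpy as np

    n = 32
    A = (np.eye(n, k=1) - 2*np.eye(n) + np.eye(n, k=-1)) * n**2  # toy stiff operator

    def f(y):                          # high-precision right-hand side
        return A @ y

    def f_low(y):                      # emulated low-precision evaluation
        return (A.astype(np.float16) @ y.astype(np.float16)).astype(np.float64)

    def step_mixed(y, dt):             # q = 1: one fp64 evaluation per step
        k1 = f(y)                      # high-precision evaluation
        k2 = f_low(y + 0.5*dt*k1)      # low-precision evaluation
        return y + dt*k2               # high-precision accumulation

In the RK-Chebyshev schemes of the talk the same principle is applied across many stages, so almost all of the work is done in low precision.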

M. Croci, Solving differential equations in reduced- and mixed-precision, Oden Institute, Yale University, and École Polytechnique Fédérale de Lausanne (2022). Download.

Motivated by the advent of machine learning, hardware-supported reduced-precision computing has made a return over the last few years. Computations with fewer digits are faster and more memory- and energy-efficient, but a careful implementation and rounding error analysis are required to ensure that sensible results can still be obtained.

This talk is divided into two parts, focusing on reduced- and mixed-precision algorithms respectively. Reduced-precision algorithms compute a solution that is as accurate as the chosen precision allows while avoiding catastrophic rounding error accumulation. Mixed-precision algorithms, on the other hand, combine low- and high-precision computations in order to benefit from the performance gains of reduced precision while retaining good accuracy.

In the first part of the talk we study the accumulation of rounding errors in the solution of the heat equation, a proxy for parabolic PDEs, in reduced precision using round-to-nearest (RtN) and stochastic rounding (SR). We demonstrate how to implement the numerical scheme to reduce rounding errors and we present \emph{a priori} estimates for local and global rounding errors. While the RtN solution leads to rapid rounding error accumulation and stagnation, SR leads to much more robust implementations for which the error remains at roughly the level of the working precision.

In the second part of the talk we focus on mixed-precision explicit stabilised Runge-Kutta methods. We show that a naive mixed-precision implementation harms convergence and leads to error stagnation, and we present a more accurate alternative. We introduce new Runge-Kutta-Chebyshev schemes that only use $q\in\{1,2\}$ high-precision function evaluations to achieve a limiting convergence order of $O(\Delta t^{q})$, leaving the remaining evaluations in low precision. These methods are essentially as cheap as their fully low-precision equivalents and they are as accurate and (almost) as stable as their high-precision counterparts.

M. Croci and M. B. Giles, Solving parabolic PDEs in half precision, SIAM CSE 2021 and NLA Group meeting, University of Manchester (2020-2021). Download.

Motivated by the advent of machine learning, hardware-supported low-precision computing has made a return over the last few years. Computations with fewer digits are faster and more memory- and energy-efficient, but can be extremely susceptible to rounding errors. An application that can benefit greatly from the advantages of low-precision computing is the numerical solution of partial differential equations (PDEs), but a careful implementation and rounding error analysis are required to ensure that sensible results can still be obtained.

In this talk we study the accumulation of rounding errors in the solution of the heat equation, a proxy for parabolic PDEs, via Runge-Kutta finite difference methods using round-to-nearest (RtN) and stochastic rounding (SR). We demonstrate how to implement the numerical scheme to reduce rounding errors and we present \emph{a priori} estimates for local and global rounding errors. Let $u$ be the roundoff unit. While the worst-case local errors are $O(u)$ with respect to the discretization parameters, the RtN and SR error behaviour is substantially different. We show that the RtN solution is discretization, initial condition and precision dependent, and always stagnates for small enough $\Delta t$. Until stagnation, the global error grows like $O(u\Delta t^{-1})$. In contrast, the leading order errors introduced by SR are zero-mean, independent in space and mean-independent in time, making SR resilient to stagnation and rounding error accumulation. In fact, we prove that for SR the global rounding errors are only $O(u\Delta t^{-1/4})$ in 1D and are essentially bounded (up to logarithmic factors) in higher dimensions.
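
Stochastic rounding is straightforward to emulate in software for experimentation (a hypothetical NumPy sketch of ours, not the implementation behind the talk): each fp64 value is rounded to one of its two neighbouring fp16 numbers with probability proportional to the distance from the other neighbour, so the rounding error has zero mean.

    import numpy as np

    rng = np.random.default_rng(0)

    def sr_fp16(x):
        """Stochastically round float64 values to float16."""
        x = np.asarray(x, dtype=np.float64)
        a = x.astype(np.float16)                  # round-to-nearest neighbour
        af = a.astype(np.float64)
        towards = np.where(x >= af, np.float16(np.inf), np.float16(-np.inf))
        b = np.nextafter(a, towards)              # neighbour on the other side of x
        bf = b.astype(np.float64)
        gap = np.where(bf != af, bf - af, 1.0)
        p = np.where(bf != af, (x - af)/gap, 0.0) # P(round to b); E[result] = x
        return np.where(rng.random(x.shape) < p, b, a)

Replacing the fp16 additions of a finite difference time-stepper with sr_fp16 makes the contrast described above, RtN stagnation versus SR robustness, easy to observe.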

M. Croci, M. B. Giles, P. E. Farrell, M. E. Rognes, V. Vinje, Non-nested multilevel Monte Carlo methods with applications to brain simulation, Oden Institute (2021). Download.

This talk consists of two parts. In the first, we develop two new strategies for spatial white noise and Gaussian-Matérn field sampling that work within a non-nested multilevel (quasi) Monte Carlo (ML(Q)MC) hierarchy. In the second, we apply the techniques developed to quantify the level of uncertainty in a new stochastic model for tracer transport in the brain.

The new sampling techniques are based on the stochastic partial differential equation (SPDE) approach, which recasts the sampling problem as the solution of an elliptic equation driven by spatial white noise. Sampling white noise realisations can be computationally expensive. In this talk, we present two new techniques that can be used to efficiently compute white noise samples in a FEM-MLMC and FEM-MLQMC setting. The key idea is to exploit the finite element matrix assembly procedure and factorise each local mass matrix independently, hence avoiding the factorisation of a large matrix. In a multilevel framework, the white noise samples must be coupled between subsequent levels. We show how our technique can be used to enforce this coupling even in the case of non-nested mesh hierarchies.
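
On a 1D piecewise linear mesh the key idea fits in a few lines (an illustrative NumPy example of ours, not the FEniCS-based implementation from the talk): each cell contributes $P_K^{\top} L_K z_K$, where $M_K = L_K L_K^{\top}$ is the factorised local mass matrix and $z_K$ is a standard normal vector, so the assembled vector has exactly the global mass matrix as covariance.

    import numpy as np

    rng = np.random.default_rng(0)

    def white_noise_rhs(nodes):
        """Sample b ~ N(0, M), with M the global P1 mass matrix, in linear time."""
        b = np.zeros(len(nodes))
        for i in range(len(nodes) - 1):          # loop over mesh cells (i, i+1)
            h = nodes[i+1] - nodes[i]
            MK = h/6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])  # local mass matrix
            LK = np.linalg.cholesky(MK)          # small, independent factorisation
            b[i:i+2] += LK @ rng.standard_normal(2)
        return b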

In the MLQMC case, the QMC integrand variables must also be ordered by decaying importance to achieve fast convergence with respect to the number of samples. We express white noise as a Haar wavelet series whose hierarchical structure naturally exposes the leading-order dimensions. We split this series into two terms which we sample via a hybrid standard Monte Carlo/QMC approach.
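
Schematically, expanding white noise in an orthonormal Haar basis $\{\phi_k\}$ gives

$$\dot{W} = \sum_{k=0}^{\infty} \xi_k\,\phi_k, \qquad \xi_k \sim N(0,1) \text{ i.i.d.},$$

where the coarse-level terms carry the leading importance and are assigned to the QMC points, while the remaining tail is treated as a correction sampled with standard pseudo-random numbers.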

In the final part of the talk, we employ a combination of the methods presented to solve a PDE with random coefficients describing tracer transport within the interstitial fluid of the brain.

M. Croci, M. B. Giles, M. E. Rognes, and P. E. Farrell, MLQMC Methods for Elliptic PDEs Driven by White Noise, SIAM CSE 2019, Spokane, USA and ICIAM 2019, Valencia, Spain. Download.

When solving partial differential equations driven by additive spatial white noise, the efficient sampling of white noise realizations can be challenging. In this talk we focus on the efficient sampling of white noise using quasi-random points in a finite element and multilevel quasi Monte Carlo (MLQMC) setting. This work is an extension of previous research on white noise sampling for MLMC.

We express white noise as a wavelet series expansion that we divide into two parts. The first part is sampled using quasi-random points and contains a finite number of terms, ordered by decaying importance to ensure good QMC convergence. The second part is a correction term which is sampled using standard pseudo-random numbers.

We show how the sampling of both terms can be performed in linear time and memory complexity in the number of mesh cells via a supermesh construction. Furthermore, our technique can be used to enforce the MLQMC coupling even in the case of non-nested mesh hierarchies. We demonstrate the efficacy of our method with numerical experiments.

M. Croci, M. B. Giles, M. E. Rognes, and P. E. Farrell, Efficient white noise sampling and coupling for multilevel Monte Carlo, MCQMC 2018, Rennes, France. Download.

When solving stochastic partial differential equations (SPDEs) driven by additive spatial white noise, the efficient sampling of white noise realizations can be challenging. In this talk we present a novel sampling technique that can be used to efficiently compute white noise samples in a finite element and multilevel Monte Carlo (MLMC) setting.

After discretization, the action of white noise on a test function yields a Gaussian vector with the FEM mass matrix as its covariance. Sampling such a vector requires an expensive Cholesky factorization, and for this reason P0 representations, for which the mass matrix is diagonal, are generally preferred in the literature; these, however, have other disadvantages. In this talk we introduce an alternative factorization that is naturally parallelizable and has linear cost and memory complexity (in the number of mesh elements).
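
Schematically, the factorisation exploits the assembly structure of the mass matrix: writing $M = \sum_e P_e^{\top} M_e P_e$, with $M_e = L_e L_e^{\top}$ the factorised local mass matrices and $P_e$ the local-to-global maps, we can sample

$$b = \sum_e P_e^{\top} L_e z_e, \qquad z_e \sim N(0, I) \text{ independent},$$

so that $\operatorname{Cov}(b) = \sum_e P_e^{\top} L_e L_e^{\top} P_e = M$: the correct distribution is obtained from small, embarrassingly parallel local factorisations.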

Moreover, in an MLMC framework the white noise samples must be coupled between subsequent levels so as to respect the telescoping sum. We show how our technique can be used to enforce this coupling, even when the hierarchy is non-nested, via a supermesh construction. We conclude the talk with numerical experiments that demonstrate the efficacy of our method. We observe optimal convergence rates for the finite element solution of the elliptic SPDEs of interest. In an MLMC setting, a good coupling is enforced and the telescoping sum is respected.
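
For reference, the identity that the coupling must preserve is the standard MLMC telescoping sum

$$\mathbb{E}[Q_L] = \mathbb{E}[Q_0] + \sum_{\ell=1}^{L} \mathbb{E}[Q_\ell - Q_{\ell-1}],$$

which requires each correction term $Q_\ell - Q_{\ell-1}$ to be computed from the same underlying white noise sample on both levels, while the distribution of each $Q_\ell$ remains the same wherever it appears.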