Verification and Benchmarks

Documentation

Test case

To verify and benchmark the field solver, we consider a circular geometry with a hole in the middle where the radial coordinate is defined as $\rho = \sqrt{x^2+y^2}$ . The geometry is confined between the radial values $\rho_{min}=0.2$ and $\rho_{max}=0.4$ . To define the test functions and coefficients, we introduce the normalized flux label and the geometric poloidal angle , defined as follows: $\begin{align} \rho_N = &\frac{\rho-\rho_{min}}{\rho_{max} - \rho_{min}}, \\ \tan\theta = &\frac{y}{x}. \end{align}$ The test functions are defined as: $\begin{align} \phi = & \cos\left( \frac{3}{2}\pi\rho_N\right) \sin(4\theta) + 1.3 \\ c = & 1.1 + \cos\left(\frac{\pi}{2}\rho_N\right) \cos(3\theta) \\ \lambda = & \rho\sin(\theta) \\ \xi = & \sqrt{\rho} \left(2+\cos(\theta)\right). \end{align}$ With these functions, we can compute the right-hand side $b$ at the inner mesh points analytically.

At the outer Boundary $\rho_{max}=0.4$ , we apply Dirichlet boundary conditions, where the values at the boundary and ghost points are prescribed according to the solution. On the inner boundary $\rho_{max}=0.2$ we can either set Dirichlet boundary conditions in the same manner or prescribe Neumann boundary conditions, which are according to the prescribed solution zero normal gradient.

We note that the test case is representative of more complex geometry (divertors etc.) due to the usage of a locally Cartesian mesh. At the same time it allows to scale the problem size quickly.

Verification

Given the right-hand side, boundary conditions, and coefficients, we numerically solve the field equation and compare the obtained solution with the prescribed analytical solution. To assess the accuracy of the numerical solution, we measure the numerical error in both the L2-norm and the supremum norm, as a function of the Cartesian grid resolution $h:=\Delta x= \Delta y$ . We subsequently double the grid resolution from $h=4\cdot10^{-3}$ resulting in 25,726 grid points up to $h=1.25\cdot10^{-4}$ resulting in 24,196,346 grid points. When doubling the resolution, we also increase the number of multigrid levels by one (starting with 4), such that the coarsest mesh always remains the same.

The result of this analysis is presented in the figures below. For the case with purely Dirichlet boundary conditions, we observe second-order convergence. However, for the case with Neumann boundary conditions, we only first-order convergence is obtained due to the less accurate numerical discretization of the Neumann boundary conditions.

Numerical error in the L2-norm (blue) and supremum norm (orange) of the field solver for Dirichlet (left) and Neumann (right) boundary conditions at the core, as a function of mesh size. The numerical errors align very well to the theoretical scalings (gray lines), which is second-order for the Dirichlet and first-order for the Neuman case.

Performance benchmarks

We run performance benchmarks on the Raven supercomputer of the Max Planck Computing and Data Facility. Raven has Intel Xeon IceLake architecture with 72 cores per nodes and 4 Nvidia A100 GPUs per node. First we check the OpenMP scaling of the field solver for a fixed problem size of $h = 5\cdot10^{-4}$ resulting in 1,525,174 grid points, a bit larger than current typicall production runs. For the given problem size the performance satrts to deviate strongly frm the ideal scaling already around 12 OMP threads.

OpenMP scaling of field solver for fixed problem size of 1,525,174 grid points. The update routine is shown in blue and the solve routine in orange with the optimum scaling in gray.

Next we explore the scaling of the solver for varying problem size, ranging from $h = 4.00E-03$ with 25,726 grid points to $h=1.25E-04$ with 24,196,346 grid points. With each doubling of the resolution, we also increased the number of multigrid levels by one, starting with 4 levels. To ensure a fair comparison between the Fortran CPU implementation and the GPU implementation from the PAccX library, we fixed the number of OpenMP threads at 18. The CPU implementation exhibits near-linear scaling with problem size, as theoretically expected. The GPU implementation, while significantly faster overall, approaches near-linear scaling only at very high resolutions.

Scaling of solver performance in dependence of problem size carried out on a quarter Raven node. Left: For OpenMP parallelised Fortran solver (MGMRES), . The update phase of the solver (blue) follows a linear scaling $O(N)$ and the solve phase (orange) a $O(N\logN)$ scaling. Right: For GPU accelarated solver (MGMRES_GPU) the alignment to the theoretical scalings is only approached for large system sizes.