# Tests and Benchmarks#

If you ended up here you probably think about fixing some bugs or contributing
some code. **Awesome!** Just open a PR, and we will guide you through the
process. The following section contains some more detailed information of the
continues integration (CI) procedure we follow. In the end, each commit has to
pass them before it can be merged into the main branch on GitHub.

The first step to develop code is to clone the GitHub repo locally:

```
git clone git@github.com:emsig/emg3d.git
```

All requirements for the dev-toolchain are collected in the
`requirements-dev.txt`

file, so you can install them all by running

```
pip install -r requirements_dev.txt
```

With this you have all the basic tools to run the tests, lint your code, build the documentation, and so on.

## Continuous Integration#

The CI elements are:

Linting:

`flake8`

Tests:

`pytest`

Code coverage:

`coveralls`

Link checks:

`sphinx`

Code quality:

`codacy`

Documentation:

`sphinx`

Benchmarks:

`asv`

(1) to (6) are run automatically through GitHub actions when committing changes to GitHub. Any code change should pass these tests. Additionally, it is crucial that new code comes with the appropriate tests and documentation, and if applicable also with the appropriate benchmarks. However, you do not need any of that to start a PR - everything can go step-by-step!

Many of the tests are set up in the Makefile (only tested on Linux):

To install the current branch in editable mode:

`make install`

To check linting:

`make flake8`

To run pytest:

`make pytest`

To build the documentation:

`make html`

Or to list all the possibilities, simply run:

`make`

There is also a benchmark suite using *airspeed velocity*, located in the
emsig/emg3d-asv-repository. The results
of my machine can be found in the emsig/emg3d-bench, its rendered version at
emsig.xyz/emg3d-asv. They ensure that we do
not slow than the computation by introducing regressions, particularly when we
make changes to `emg3d.core`

or `emg3d.solver`

.

## CPU & RAM#

Here some information if someone is interested in tackling the very core of emg3d, trying to make it faster or reduce the memory consumption. The multigrid method is attractive because it shows optimal scaling for both runtime and memory consumption. Below some insights about what has been tried and what still could be tried in order to improve the current code.

### Runtime#

The costliest functions (for big models) are:

<5 % each, in decreasing importance:

Example with 262,144 / 2,097,152 cells (`nu_{i,1,c,2}=0,2,1,2`

;
`sslsolver=False`

; `semicoarsening=True`

; `linerelaxation=True`

):

93.7 / 95.8 %

`smoothing`

3.6 / 2.0 %

`prolongation`

1.9 / 1.9 %

`residual`

0.6 / 0.4 %

`restriction`

The rest can be ignored. For small models, the percentage of `smoothing`

goes
down and of `prolongation`

and `restriction`

go up. But then the modeller
is fast anyway.

`emg3d.core.gauss_seidel`

and `emg3d.core.amat_x`

are written
in `numba`

; jitting `emg3d.solver.RegularGridProlongator`

turned out
to not improve things, and many functions used in the restriction are jitted
too. The costliest functions (RAM- and CPU-wise) are therefore already written
in `numba`

.

**Any serious attempt to improve the speed will have to tackle the smoothing
itself.**

**Things which could be tried**

Not much has been tested with the

`numba`

-options`parallel`

;`prange`

; and`nogil`

.There might be an additional gain by making

`emg3d.meshes.TensorMesh`

,`emg3d.models.Model`

, and`emg3d.fields.Field`

instances jitted classes.

**Things which have been tried**

One important aspect of the smoothing part is the memory layout.

`emg3d.core.gauss_seidel`

and`emg3d.core.gauss_seidel_x`

are ideal for F-arrays (loop z-y-x, hence slowest to fastest axis).`emg3d.core.gauss_seidel_y`

and`emg3d.core.gauss_seidel_z`

, however, would be optimal for C-arrays. But copying the arrays to C-order and afterwards back is costlier in most cases for both CPU and RAM. The one possible and therefore implemented solution was to swap the loop-order in`emg3d.core.gauss_seidel_y`

.Restriction and prolongation information could be saved in a dictionary instead of recomputing it every time. Turns out to be not worth the trouble.

Rewrite

`emg3d.solver.RegularGridProlongator`

as jitted function, but the iterator approach seems to be better for large grids.

### Memory#

Most of the memory requirement comes from storing the data itself, mainly the
fields (source field, electric field, and residual field) and the model
parameters (resistivity, eta, mu). For a big model, they some up; e.g., almost
3 GB for an isotropic model with 256 x 256 x 256 cells. Anyhow, memory
consumption is pretty low already, and there is probably not much to gain, at
least in the solver part (`emg3d.core`

and `emg3d.solver`

). That
looks different for some of the interpolation and plotting routines, which
could be improved .

### Benchmark scripts for status quo#

To test CPU and RAM on your machine, you can use and adjust the following script. The old notebooks which were used to generate the above figures in the manual can be found at

CPU: 4b_Runtime.ipynb.

```
In [1]: import emg3d
...: import numpy as np
...: import matplotlib.pyplot as plt
...: from memory_profiler import memory_usage
...:
In [2]: def compute(nx):
...: """Simple computation routine.
...:
...: This is the actual model it runs. Adjust this to your needs.
...:
...: - Model size is nx * nx * nx, centered around the origin.
...: - Source is at the origin, x-directed.
...: - Frequency is 1 Hz.
...: - Homogenous space of 1 Ohm.m.
...:
...: """
...:
...: # Grid
...: hx = np.ones(nx)*50
...: x0 = -nx//2*50
...: grid = emg3d.TensorMesh([hx, hx, hx], x0=(x0, x0, x0))
...:
...: # Model and source field
...: model = emg3d.Model(grid, property_x=1.0)
...: sfield = emg3d.get_source_field(
...: grid, source=[0, 0, 0, 0, 0], frequency=1.0)
...:
...: # Compute the field
...: _, inf = emg3d.solve(
...: model, sfield, verb=0, plain=True, return_info=True)
...:
...: return inf['time']
...:
In [3]: # Loop over model sizes (adjust to your needs).
...: nsizes = np.array([32, 48, 64, 96, 128, 192, 256, 384])
...: memory = np.zeros(nsizes.shape)
...: runtime = np.zeros(nsizes.shape)
...:
...: # Loop over nx
...: for i, nx in enumerate(nsizes):
...: print(f" => {nx}^3 = {nx**3:12,d} cells")
...: mem, time = memory_usage((compute, (nx, ), {}), retval=True)
...: memory[i] = max(mem)
...: runtime[i] = time
...:
In [4]: # Plot CPU
...: plt.figure()
...: plt.title('Runtime')
...: plt.loglog(nsizes**3/1e6, runtime, '.-')
...: plt.xlabel('Number of cells (in millions)')
...: plt.ylabel('CPU (s)')
...: plt.axis('equal')
...: plt.show()
...:
In [5]: # Plot RAM
...: plt.figure()
...: plt.title('Memory')
...: plt.loglog(nsizes**3/1e6, memory/1e3, '-', zorder=10)
...: plt.xlabel('Number of cells (in millions)')
...: plt.ylabel('RAM (GB)')
...: plt.axis('equal')
...: plt.show()
...:
```

### Scripts for solver investigations#

The non-standard Cholesky solver, `emg3d.core.solve`

, does almost all the
work, in the end. Improving the speed of that part only slightly would have a
huge effect overall. Here some notes from some dabbling.

**Benchmark Tests for Cholesky Solve**

numba, numpy, scipy, lapack

Benchmarks: - small and big - real and complex valued

“Givens”:

Diagonal != 0

Diagonal values are large (no pivoting)

Only diagonal values would be complex

```
In [6]: import numba as nb
...: import numpy as np
...: import scipy as sp
...: from numpy.testing import assert_allclose
...: from scipy.linalg.lapack import get_lapack_funcs
...:
...: _numba_setting = {'nogil': True, 'fastmath': True, 'cache': True}
...:
```

Status quo

```
In [7]: @nb.njit(**_numba_setting)
...: def _emg3d_solve(amat, bvec):
...: n = len(bvec)
...: h = np.zeros(1, dtype=amat.dtype)[0] # Pre-allocate
...: d = 1./amat[0]
...:
...: for i in range(1, min(n, 6)):
...: amat[i] *= d
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*amat[j+5*k]*amat[6*k]
...: amat[6*j] -= h
...: d = 1./amat[6*j]
...: for i in range(j+1, min(n, j+6)):
...: h *= 0. # Reset h
...: for k in range(max(0, i-5), j):
...: h += amat[i+5*k]*amat[j+5*k]*amat[6*k]
...: amat[i+5*j] -= h
...: amat[i+5*j] *= d
...:
...: amat[6*(n-1)] = d
...: for j in range(n-2, -1, -1):
...: amat[6*j] = 1./amat[6*j]
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*bvec[k]
...: bvec[j] -= h
...:
...: for j in range(n):
...: bvec[j] *= amat[6*j]
...:
...: for j in range(n-2, -1, -1):
...: h *= 0. # Reset h
...: for k in range(j+1, min(n, j+6)):
...: h += amat[k+5*j]*bvec[k]
...: bvec[j] -= h
...:
...: def emg3d_solve(amat, bvec):
...: out = bvec.copy()
...: _emg3d_solve(amat.copy(), out)
...: return out
...:
```

An alternative

```
In [8]: @nb.njit(**_numba_setting)
...: def _emg3d_solve2(amat, bvec):
...: n = len(bvec)
...: h = np.zeros(1, dtype=amat.dtype)[0] # Pre-allocate
...: d = 1./amat[0]
...:
...: for i in range(1, min(n, 6)):
...: amat[i] *= d
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*amat[j+5*k]*amat[6*k]
...: amat[6*j] -= h
...: d = 1./amat[6*j]
...: for i in range(j+1, min(n, j+6)):
...: h *= 0. # Reset h
...: for k in range(max(0, i-5), j):
...: h += amat[i+5*k]*amat[j+5*k]*amat[6*k]
...:
...: amat[i+5*j] = d*(amat[i+5*j] - h)
...:
...: amat[6*(n-1)] = d
...: for j in range(n-2, -1, -1):
...: amat[6*j] = 1./amat[6*j]
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*bvec[k]
...: bvec[j] -= h
...:
...: for j in range(n):
...: bvec[j] *= amat[6*j]
...:
...: for j in range(n-2, -1, -1):
...: h *= 0. # Reset h
...: for k in range(j+1, min(n, j+6)):
...: h += amat[k+5*j]*bvec[k]
...: bvec[j] -= h
...:
...:
...: def emg3d_solve2(amat, bvec):
...: out = bvec.copy()
...: _emg3d_solve2(amat.copy(), out)
...: return out
...:
```

SciPy and NumPy solvers

```
In [9]: def np_linalg_solve(A, b):
...: return np.linalg.solve(A, b)
...:
...: def sp_linalg_solve(A, b):
...: out = b.copy()
...: sp.linalg.solve(A.copy(), out, overwrite_a=True,
...: overwrite_b=True, check_finite=False)
...: return out
...:
...: def sp_linalg_lu_solve(A, b):
...: out = b.copy()
...: lu_and_piv = sp.linalg.lu_factor(A.copy(), overwrite_a=True,
...: check_finite=False)
...: xlu = sp.linalg.lu_solve(lu_and_piv, out, overwrite_b=True,
...: check_finite=False)
...: return out
...:
...: def sp_linalg_cho_solve(A, b):
...: amat = A.copy()
...: clow = sp.linalg.cho_factor(amat, lower=True, overwrite_a=True,
...: check_finite=False)
...: out = b.copy()
...: sp.linalg.cho_solve(clow, out, overwrite_b=True,
...: check_finite=False)
...: return out
...:
...: def sp_linalg_cho_banded(A, b):
...: amat = A.copy()
...: c = sp.linalg.cholesky_banded(amat, overwrite_ab=True,
...: lower=True, check_finite=False)
...: out = b.copy()
...: sp.linalg.cho_solve_banded((c, True), out, overwrite_b=True,
...: check_finite=False)
...: return out
...:
```

Measuring them. You can get the data at https://github.com/emsig/data/raw/main/emg3d/benchmarks/CholeskySolveBenchmark.npz

```
In [10]: data = np.load('CholeskySolveBenchmark.npz')
....:
....: for cr in ['real', 'cplx']:
....: for bs in ['small', 'big']:
....:
....: print(f"dtype={cr}; size={bs}")
....:
....: # Get test data.
....: amat = data[cr+'_'+bs+'_'+'amat']
....: bvec = data[cr+'_'+bs+'_'+'bvec']
....: out = data[cr+'_'+bs+'_'+'out']
....:
....: # Re-arrange to full (symmetric) or banded matrix
....: # for some solvers.
....: n = bvec.size
....: A = np.zeros((n, n), dtype=amat.dtype)
....: Ab = np.zeros((6, n), dtype=amat.dtype)
....: for i in range(n):
....: A[i, i] = amat[i*6]
....: Ab[0, i] = amat[i*6]
....: for j in range(1, 6):
....: if i+j < n:
....: A[i, i+j] = amat[i*6+j]
....: A[i+j, i] = amat[i*6+j]
....: Ab[j, i] = amat[i*6+j]
....:
....: # Assert result is correct
....: assert_allclose(emg3d_solve(amat, bvec), out, rtol=1e-6)
....: assert_allclose(emg3d_solve2(amat, bvec), out, rtol=1e-6)
....: assert_allclose(np_linalg_solve(A, bvec), out, rtol=1e-6)
....: assert_allclose(sp_linalg_solve(A, bvec), out, rtol=1e-6)
....: assert_allclose(sp_linalg_lu_solve(A, bvec), out, rtol=1e-6)
....: if cr == 'real':
....: assert_allclose(sp_linalg_cho_solve(A, bvec),
....: out, rtol=1e-6)
....: assert_allclose(sp_linalg_cho_banded(Ab, bvec),
....: out, rtol=1e-6)
....:
....: # Test speed
....: print(' np.linalg.solve : ', end='')
....: %timeit np_linalg_solve(A, bvec)
....:
....: print(' sp.linalg.solve : ', end='')
....: %timeit sp_linalg_solve(A, bvec)
....:
....: print(' sp.linalg.lu_solve : ', end='')
....: %timeit sp_linalg_lu_solve(A, bvec)
....:
....: if cr == 'real':
....:
....: print(' sp.linalg.cho_solve : ', end='')
....: %timeit sp_linalg_cho_solve(A, bvec)
....:
....: print(' sp.linalg.cho_banded : ', end='')
....: %timeit sp_linalg_cho_banded(Ab, bvec)
....:
....: print(' emg3d.solve : ', end='')
....: %timeit emg3d_solve(amat, bvec)
....:
....: print(' emg3d.solve2 : ', end='')
....: %timeit emg3d_solve2(amat, bvec)
....:
....: print(80*'-')
....:
```