Tests and Benchmarks#
If you ended up here you probably think about fixing some bugs or contributing some code. Awesome! Just open a PR, and we will guide you through the process. The following section contains some more detailed information of the continues integration (CI) procedure we follow. In the end, each commit has to pass them before it can be merged into the main branch on GitHub.
The first step to develop code is to clone the GitHub repo locally:
git clone git@github.com:emsig/emg3d.git
All requirements for the dev-toolchain are collected in the
requirements-dev.txt
file, so you can install them all by running
pip install -r requirements_dev.txt
With this you have all the basic tools to run the tests, lint your code, build the documentation, and so on.
Continuous Integration#
The CI elements are:
Linting:
flake8
Tests:
pytest
Code coverage:
coveralls
Link checks:
sphinx
Code quality:
codacy
Documentation:
sphinx
Benchmarks:
asv
(1) to (6) are run automatically through GitHub actions when committing changes to GitHub. Any code change should pass these tests. Additionally, it is crucial that new code comes with the appropriate tests and documentation, and if applicable also with the appropriate benchmarks. However, you do not need any of that to start a PR - everything can go step-by-step!
Many of the tests are set up in the Makefile (only tested on Linux):
To install the current branch in editable mode:
make install
To check linting:
make flake8
To run pytest:
make pytest
To build the documentation:
make html
Or to list all the possibilities, simply run:
make
There is also a benchmark suite using airspeed velocity, located in the
emsig/emg3d-asv-repository. The results
of my machine can be found in the emsig/emg3d-bench, its rendered version at
emsig.xyz/emg3d-asv. They ensure that we do
not slow than the computation by introducing regressions, particularly when we
make changes to emg3d.core
or emg3d.solver
.
CPU & RAM#
Here some information if someone is interested in tackling the very core of emg3d, trying to make it faster or reduce the memory consumption. The multigrid method is attractive because it shows optimal scaling for both runtime and memory consumption. Below some insights about what has been tried and what still could be tried in order to improve the current code.
Runtime#
The costliest functions (for big models) are:
<5 % each, in decreasing importance:
Example with 262,144 / 2,097,152 cells (nu_{i,1,c,2}=0,2,1,2
;
sslsolver=False
; semicoarsening=True
; linerelaxation=True
):
93.7 / 95.8 %
smoothing
3.6 / 2.0 %
prolongation
1.9 / 1.9 %
residual
0.6 / 0.4 %
restriction
The rest can be ignored. For small models, the percentage of smoothing
goes
down and of prolongation
and restriction
go up. But then the modeller
is fast anyway.
emg3d.core.gauss_seidel
and emg3d.core.amat_x
are written
in numba
; jitting emg3d.solver.RegularGridProlongator
turned out
to not improve things, and many functions used in the restriction are jitted
too. The costliest functions (RAM- and CPU-wise) are therefore already written
in numba
.
Any serious attempt to improve the speed will have to tackle the smoothing itself.
Things which could be tried
Not much has been tested with the
numba
-optionsparallel
;prange
; andnogil
.There might be an additional gain by making
emg3d.meshes.TensorMesh
,emg3d.models.Model
, andemg3d.fields.Field
instances jitted classes.
Things which have been tried
One important aspect of the smoothing part is the memory layout.
emg3d.core.gauss_seidel
andemg3d.core.gauss_seidel_x
are ideal for F-arrays (loop z-y-x, hence slowest to fastest axis).emg3d.core.gauss_seidel_y
andemg3d.core.gauss_seidel_z
, however, would be optimal for C-arrays. But copying the arrays to C-order and afterwards back is costlier in most cases for both CPU and RAM. The one possible and therefore implemented solution was to swap the loop-order inemg3d.core.gauss_seidel_y
.Restriction and prolongation information could be saved in a dictionary instead of recomputing it every time. Turns out to be not worth the trouble.
Rewrite
emg3d.solver.RegularGridProlongator
as jitted function, but the iterator approach seems to be better for large grids.
Memory#
Most of the memory requirement comes from storing the data itself, mainly the
fields (source field, electric field, and residual field) and the model
parameters (resistivity, eta, mu). For a big model, they some up; e.g., almost
3 GB for an isotropic model with 256 x 256 x 256 cells. Anyhow, memory
consumption is pretty low already, and there is probably not much to gain, at
least in the solver part (emg3d.core
and emg3d.solver
). That
looks different for some of the interpolation and plotting routines, which
could be improved .
Benchmark scripts for status quo#
To test CPU and RAM on your machine, you can use and adjust the following script. The old notebooks which were used to generate the above figures in the manual can be found at
CPU: 4b_Runtime.ipynb.
In [1]: import emg3d
...: import numpy as np
...: import matplotlib.pyplot as plt
...: from memory_profiler import memory_usage
...:
In [2]: def compute(nx):
...: """Simple computation routine.
...:
...: This is the actual model it runs. Adjust this to your needs.
...:
...: - Model size is nx * nx * nx, centered around the origin.
...: - Source is at the origin, x-directed.
...: - Frequency is 1 Hz.
...: - Homogenous space of 1 Ohm.m.
...:
...: """
...:
...: # Grid
...: hx = np.ones(nx)*50
...: x0 = -nx//2*50
...: grid = emg3d.TensorMesh([hx, hx, hx], x0=(x0, x0, x0))
...:
...: # Model and source field
...: model = emg3d.Model(grid, property_x=1.0)
...: sfield = emg3d.get_source_field(
...: grid, source=[0, 0, 0, 0, 0], frequency=1.0)
...:
...: # Compute the field
...: _, inf = emg3d.solve(
...: model, sfield, verb=0, plain=True, return_info=True)
...:
...: return inf['time']
...:
In [3]: # Loop over model sizes (adjust to your needs).
...: nsizes = np.array([32, 48, 64, 96, 128, 192, 256, 384])
...: memory = np.zeros(nsizes.shape)
...: runtime = np.zeros(nsizes.shape)
...:
...: # Loop over nx
...: for i, nx in enumerate(nsizes):
...: print(f" => {nx}^3 = {nx**3:12,d} cells")
...: mem, time = memory_usage((compute, (nx, ), {}), retval=True)
...: memory[i] = max(mem)
...: runtime[i] = time
...:
In [4]: # Plot CPU
...: plt.figure()
...: plt.title('Runtime')
...: plt.loglog(nsizes**3/1e6, runtime, '.-')
...: plt.xlabel('Number of cells (in millions)')
...: plt.ylabel('CPU (s)')
...: plt.axis('equal')
...: plt.show()
...:
In [5]: # Plot RAM
...: plt.figure()
...: plt.title('Memory')
...: plt.loglog(nsizes**3/1e6, memory/1e3, '-', zorder=10)
...: plt.xlabel('Number of cells (in millions)')
...: plt.ylabel('RAM (GB)')
...: plt.axis('equal')
...: plt.show()
...:
Scripts for solver investigations#
The non-standard Cholesky solver, emg3d.core.solve
, does almost all the
work, in the end. Improving the speed of that part only slightly would have a
huge effect overall. Here some notes from some dabbling.
Benchmark Tests for Cholesky Solve
numba, numpy, scipy, lapack
Benchmarks: - small and big - real and complex valued
“Givens”:
Diagonal != 0
Diagonal values are large (no pivoting)
Only diagonal values would be complex
In [6]: import numba as nb
...: import numpy as np
...: import scipy as sp
...: from numpy.testing import assert_allclose
...: from scipy.linalg.lapack import get_lapack_funcs
...:
...: _numba_setting = {'nogil': True, 'fastmath': True, 'cache': True}
...:
Status quo
In [7]: @nb.njit(**_numba_setting)
...: def _emg3d_solve(amat, bvec):
...: n = len(bvec)
...: h = np.zeros(1, dtype=amat.dtype)[0] # Pre-allocate
...: d = 1./amat[0]
...:
...: for i in range(1, min(n, 6)):
...: amat[i] *= d
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*amat[j+5*k]*amat[6*k]
...: amat[6*j] -= h
...: d = 1./amat[6*j]
...: for i in range(j+1, min(n, j+6)):
...: h *= 0. # Reset h
...: for k in range(max(0, i-5), j):
...: h += amat[i+5*k]*amat[j+5*k]*amat[6*k]
...: amat[i+5*j] -= h
...: amat[i+5*j] *= d
...:
...: amat[6*(n-1)] = d
...: for j in range(n-2, -1, -1):
...: amat[6*j] = 1./amat[6*j]
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*bvec[k]
...: bvec[j] -= h
...:
...: for j in range(n):
...: bvec[j] *= amat[6*j]
...:
...: for j in range(n-2, -1, -1):
...: h *= 0. # Reset h
...: for k in range(j+1, min(n, j+6)):
...: h += amat[k+5*j]*bvec[k]
...: bvec[j] -= h
...:
...: def emg3d_solve(amat, bvec):
...: out = bvec.copy()
...: _emg3d_solve(amat.copy(), out)
...: return out
...:
An alternative
In [8]: @nb.njit(**_numba_setting)
...: def _emg3d_solve2(amat, bvec):
...: n = len(bvec)
...: h = np.zeros(1, dtype=amat.dtype)[0] # Pre-allocate
...: d = 1./amat[0]
...:
...: for i in range(1, min(n, 6)):
...: amat[i] *= d
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*amat[j+5*k]*amat[6*k]
...: amat[6*j] -= h
...: d = 1./amat[6*j]
...: for i in range(j+1, min(n, j+6)):
...: h *= 0. # Reset h
...: for k in range(max(0, i-5), j):
...: h += amat[i+5*k]*amat[j+5*k]*amat[6*k]
...:
...: amat[i+5*j] = d*(amat[i+5*j] - h)
...:
...: amat[6*(n-1)] = d
...: for j in range(n-2, -1, -1):
...: amat[6*j] = 1./amat[6*j]
...:
...: for j in range(1, n):
...: h *= 0. # Reset h
...: for k in range(max(0, j-5), j):
...: h += amat[j+5*k]*bvec[k]
...: bvec[j] -= h
...:
...: for j in range(n):
...: bvec[j] *= amat[6*j]
...:
...: for j in range(n-2, -1, -1):
...: h *= 0. # Reset h
...: for k in range(j+1, min(n, j+6)):
...: h += amat[k+5*j]*bvec[k]
...: bvec[j] -= h
...:
...:
...: def emg3d_solve2(amat, bvec):
...: out = bvec.copy()
...: _emg3d_solve2(amat.copy(), out)
...: return out
...:
SciPy and NumPy solvers
In [9]: def np_linalg_solve(A, b):
...: return np.linalg.solve(A, b)
...:
...: def sp_linalg_solve(A, b):
...: out = b.copy()
...: sp.linalg.solve(A.copy(), out, overwrite_a=True,
...: overwrite_b=True, check_finite=False)
...: return out
...:
...: def sp_linalg_lu_solve(A, b):
...: out = b.copy()
...: lu_and_piv = sp.linalg.lu_factor(A.copy(), overwrite_a=True,
...: check_finite=False)
...: xlu = sp.linalg.lu_solve(lu_and_piv, out, overwrite_b=True,
...: check_finite=False)
...: return out
...:
...: def sp_linalg_cho_solve(A, b):
...: amat = A.copy()
...: clow = sp.linalg.cho_factor(amat, lower=True, overwrite_a=True,
...: check_finite=False)
...: out = b.copy()
...: sp.linalg.cho_solve(clow, out, overwrite_b=True,
...: check_finite=False)
...: return out
...:
...: def sp_linalg_cho_banded(A, b):
...: amat = A.copy()
...: c = sp.linalg.cholesky_banded(amat, overwrite_ab=True,
...: lower=True, check_finite=False)
...: out = b.copy()
...: sp.linalg.cho_solve_banded((c, True), out, overwrite_b=True,
...: check_finite=False)
...: return out
...:
Measuring them. You can get the data at https://github.com/emsig/data/raw/main/emg3d/benchmarks/CholeskySolveBenchmark.npz
In [10]: data = np.load('CholeskySolveBenchmark.npz')
....:
....: for cr in ['real', 'cplx']:
....: for bs in ['small', 'big']:
....:
....: print(f"dtype={cr}; size={bs}")
....:
....: # Get test data.
....: amat = data[cr+'_'+bs+'_'+'amat']
....: bvec = data[cr+'_'+bs+'_'+'bvec']
....: out = data[cr+'_'+bs+'_'+'out']
....:
....: # Re-arrange to full (symmetric) or banded matrix
....: # for some solvers.
....: n = bvec.size
....: A = np.zeros((n, n), dtype=amat.dtype)
....: Ab = np.zeros((6, n), dtype=amat.dtype)
....: for i in range(n):
....: A[i, i] = amat[i*6]
....: Ab[0, i] = amat[i*6]
....: for j in range(1, 6):
....: if i+j < n:
....: A[i, i+j] = amat[i*6+j]
....: A[i+j, i] = amat[i*6+j]
....: Ab[j, i] = amat[i*6+j]
....:
....: # Assert result is correct
....: assert_allclose(emg3d_solve(amat, bvec), out, rtol=1e-6)
....: assert_allclose(emg3d_solve2(amat, bvec), out, rtol=1e-6)
....: assert_allclose(np_linalg_solve(A, bvec), out, rtol=1e-6)
....: assert_allclose(sp_linalg_solve(A, bvec), out, rtol=1e-6)
....: assert_allclose(sp_linalg_lu_solve(A, bvec), out, rtol=1e-6)
....: if cr == 'real':
....: assert_allclose(sp_linalg_cho_solve(A, bvec),
....: out, rtol=1e-6)
....: assert_allclose(sp_linalg_cho_banded(Ab, bvec),
....: out, rtol=1e-6)
....:
....: # Test speed
....: print(' np.linalg.solve : ', end='')
....: %timeit np_linalg_solve(A, bvec)
....:
....: print(' sp.linalg.solve : ', end='')
....: %timeit sp_linalg_solve(A, bvec)
....:
....: print(' sp.linalg.lu_solve : ', end='')
....: %timeit sp_linalg_lu_solve(A, bvec)
....:
....: if cr == 'real':
....:
....: print(' sp.linalg.cho_solve : ', end='')
....: %timeit sp_linalg_cho_solve(A, bvec)
....:
....: print(' sp.linalg.cho_banded : ', end='')
....: %timeit sp_linalg_cho_banded(Ab, bvec)
....:
....: print(' emg3d.solve : ', end='')
....: %timeit emg3d_solve(amat, bvec)
....:
....: print(' emg3d.solve2 : ', end='')
....: %timeit emg3d_solve2(amat, bvec)
....:
....: print(80*'-')
....: