
Solver

*Solver, <solver name>, [<extrapolation>, <iterative solver options>]

This command is used to specify the solver used to solve the system of equations in the current step. By default, the PARDISO solver is used. The following solvers can be selected:


Direct solvers

PARDISO

PARDISO is a thread-safe and memory-efficient software package for solving large sparse symmetric and unsymmetric linear systems of equations on shared-memory and distributed-memory multiprocessors [1]. We interface the oneMKL PARDISO implementation. PARDISO is the default solver; there is no need to enable it explicitly using *Solver, Pardiso.

MUMPS

MUMPS - the MUltifrontal Massively Parallel sparse direct Solver (Amestoy et al. 2001 [2] and Amestoy et al. 2006 [3]). numgeo ships with a shared-memory (OpenMP-enabled, MPI-disabled) version. For an MPI-parallelised version, see the External solver feature. To enable MUMPS in a step, use *Solver, Mumps.

Simple solver

A very basic (simple) implementation of a direct solver using serial forward substitution. It should only be used for very small systems, e.g. single-element tests. Compared to MUMPS or PARDISO, the overhead is small, but the efficiency for medium to large systems is drastically worse. To enable the simple solver in a step, use *Solver, simple.

Diagonal solver

A simple implementation of a direct solver for strictly diagonal matrices, such as those used in explicit dynamics with mass lumping. To enable the diagonal solver in a step, use *Solver, diagonal.
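For orientation, the corresponding keyword lines are collected below (one *Solver line per step; the ** comment lines are an assumption based on the Abaqus-style input conventions numgeo follows):

** default; the *Solver line may be omitted entirely
*Solver, Pardiso
** shared-memory MUMPS
*Solver, Mumps
** serial solver, only for very small systems
*Solver, simple
** strictly diagonal matrices, e.g. explicit dynamics with mass lumping
*Solver, diagonal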


Iterative solvers

Most numgeo calculations feature highly nonlinear materials with only approximated Jacobians, frictional contact with opening and closing contact surfaces, or coupled two-phase and three-phase flow. The arising linearised systems of equations are therefore notoriously difficult to solve. So far we have not found combinations of iterative solvers and preconditioners that offer robust and reliable performance; we thus recommend using direct solvers. Any help in providing reliable and efficient iterative solvers is welcome - please get in touch with us!

LIS solver library

LIS (Library of Iterative Solvers for linear systems; Fujii et al. 2005 [4], Kotakemori et al. 2005 [5], Kotakemori et al. 2008 [6], Nishida 2010 [7]) is a parallel software library for solving discretised linear equations and eigenvalue problems that arise in the numerical solution of partial differential equations using iterative methods.

Preprocessing

Note that before each solve with LIS, we preprocess the system of equations using MUMPS to permute the matrix to a zero-free diagonal and to perform matrix scaling.

The keyword to activate the LIS solver reads:

*Solver, LIS=$<iterative solver options>$

The sub-keyword <iterative solver options> is used to define the iterative solver and the preconditioner. Unlike the usual numgeo convention, this command requires a string following the LIS convention, enclosed in $...$. Be careful, blanks do matter! Available solvers and preconditioners as well as their LIS options are listed in the tables below; a further example combining several of these options is given after the tables. The following example shows how to define a GMRES solver with the standard restart value (40) and, as preconditioner, an incomplete LU factorisation with a fill-in level of 2:

*Solver, LIS=$-i gmres -p ilu -ilu_fill 2$

The LIS library gives access to more than 20 different iterative solvers and 10 preconditioners. A summary is given in the following tables:

  • Solvers:

    Solver Option Auxiliary Options
    CG -i {cg|1}
    BiCG -i {bicg|2}
    CGS -i {cgs|3}
    BiCGSTAB -i {bicgstab|4}
    BiCGSTAB(l) -i {bicgstabl|5} -ell [2] — The degree l
    GPBiCG -i {gpbicg|6}
    TFQMR -i {tfqmr|7}
    Orthomin(m) -i {orthomin|8} -restart [40] — The restart value m
    GMRES(m) -i {gmres|9} -restart [40] — The restart value m
    Jacobi -i {jacobi|10}
    Gauss-Seidel -i {gs|11}
    SOR -i {sor|12} -omega [1.9] — The relaxation coefficient ω (0 < ω < 2)
    BiCGSafe -i {bicgsafe|13}
    CR -i {cr|14}
    BiCR -i {bicr|15}
    CRS -i {crs|16}
    BiCRSTAB -i {bicrstab|17}
    GPBiCR -i {gpbicr|18}
    BiCRSafe -i {bicrsafe|19}
    FGMRES(m) -i {fgmres|20} -restart [40] — The restart value m
    IDR(s) -i {idrs|21} -irestart [2] — The restart value s
    IDR(1) -i {idr1|22}
    MINRES -i {minres|23}
    COCG -i {cocg|24}
    COCR -i {cocr|25}
  • Preconditioners

    Preconditioner Option Auxiliary Options
    None -p {none|0}
    Jacobi -p {jacobi|1}
    ILU(k) -p {ilu|2} -ilu_fill [0] — The fill level k
    SSOR -p {ssor|3} -ssor_omega [1.0] — The relaxation coefficient ω (0 < ω < 2)
    Hybrid -p {hybrid|4} -hybrid_i [sor] — The linear solver
    -hybrid_maxiter [25] — The maximum number of iterations
    -hybrid_tol [1.0e-3] — The convergence tolerance
    -hybrid_omega [1.5] — The relaxation coefficient ω of the SOR (0 < ω < 2)
    -hybrid_ell [2] — The degree l of the BiCGSTAB(l)
    -hybrid_restart [40] — The restart values of the GMRES and Orthomin
    I+S -p {is|5} -is_alpha [1.0] — The parameter α of I + αS(m)
    -is_m [3] — The parameter m of I + αS(m)
    SAINV -p {sainv|6} -sainv_drop [0.05] — The drop criterion
    SA-AMG -p {saamg|7} -saamg_unsym [false] — Select the unsymmetric version (matrix must be symmetric)
    -saamg_theta [0.05|0.12] — The drop criterion \(a_{ij}^2 \leq \theta^2 |a_{ii} a_{jj}|\) (default 0.05 for the unsymmetric and 0.12 for the symmetric version)
    Crout ILU -p {iluc|8} -iluc_drop [0.05] — The drop criterion
    -iluc_rate [5.0] — The ratio of the maximum fill-in
    ILUT -p {ilut|9}
    Additive Schwarz -adds true -adds_iter [1] — The number of iterations
  • Additional settings:

    Option Description
    -maxiter [1000] The maximum number of iterations
    -tol [1.0e-12] The convergence tolerance tol
    -tol_w [1.0] The convergence tolerance tol_w
    -print [0] The output of the residual history:
    -print {none|0} — None
    -print {mem|1} — Save the residual history
    -print {out|2} — Output to standard output
    -print {all|3} — Save the residual history and output it to standard output
    -scale [0] The scaling (result overwrites the original matrix and vectors):
    -scale {none|0} — No scaling
    -scale {jacobi|1} — Jacobi scaling \(D^{-1}Ax = D^{-1}b\), where \(D\) is the diagonal of \(A\)
    -scale {symm_diag|2} — Diagonal scaling \(D^{-1/2}AD^{-1/2}x = D^{-1/2}b\), where \(D^{-1/2}\) has \(1/\sqrt{a_{ii}}\) on the diagonal
    -initx_zeros [1] The behaviour of the initial vector \(x_0\):
    -initx_zeros {false|0} — Components given by argument x in lis_solve()
    -initx_zeros {true|1} — All components set to 0
    -conv_cond [0] The convergence condition:
    -conv_cond {nrm2_r|0} — \(\|b - Ax\|_2 \le tol \cdot \|b - Ax_0\|_2\)
    -conv_cond {nrm2_b|1} — \(\|b - Ax\|_2 \le tol \cdot \|b\|_2\)
    -conv_cond {nrm1_b|2} — \(\|b - Ax\|_1 \le tol_w \cdot \|b\|_1 + tol\)
    -omp_num_threads t Number of threads (t represents the maximum number of threads)
    -storage [0] The matrix storage format
    -storage_block [2] The block size of the BSR and BSC formats
    -f [0] Precision of the linear solver:
    -f {double|0} — Double precision
    -f {quad|1} — Double-double (quadruple) precision
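As a further, purely illustrative combination (not a recommendation), the option string may mix a solver, a preconditioner and the additional settings listed above, e.g. BiCGSTAB with an SSOR preconditioner, a higher iteration limit and residual output:

*Solver, LIS=$-i bicgstab -p ssor -ssor_omega 1.2 -maxiter 2000 -tol 1.0e-10 -print out$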

External solver

Use this option to interface external solvers to numgeo. For the external solver, an additional executable is required. This executable has to be placed in the same directory as the numgeo executable.

*Solver, external, exe=$executable name$ [, mpi=<n processors>, omp=<n threads>, <extrapolation>]
The (sub)keywords are as follows:

  • external: tells numgeo to use an external executable to solve the linear system.

  • exe=$executable name$: specify the name of the external executable.

  • [mpi=<n processors>]: specify the number of processors to be used. Notice that:

    • for mpi=1, the use of MPI is disabled and the external executable is launched using ./<executable name>. This is the default.

    • for mpi=n (with n \(>\) 1), the executable is launched using mpirun -n <n processors> <executable name>

  • [omp=<n threads>]: specify the number of (omp) threads to be used. Notice that:

    • for omp=1, the use of OpenMP is disabled. This is the default.

    • for omp=n (with n \(>\) 1), the executable is launched with the environment variable OMP_NUM_THREADS set to n (export OMP_NUM_THREADS=n)

Existing interfaces to external solvers

Interfaces to some external solvers have already been developed and can be freely downloaded from https://github.com/j-machacek/numgeo-external-solvers:

  • numgeo-mumps-mpi: MPI version of the MUMPS solver

  • numgeo-ilupack

  • numgeo-pastix

Example:
*Solver, external, exe=$numgeo-mumps-mpi$, mpi=4, omp=2


Deactivate solver

Use this option to skip the solver entirely. Be aware that in this case the system of equations is not solved at all, i.e. no improvement to the initial guess will be calculated (see the Theory Manual). This option is best used in combination with maxiter=0 (see the *Step definition). Be aware of the consequences! To deactivate the solving stage in a step of your analysis, use *Solver, None



  1. Arne De Coninck, Bernard De Baets, Drosos Kourounis, Fabio Verbosio, Olaf Schenk, Steven Maenhout, and Jan Fostier. Needles: toward large-scale genomic prediction with marker-by-environment interaction. Genetics, 203(1):543–555, 2016. doi:10.1534/genetics.115.179887

  2. Patrick R. Amestoy, Iain S. Duff, Jean-Yves L'Excellent, and Jacko Koster. A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM Journal on Matrix Analysis and Applications, 23(1):15–41, 1 2001. doi:10.1137/s0895479899358194

  3. Patrick R. Amestoy, Abdou Guermouche, Jean-Yves L'Excellent, and Stéphane Pralet. Hybrid scheduling for the parallel solution of linear systems. Parallel Computing, 32(2):136–156, 2 2006. doi:10.1016/j.parco.2005.07.004

  4. Akihiro Fujii, Akira Nishida, and Yoshio Oyanagi. Evaluation of Parallel Aggregate Creation Orders: Smoothed Aggregation Algebraic Multigrid Method. In Michael K. Ng, Andrei Doncescu, Laurence T. Yang, and Tau Leng, editors, High Performance Computational Science and Engineering, volume 172, pages 99–122. Springer-Verlag, 2005. doi:10.1007/0-387-24049-7_6

  5. H. Kotakemori, H. Hasegawa, and A. Nishida. Performance evaluation of a parallel iterative method library using OpenMP. In Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05), 5 pp.–436. IEEE, 2005. doi:10.1109/HPCASIA.2005.74

  6. Hisashi Kotakemori, Hidehiko Hasegawa, Tamito Kajiyama, Akira Nukada, Reiji Suda, and Akira Nishida. Performance Evaluation of Parallel Sparse Matrix–Vector Products on SGI Altix3700. In Matthias S. Mueller, Barbara M. Chapman, Bronis R. De Supinski, Allen D. Malony, and Michael Voss, editors, OpenMP Shared Memory Parallel Programming, volume 4315, pages 153–163. Springer Berlin Heidelberg, 2008. doi:10.1007/978-3-540-68555-5_13

  7. Akira Nishida. Experience in Developing an Open Source Scalable Software Infrastructure in Japan. In David Taniar, Osvaldo Gervasi, Beniamino Murgante, Eric Pardede, and Bernady O. Apduhan, editors, Computational Science and Its Applications – ICCSA 2010, volume 6017, pages 448–462. Springer Berlin Heidelberg, 2010. doi:10.1007/978-3-642-12165-4_36

  8. H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 13(2):631–644, 1992. doi:10.1137/0913035

  9. Gerard L.G. Sleijpen and Diederik R. Fokkema. BiCGstab(ell) for linear equations involving unsymmetric matrices with complex spectrum. Electronic Transactions on Numerical Analysis, 1:11–32, 1993.

  10. Gerard L.G. Sleijpen, Henk A. van der Vorst, and Diederik R. Fokkema. BiCGstab(l) and other hybrid Bi-CG methods. Numerical Algorithms, 7(1):75–109, 1994.

  11. Diederik R. Fokkema. Enhanced implementation of BiCGstab(l) for solving linear systems of equations. Citeseer, 1996.