Accelerators

In nonlinear finite element analysis, iterative solution schemes such as the Newton-Raphson method may converge slowly or fail to converge altogether when the system exhibits strong nonlinearities. Solution accelerators improve the convergence behaviour of the iterative process without requiring a change to the underlying Jacobian. numgeo provides two fundamentally different acceleration strategies:

Line search, which scales the magnitude of the iterative correction along its original direction to minimise the energy potential, and
Anderson acceleration, which combines information from successive iterations to modify both the magnitude and the direction of the update.

Both approaches are selected via the Type parameter on the *Accelerator keyword (previously *Line search).

Line search

The rationale behind line search is that the direction found by the Newton-Raphson method is often a good direction, but the step size (the magnitude of the correction) is not. Furthermore, it is cheaper to evaluate the residual at several points along the correction \(\boldsymbol{c}\) than to form and factor a new system Jacobian.

The line search algorithm takes the iterative solution increment \(\boldsymbol{c}^{(i)}\) predicted by the Newton-Raphson algorithm and scales this vector by a factor chosen to minimize the energy potential¹ \(\Pi = \boldsymbol{r}^T \boldsymbol{c}^{(i)}\). Note that only the magnitude is scaled; the direction of the prediction remains unchanged. While the local minimum of the energy potential represents equilibrium, the minimum in the line search direction can be regarded as the best solution in the predicted direction. The scaled iterative increment reads:

\[ \Delta \boldsymbol{d}^{i+1} = \Delta \boldsymbol{d}^i + \lambda \boldsymbol{c}^{i+1} \]

where \(\lambda\) is a scalar scaling variable. For \(\lambda > 1.0\) an extrapolation is performed. The scaling factor is bounded by \(\lambda_{min} \leq \lambda \leq \lambda_{max}\); by default numgeo uses \(\lambda_{min}=0.25\) and \(\lambda_{max}=1.0\) (no extrapolation). A minimum of \(\Pi\) in the line search direction requires that the derivative of \(\Pi\) with respect to \(\lambda\) be zero:

\[ s(\lambda) = \frac{\partial \Pi}{\partial \lambda} = \frac{\partial \Pi}{\partial \mathbf{d}} \frac{\partial \mathbf{d}}{\partial \lambda} = \boldsymbol{r}(\lambda)^T \boldsymbol{c}^{i+1} = 0 \]

The above can be interpreted as follows: at the minimum, the residual \(\boldsymbol{r}\) is orthogonal to the direction \(\boldsymbol{c}\). The condition \(s(\lambda) = 0\) can be solved by iterative refinement of \(\lambda\). In numgeo, the purpose of the line search is to accelerate the Newton-Raphson method or to help it converge where it otherwise would not.
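The iterative refinement of \(\lambda\) can be sketched as follows. This is a minimal illustration and not numgeo's implementation: the secant update, the acceptance tolerance `tol`, the iteration cap `max_iter` and all function names are assumptions made for the example. Only the bounds \(\lambda_{min}=0.25\) and \(\lambda_{max}=1.0\) are taken from the text above.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def refine_lambda(residual, d, c, lam_min=0.25, lam_max=1.0, tol=0.5, max_iter=5):
    """Secant refinement of lam so that s(lam) = r(d + lam*c) . c approaches zero.

    residual      : callable returning the out-of-balance vector for a trial solution
    tol, max_iter : hypothetical stopping controls, not numgeo parameters
    """
    def s(lam):
        trial = [di + lam * ci for di, ci in zip(d, c)]
        return dot(residual(trial), c)

    lam_a, s_a = 0.0, s(0.0)   # state before the correction
    lam_b, s_b = 1.0, s(1.0)   # state after the full correction
    s_0 = s_a
    for _ in range(max_iter):
        # Accept once the slope is small relative to its initial value
        if abs(s_b) <= tol * abs(s_0) or s_b == s_a:
            break
        lam_new = lam_b - s_b * (lam_b - lam_a) / (s_b - s_a)
        lam_new = min(max(lam_new, lam_min), lam_max)   # enforce the bounds
        lam_a, s_a = lam_b, s_b
        lam_b, s_b = lam_new, s(lam_new)
    return min(max(lam_b, lam_min), lam_max)


# One-DOF stiffening spring: f_int(d) = d + 0.1*d**2, external load 10
residual = lambda d: [10.0 - (d[0] + 0.1 * d[0] ** 2)]
d0 = [0.0]
c0 = [10.0]   # correction obtained with the initial stiffness K = 1
lam = refine_lambda(residual, d0, c0)
print(lam)
```

In this example the full step overshoots the stiffening response, so the refinement returns a factor below one. Each evaluation of `s` costs one residual assembly but no new factorisation.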

Relaxation

The simplest form of step-size control applies a constant, user-defined scaling factor \(\lambda\) to every correction:

\[ \Delta \boldsymbol{d}^{i+1} = \Delta \boldsymbol{d}^i + \lambda \, \boldsymbol{c}^{i+1} \]

Setting \(\lambda < 1\) produces under-relaxation, which can stabilise iterations in strongly nonlinear problems at the cost of slower convergence.
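The stabilising effect of under-relaxation can be illustrated with a scalar sketch. The model problem (a linear spring with true stiffness 3 that is iterated with a stiffness of 1, so that every full correction overshoots) and all names are assumptions chosen for the example.

```python
def relaxed_iteration(lam, n_iter=20, k_true=3.0, k_used=1.0, f_ext=6.0):
    """Apply d <- d + lam * c repeatedly, with c computed from a too-soft stiffness."""
    d = 0.0
    for _ in range(n_iter):
        r = f_ext - k_true * d   # out-of-balance force
        c = r / k_used           # correction from the (too soft) stiffness
        d += lam * c             # constant scaling of every correction
    return d


exact = 6.0 / 3.0
print(abs(relaxed_iteration(1.0) - exact))   # full steps: error grows
print(abs(relaxed_iteration(0.5) - exact))   # under-relaxed: error shrinks
```

Here the error is multiplied by \(1 - 3\lambda\) in every iteration, so \(\lambda = 1\) diverges while \(\lambda = 0.5\) converges.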

Usage

Relaxation is selected with Type = Relaxation. It is the cheapest option and requires no additional residual evaluations. It is most useful when divergence is caused by overshoot rather than a poor search direction.

Linear Line Search

In its simplest form, a linear relation is assumed between the potential at the beginning and at the end of the present iterative increment. The residual for \(\lambda=0\) is known from the previous iteration, and the residual for \(\lambda=1\) is obtained by applying the full correction. Assuming a linear relation in between yields the value of \(\lambda\) in closed form, without iterative refinement:

\[ \lambda = - \dfrac{\boldsymbol{r}^T_0 \boldsymbol{c}^{(i)}}{\boldsymbol{r}^T \boldsymbol{c}^{(i)} - \boldsymbol{r}^T_0 \boldsymbol{c}^{(i)}} \]

where \(\boldsymbol{r}_0\) is the residual before the correction was applied and \(\boldsymbol{r}\) is the residual after a full correction (\(\lambda = 1\)). Evaluating \(\boldsymbol{r}\) at \(\lambda = 1\) requires one additional residual assembly, which makes this method more expensive per iteration than the other variants.
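The closed-form expression can be evaluated as in the following sketch. The function name, the handling of a vanishing denominator and the sample vectors are assumptions for illustration. The clamping uses the default bounds stated earlier.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def linear_line_search(r0, r1, c, lam_min=0.25, lam_max=1.0):
    """lam = -(r0 . c) / (r1 . c - r0 . c), clamped to [lam_min, lam_max].

    r0 : residual before the correction (lam = 0)
    r1 : residual after the full correction (lam = 1)
    c  : correction vector
    """
    s0 = dot(r0, c)
    s1 = dot(r1, c)
    denom = s1 - s0
    if denom == 0.0:
        return lam_max   # assumption: keep the full step if the linear model is degenerate
    lam = -s0 / denom
    return min(max(lam, lam_min), lam_max)


c = [1.0, 0.5]
r0 = [4.0, 2.0]
lam_overshoot = linear_line_search(r0, [-2.0, -1.0], c)   # residual changed sign
lam_undershoot = linear_line_search(r0, [2.0, 1.0], c)    # would extrapolate, so it is clamped
print(lam_overshoot, lam_undershoot)
```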

Usage

Linear Line Search is selected with Type = Linear. Because it requires one extra residual assembly per iteration, it should only be used when the cost of an additional assembly is small relative to the expected reduction in iteration count.

Multi-field simulations

In simulations involving multiple active physical fields, such as a consolidation analysis using coupled two-phase elements, the scaling factor \(\lambda\) is calculated separately for each active field.
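A per-field evaluation could look like the sketch below. The mapping from field names to degree-of-freedom indices, the field names themselves and the sample values are hypothetical. How numgeo partitions the vectors internally is not described here.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def lambda_per_field(r0, r1, c, field_dofs, lam_min=0.25, lam_max=1.0):
    """Evaluate the linear line search formula separately on each field's DOFs."""
    factors = {}
    for name, dofs in field_dofs.items():
        s0 = dot([r0[i] for i in dofs], [c[i] for i in dofs])
        s1 = dot([r1[i] for i in dofs], [c[i] for i in dofs])
        denom = s1 - s0
        # assumption: fall back to the full step if the linear model is degenerate
        lam = lam_max if denom == 0.0 else -s0 / denom
        factors[name] = min(max(lam, lam_min), lam_max)
    return factors


# Hypothetical layout: DOF 0 is a displacement, DOF 1 a pore water pressure
field_dofs = {"displacement": [0], "pore_pressure": [1]}
factors = lambda_per_field(r0=[10.0, 4.0], r1=[-10.0, 0.0], c=[10.0, 2.0],
                           field_dofs=field_dofs)
print(factors)
```

In this example the displacement correction overshoots and is scaled down, while the pore pressure correction is kept at full length.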

Back-looking Line Search

The back-looking line search avoids the extra residual assembly of the linear method by re-using information that is already available from the previous iteration. At iteration \(k\), the residuals \(\boldsymbol{r}_{k-1}\) and \(\boldsymbol{r}_k\) are both known, as is the previous correction \(\boldsymbol{c}^{(k-1)}\). Assuming the same linear model as in the linear line search but applied to quantities from the previous step, the scaling factor is computed as:

\[ \lambda = - \dfrac{\boldsymbol{r}^T_{k-1} \, \boldsymbol{c}^{(k-1)}}{\boldsymbol{r}^T_k \, \boldsymbol{c}^{(k-1)} - \boldsymbol{r}^T_{k-1} \, \boldsymbol{c}^{(k-1)}} \]

Under the linear model, this \(\lambda\) is the optimal scaling for the previous correction \(\boldsymbol{c}^{(k-1)}\). It is then applied to the current correction \(\boldsymbol{c}^{(k)}\) under the assumption that successive corrections are approximately parallel, which is a reasonable approximation in modified Newton-Raphson iterations where the Jacobian is held constant.
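The following sketch applies the back-looking formula to a scalar modified Newton-Raphson iteration. The model problem (true stiffness 3, constant iteration stiffness 1, load 6), the function name and the treatment of a vanishing denominator are assumptions for illustration.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def back_looking_lambda(r_prev, r_curr, c_prev, lam_min=0.25, lam_max=1.0):
    """Scaling that would have been optimal for the previous correction c_prev."""
    s_prev = dot(r_prev, c_prev)
    s_curr = dot(r_curr, c_prev)
    denom = s_curr - s_prev
    if denom == 0.0:
        return lam_max   # assumption: keep the full step if the model is degenerate
    return min(max(-s_prev / denom, lam_min), lam_max)


k_true, k_used, f_ext = 3.0, 1.0, 6.0
d = 0.0

# Iteration k-1: no history yet, so the full correction is applied
r_prev = [f_ext - k_true * d]
c_prev = [r_prev[0] / k_used]
d += c_prev[0]

# Iteration k: both residuals are known without an extra assembly
r_curr = [f_ext - k_true * d]
c_curr = [r_curr[0] / k_used]
lam = back_looking_lambda(r_prev, r_curr, c_prev)
d += lam * c_curr[0]
print(lam, d)
```

Because the two corrections are exactly parallel in a scalar problem, the transferred factor is also optimal for the current correction here. In general this holds only approximately.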

Usage

Back-looking Line Search is selected with Type = Back-Looking. It requires no additional residual evaluations and is therefore as cheap as relaxation, while adapting the scaling factor to the current state of the iteration.

Multi-field simulations

As with the linear line search, the scaling factor \(\lambda\) is calculated separately for each active degree-of-freedom field (displacements, pore water pressure, volumetric Jacobian \(J\)).

Limitations

The back-looking approach is most effective when successive corrections point in similar directions. Near the formation of a failure mechanism, where the plastic zone evolves between iterations, the direction of \(\boldsymbol{c}\) can change substantially and the transferred scaling factor becomes less reliable. In such situations, Anderson acceleration may be more effective.

Anderson acceleration

When the global Jacobian used in the Newton-Raphson iteration is not updated at every iteration or changes only mildly (for instance, when an elastic tangent is used in place of the consistent elasto-plastic tangent), the iteration reduces to a fixed-point iteration of the form:

\[ \boldsymbol{u}^{(k+1)} = \boldsymbol{u}^{(k)} + \boldsymbol{c}^{(k)}, \qquad \boldsymbol{c}^{(k)} = -\mathbf{K}_e^{-1}\,\boldsymbol{r}\!\left(\boldsymbol{u}^{(k)}\right) \]

where \(\mathbf{K}_e\) is the (constant) elastic stiffness matrix and \(\boldsymbol{r}\) is the out-of-balance force vector. Such a scheme converges at best linearly with rate

\[ \rho = \left\|\mathbf{I} - \mathbf{K}_e^{-1}\mathbf{K}_\mathrm{ep}\right\| \]

where \(\mathbf{K}_\mathrm{ep}\) is the (unknown) elasto-plastic tangent. As plasticity spreads and \(\mathbf{K}_\mathrm{ep}\) softens, \(\rho\) approaches unity and convergence becomes arbitrarily slow. This is particularly problematic in strength reduction analyses (safety factor analyses), where the global failure mechanism causes \(\rho \to 1\) at the critical load level.
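The effect of \(\rho \to 1\) can be made concrete with a scalar sketch in which \(\rho = |1 - k_\mathrm{ep}/k_e|\). The stiffness ratios, the tolerance and the function name are assumptions chosen for illustration.

```python
def iterations_to_converge(k_e, k_ep, f_ext=1.0, tol=1e-6, max_iter=100000):
    """Count fixed-point iterations u <- u - r(u)/k_e until |r| <= tol * |f_ext|."""
    u = 0.0
    for k in range(1, max_iter + 1):
        r = k_ep * u - f_ext    # out-of-balance force
        u += -r / k_e           # correction with the constant elastic stiffness
        if abs(k_ep * u - f_ext) <= tol * abs(f_ext):
            return k
    return max_iter


counts = {}
for ratio in (0.5, 0.1, 0.01):   # k_ep / k_e, decreasing as the tangent softens
    counts[ratio] = iterations_to_converge(k_e=1.0, k_ep=ratio)
    print(f"k_ep/k_e = {ratio}: rho = {abs(1.0 - ratio):.2f}, iterations = {counts[ratio]}")
```

The residual is reduced by the factor \(\rho\) per iteration, so the iteration count grows roughly like \(1/(1-\rho)\) as the tangent softens.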

Anderson acceleration² (also known as Anderson mixing) is a general technique for accelerating the convergence of fixed-point iterations. It achieves this by forming the next iterate not from a single correction, but from a linear combination of the most recent corrections, chosen to minimize the residual in a least-squares sense. Unlike line search, which only scales the magnitude of the correction, Anderson acceleration modifies both the magnitude and the direction of the update.

numgeo implements Anderson acceleration at depths \(m=1\) and \(m=2\). The depth controls how many previous iteration pairs are used to construct the accelerated update. Higher depth can capture more spectral modes of the iteration operator and is therefore more effective when the dominant eigenspace of \(\mathbf{I} - \mathbf{K}_e^{-1}\mathbf{K}_\mathrm{ep}\) is not well approximated by a single direction.

Algorithm (depth-1)

At depth \(m=1\), only the current and one previous iteration are required. At iteration \(k \geq 1\) the following quantities are available:

Symbol	Description
\(\boldsymbol{u}^{(k)}\)	current cumulative solution increment
\(\boldsymbol{c}^{(k)}\)	current correction from the linear solver
\(\boldsymbol{u}^{(k-1)}\)	cumulative solution increment from the previous iteration
\(\boldsymbol{c}^{(k-1)}\)	correction from the previous iteration

Define the differences

\[ \Delta\boldsymbol{u} = \boldsymbol{u}^{(k)} - \boldsymbol{u}^{(k-1)}, \qquad \Delta\boldsymbol{c} = \boldsymbol{c}^{(k)} - \boldsymbol{c}^{(k-1)} \]

The mixing parameter \(\alpha\) is determined by minimising \(\|\boldsymbol{c}^{(k)} - \alpha\,\Delta\boldsymbol{c}\|^2\), which yields:

\[ \alpha = \frac{\Delta\boldsymbol{c}^T\,\boldsymbol{c}^{(k)}}{\Delta\boldsymbol{c}^T\,\Delta\boldsymbol{c}} \]

The accelerated update then reads:

\[ \boldsymbol{u}^{(k+1)} = \boldsymbol{u}^{(k)} + \boldsymbol{c}^{(k)} - \alpha\left(\Delta\boldsymbol{u} + \Delta\boldsymbol{c}\right) \]

For \(\alpha = 0\) this reduces to the standard modified Newton-Raphson step. When the fixed-point map is affine (i.e. when the Jacobian is exactly constant), the least-squares minimisation is exact and depth-1 Anderson acceleration is equivalent to a secant update that projects the iterate to the root of the plane spanned by the last two residuals³.
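The depth-1 update can be sketched as follows. This is a minimal illustration of the formulas above, not numgeo's code: the function names, the guard value `tiny` and the scalar test problem are assumptions.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def anderson_depth1(u_k, c_k, u_prev, c_prev, tiny=1e-30):
    """Return u^(k+1) = u^(k) + c^(k) - alpha * (du + dc)."""
    du = [a - b for a, b in zip(u_k, u_prev)]
    dc = [a - b for a, b in zip(c_k, c_prev)]
    denom = dot(dc, dc)
    if denom < tiny:
        # corrections did not change: take the plain step (alpha = 0)
        return [u + c for u, c in zip(u_k, c_k)]
    alpha = dot(dc, c_k) / denom
    return [u + c - alpha * (a + b) for u, c, a, b in zip(u_k, c_k, du, dc)]


def correction(u, k_e=1.0, k_ep=0.1, f_ext=1.0):
    """c = -K_e^{-1} r(u) for a scalar affine problem (hypothetical test case)."""
    return [-(k_ep * u[0] - f_ext) / k_e]


u0 = [0.0]
c0 = correction(u0)
u1 = [u0[0] + c0[0]]          # iteration k = 0: plain step, history stored
c1 = correction(u1)
u2 = anderson_depth1(u1, c1, u0, c0)
print(u2)                     # exact solution of the test problem is f_ext / k_ep = 10
```

For this affine scalar problem a single accelerated step lands on the solution, whereas the plain iteration would contract the error only by a factor of 0.9 per step.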

Algorithm (depth-2)

At depth \(m=2\), the two most recent previous iterations are used. At iteration \(k \geq 2\) the additional quantities \(\boldsymbol{u}^{(k-2)}\) and \(\boldsymbol{c}^{(k-2)}\) are available. Define two sets of differences:

\[ \Delta\boldsymbol{u}_1 = \boldsymbol{u}^{(k)} - \boldsymbol{u}^{(k-1)}, \qquad \Delta\boldsymbol{c}_1 = \boldsymbol{c}^{(k)} - \boldsymbol{c}^{(k-1)} \]

\[ \Delta\boldsymbol{u}_2 = \boldsymbol{u}^{(k-1)} - \boldsymbol{u}^{(k-2)}, \qquad \Delta\boldsymbol{c}_2 = \boldsymbol{c}^{(k-1)} - \boldsymbol{c}^{(k-2)} \]

The two mixing parameters \(\gamma_1\), \(\gamma_2\) are determined by minimising \(\|\boldsymbol{c}^{(k)} - \gamma_1\,\Delta\boldsymbol{c}_1 - \gamma_2\,\Delta\boldsymbol{c}_2\|^2\), which leads to the \(2\times 2\) normal equations:

\[ \begin{bmatrix} \Delta\boldsymbol{c}_1^T\,\Delta\boldsymbol{c}_1 & \Delta\boldsymbol{c}_1^T\,\Delta\boldsymbol{c}_2 \\[4pt] \Delta\boldsymbol{c}_2^T\,\Delta\boldsymbol{c}_1 & \Delta\boldsymbol{c}_2^T\,\Delta\boldsymbol{c}_2 \end{bmatrix} \begin{bmatrix} \gamma_1 \\[4pt] \gamma_2 \end{bmatrix} = \begin{bmatrix} \Delta\boldsymbol{c}_1^T\,\boldsymbol{c}^{(k)} \\[4pt] \Delta\boldsymbol{c}_2^T\,\boldsymbol{c}^{(k)} \end{bmatrix} \]

The accelerated update then reads:

\[ \boldsymbol{u}^{(k+1)} = \boldsymbol{u}^{(k)} + \boldsymbol{c}^{(k)} - \gamma_1\left(\Delta\boldsymbol{u}_1 + \Delta\boldsymbol{c}_1\right) - \gamma_2\left(\Delta\boldsymbol{u}_2 + \Delta\boldsymbol{c}_2\right) \]

Depth-2 can capture two independent modes of the iteration operator simultaneously, which is beneficial when the emerging failure mechanism involves more than one dominant deformation pattern. On an affine fixed-point map, depth-2 Anderson acceleration is equivalent to GMRES(2) applied to the linearized residual equation³.
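A sketch of the depth-2 update is given below, solving the \(2\times 2\) normal equations by Cramer's rule. The function names, the newest-first history layout, the `None` return for a singular Gram matrix and the two-DOF affine test problem are assumptions for illustration.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def anderson_depth2(u_hist, c_hist):
    """Depth-2 update; histories are ordered newest first: [k, k-1, k-2]."""
    u_k, u_1, u_2 = u_hist
    c_k, c_1, c_2 = c_hist
    du1, dc1 = sub(u_k, u_1), sub(c_k, c_1)
    du2, dc2 = sub(u_1, u_2), sub(c_1, c_2)
    # Gram matrix and right-hand side of the normal equations
    a11, a12, a22 = dot(dc1, dc1), dot(dc1, dc2), dot(dc2, dc2)
    b1, b2 = dot(dc1, c_k), dot(dc2, c_k)
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        return None   # caller is expected to fall back to a lower depth
    g1 = (b1 * a22 - a12 * b2) / det
    g2 = (a11 * b2 - a12 * b1) / det
    return [u + c - g1 * (p1 + q1) - g2 * (p2 + q2)
            for u, c, p1, q1, p2, q2 in zip(u_k, c_k, du1, dc1, du2, dc2)]


def correction(u, a=(0.5, 0.1), b=(1.0, 1.0)):
    """c(u) = b - A u with A = diag(a); plays the role of -K_e^{-1} r(u)."""
    return [bi - ai * ui for ai, bi, ui in zip(a, b, u)]


# Three plain iterates provide the required history
u0 = [0.0, 0.0]
c0 = correction(u0)
u1 = [x + y for x, y in zip(u0, c0)]
c1 = correction(u1)
u2 = [x + y for x, y in zip(u1, c1)]
c2 = correction(u2)
u3 = anderson_depth2([u2, u1, u0], [c2, c1, c0])
print(u3)   # exact solution of the test problem is (1/0.5, 1/0.1) = (2, 10)
```

With two unknowns and two linearly independent correction differences, the least-squares residual vanishes and the update reproduces the solution of the affine test problem up to rounding.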

Automatic fallback

At iteration \(k=1\) of an increment, only one previous pair is available and depth-2 automatically falls back to depth-1. If the \(2\times 2\) Gram matrix becomes (near-)singular, indicating that the two correction differences are approximately parallel, the algorithm also falls back to depth-1.

Safeguards

The mixing parameters are clamped to \(|\alpha| \leq \alpha_\mathrm{max}\) (depth-1) and \(|\gamma_i| \leq \alpha_\mathrm{max}\) (depth-2), where \(\alpha_\mathrm{max}\) is controlled by the LAMBDA_MAX parameter on the *Accelerator keyword. This prevents over-extrapolation when the least-squares problem is poorly conditioned.

If \(\|\Delta\boldsymbol{c}\|^2\) (depth-1) or the determinant of the Gram matrix (depth-2) falls below a numerical tolerance, indicating that the corrections have not changed between iterations, acceleration is skipped and a plain modified Newton-Raphson step is taken.
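For depth-1, the two safeguards can be sketched as below. The function name, the tolerance value `tol` and the sample numbers are assumptions. `alpha_max` stands for the bound set through LAMBDA_MAX, and the value used in the example is arbitrary.

```python
def dot(a, b):
    """Euclidean inner product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


def safeguarded_alpha(dc, c_k, alpha_max, tol=1e-14):
    """Return the clamped mixing parameter, or None if acceleration is skipped."""
    denom = dot(dc, dc)
    if denom < tol:
        return None                       # corrections unchanged: plain step
    alpha = dot(dc, c_k) / denom
    return max(-alpha_max, min(alpha, alpha_max))


skipped = safeguarded_alpha([0.0, 0.0], [1.0, 1.0], alpha_max=5.0)
clamped = safeguarded_alpha([-0.1], [0.9], alpha_max=5.0)    # raw value is about -9
unchanged = safeguarded_alpha([-0.5], [0.5], alpha_max=5.0)  # raw value is -1
print(skipped, clamped, unchanged)
```

A caller would take the plain modified Newton-Raphson step whenever `None` is returned and use the clamped value in the accelerated update otherwise.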

The stored history is cleared at the start of every new increment. In strength reduction analyses this means that the acceleration restarts for each new factor-of-safety level, which is necessary because the constitutive parameters (and therefore the fixed-point map) change between increments.

First iteration

At iteration \(k=0\) of each increment, no previous history is available. numgeo performs a standard unaccelerated step and stores \(\boldsymbol{u}^{(0)}\) and \(\boldsymbol{c}^{(0)}\) for use at \(k=1\).

Not combinable with line search

Anderson acceleration and line search are mutually exclusive. The Type parameter on *Accelerator selects one or the other. Running both simultaneously is not supported, because Anderson acceleration assumes that the correction \(\boldsymbol{c}^{(k)}\) has not been rescaled by a line search factor.

Applicability

Anderson acceleration is most effective in problems where the Jacobian is held constant (or nearly constant) across iterations, such as modified Newton-Raphson with an elastic tangent. It is particularly well-suited for strength reduction (safety factor) analyses, where the iteration count near the critical load level can be reduced substantially. It can also be used in standard implicit analyses, though the benefit is smaller when the Jacobian already approximates the consistent tangent.

¹ Strictly speaking, "energy potential" is not the correct terminology for the physical behavior, e.g. in the case of plastic deformation. However, this poses no problem for the algorithmic implementation within an increment.
² Anderson, D.G. (1965). Iterative procedures for nonlinear integral equations. Journal of the ACM, 12(4), 547–560.
³ Walker, H.F. and Ni, P. (2011). Anderson acceleration for fixed-point iterations. SIAM Journal on Numerical Analysis, 49(4), 1715–1735.