Biconjugate gradient method

In mathematics, more specifically in numerical linear algebra, the biconjugate gradient method is an algorithm to solve systems of linear equations

$$Ax = b.$$

Unlike the conjugate gradient method, this algorithm does not require the matrix $A$ to be self-adjoint, but instead one needs to perform multiplications by the conjugate transpose $A^*$.

The algorithm

  1. Choose an initial guess $x_0$, two other vectors $x_0^*$ and $b^*$, and a preconditioner $M$
  2. $r_0 \leftarrow b - A x_0$
  3. $r_0^* \leftarrow b^* - x_0^* A^*$
  4. $p_0 \leftarrow M^{-1} r_0$
  5. $p_0^* \leftarrow r_0^* M^{-1}$
  6. for $k = 0, 1, \ldots$ do
    1. $\alpha_k \leftarrow \frac{r_k^* M^{-1} r_k}{p_k^* A p_k}$
    2. $x_{k+1} \leftarrow x_k + \alpha_k \cdot p_k$
    3. $x_{k+1}^* \leftarrow x_k^* + \overline{\alpha_k} \cdot p_k^*$
    4. $r_{k+1} \leftarrow r_k - \alpha_k \cdot A p_k$
    5. $r_{k+1}^* \leftarrow r_k^* - \overline{\alpha_k} \cdot p_k^* A^*$
    6. $\beta_k \leftarrow \frac{r_{k+1}^* M^{-1} r_{k+1}}{r_k^* M^{-1} r_k}$
    7. $p_{k+1} \leftarrow M^{-1} r_{k+1} + \beta_k \cdot p_k$
    8. $p_{k+1}^* \leftarrow r_{k+1}^* M^{-1} + \overline{\beta_k} \cdot p_k^*$

In the above formulation, the computed $r_k$ and $r_k^*$ satisfy

$$r_k = b - A x_k,$$
$$r_k^* = b^* - x_k^* A^*$$

and thus are the respective residuals corresponding to $x_k$ and $x_k^*$, as approximate solutions to the systems

$$Ax = b,$$
$$x^* A^* = b^*;$$

here $x^*$ denotes an adjoint (row) vector, and $\overline{\alpha}$ denotes the complex conjugate of $\alpha$.
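To make the recurrences concrete, below is a minimal NumPy sketch of the preconditioned iteration, restricted to real arithmetic so that the complex conjugations drop out. It is an illustrative sketch rather than a reference implementation: the function name bicg, the default shadow residual $\hat{r}_0 = r_0$, the stopping test on $\|r_k\|$, and the M_solve callback that applies $M^{-1}$ are all choices made here, not taken from the article. The starred row vectors are stored as ordinary one-dimensional arrays, so their right-multiplication by the matrix becomes a product with A.T (as in the unpreconditioned formulation below), and the preconditioner is assumed symmetric so that one solve routine serves both sequences.

```python
import numpy as np

def bicg(A, b, x0=None, r_shadow=None, M_solve=None, tol=1e-10, max_iter=1000):
    """Minimal preconditioned BiCG sketch for real matrices (illustrative only).

    M_solve(v) should return an (approximate) solution of M z = v.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float).copy()
    M_solve = M_solve if M_solve is not None else (lambda v: v)   # M = I by default

    r = b - A @ x                        # r_0 = b - A x_0
    rs = r.copy() if r_shadow is None else np.asarray(r_shadow, dtype=float).copy()
    z, zs = M_solve(r), M_solve(rs)      # M^{-1} r_0 and its shadow counterpart
    p, ps = z.copy(), zs.copy()          # initial search directions
    rho = rs @ z                         # r_k^* M^{-1} r_k  (no breakdown handling here)

    for _ in range(max_iter):
        Ap = A @ p
        alpha = rho / (ps @ Ap)          # alpha_k = r_k^* M^{-1} r_k / p_k^* A p_k
        x += alpha * p                   # primal iterate
        r -= alpha * Ap                  # primal residual
        rs -= alpha * (A.T @ ps)         # shadow residual (row-vector recurrence in column form)
        if np.linalg.norm(r) < tol:
            break
        z, zs = M_solve(r), M_solve(rs)
        rho_new = rs @ z
        beta = rho_new / rho             # beta_k
        p = z + beta * p                 # p_{k+1} = M^{-1} r_{k+1} + beta_k p_k
        ps = zs + beta * ps
        rho = rho_new
    return x
```

On a small, well-conditioned nonsymmetric system the result should agree with numpy.linalg.solve(A, b) to the chosen tolerance; supplying a nontrivial M_solve (for example a Jacobi solve, lambda v: v / np.diag(A)) can reduce the number of iterations.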

Unpreconditioned version of the algorithm

  1. Choose an initial guess $x_0$ and two other vectors $\hat{x}_0$ and $\hat{b}$
  2. $r_0 \leftarrow b - A x_0$
  3. $\hat{r}_0 \leftarrow \hat{b} - \hat{x}_0 A$
  4. $p_0 \leftarrow r_0$
  5. $\hat{p}_0 \leftarrow \hat{r}_0$
  6. for $k = 0, 1, \ldots$ do
    1. $\alpha_k \leftarrow \frac{\hat{r}_k r_k}{\hat{p}_k A p_k}$
    2. $x_{k+1} \leftarrow x_k + \alpha_k \cdot p_k$
    3. $\hat{x}_{k+1} \leftarrow \hat{x}_k + \alpha_k \cdot \hat{p}_k$
    4. $r_{k+1} \leftarrow r_k - \alpha_k \cdot A p_k$
    5. $\hat{r}_{k+1} \leftarrow \hat{r}_k - \alpha_k \cdot \hat{p}_k A$
    6. $\beta_k \leftarrow \frac{\hat{r}_{k+1} r_{k+1}}{\hat{r}_k r_k}$
    7. $p_{k+1} \leftarrow r_{k+1} + \beta_k \cdot p_k$
    8. $\hat{p}_{k+1} \leftarrow \hat{r}_{k+1} + \beta_k \cdot \hat{p}_k$
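In practice the loop is rarely hand-coded; SciPy ships a BiCG solver in scipy.sparse.linalg. A minimal usage sketch (the random test matrix and the residual check are arbitrary choices made here):

```python
import numpy as np
from scipy.sparse.linalg import bicg    # SciPy's biconjugate gradient solver

rng = np.random.default_rng(0)
n = 50
A = 4 * np.eye(n) + 0.3 * rng.standard_normal((n, n))   # well-conditioned, nonsymmetric
b = rng.standard_normal(n)

x, info = bicg(A, b)                     # info == 0 signals convergence
print(info, np.linalg.norm(A @ x - b))   # residual norm of the returned iterate
```

The same routine also accepts a sparse matrix or a LinearOperator in place of A, and a preconditioner can be passed through its M argument.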

Discussion

The biconjugate gradient method can be numerically unstable (compare to the biconjugate gradient stabilized method), but it is very important from a theoretical point of view. Define the iteration steps by

$$x_k := x_j + P_k A^{-1}\left(b - A x_j\right),$$
$$x_k^* := x_j^* + \left(b^* - x_j^* A\right) P_k A^{-1},$$

where $j < k$, using the related projection

$$P_k := \mathbf{u}_k \left(\mathbf{v}_k^* A \mathbf{u}_k\right)^{-1} \mathbf{v}_k^* A,$$

with

$$\mathbf{u}_k = \left[u_0, u_1, \dots, u_{k-1}\right],$$
$$\mathbf{v}_k = \left[v_0, v_1, \dots, v_{k-1}\right].$$

These related projections may be iterated themselves as

$$P_{k+1} = P_k + \left(1 - P_k\right) u_k \otimes \frac{v_k^* A \left(1 - P_k\right)}{v_k^* A \left(1 - P_k\right) u_k}.$$
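The rank-one update can be verified numerically. The NumPy sketch below (real data; the names U, V, and proj are chosen here for illustration) builds $P_k$ from the recursion, one direction at a time, and compares it with the explicit definition $\mathbf{u}_k\left(\mathbf{v}_k^* A \mathbf{u}_k\right)^{-1}\mathbf{v}_k^* A$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n))
U = rng.standard_normal((n, k))          # columns u_0, ..., u_{k-1}
V = rng.standard_normal((n, k))          # columns v_0, ..., v_{k-1}

def proj(U, V, A):
    """Explicit oblique projection  P_k = u_k (v_k^* A u_k)^{-1} v_k^* A."""
    return U @ np.linalg.solve(V.T @ A @ U, V.T @ A)

I = np.eye(n)
P = np.zeros((n, n))                     # P_0 = 0: no directions yet
for j in range(k):                       # apply the rank-one update k times
    u = U[:, j]
    w = (V[:, j] @ A) @ (I - P)          # the row vector v_j^* A (1 - P_j)
    P = P + np.outer((I - P) @ u, w) / (w @ u)

print(np.allclose(P, proj(U, V, A)))     # True: the recursion reproduces the definition
print(np.allclose(P @ P, P))             # True: P_k is an (oblique) projection
```

With the special choice $u_k = M^{-1} r_k$ and $v_k^* = r_k^* M^{-1}$ made further below, this is the projection that the algorithm realizes implicitly through its coupled recurrences.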

A relation to quasi-Newton methods is given by $P_k = A_k^{-1} A$ and $x_{k+1} = x_k - A_{k+1}^{-1}\left(A x_k - b\right)$, where

$$A_{k+1}^{-1} = A_k^{-1} + \left(1 - A_k^{-1} A\right) u_k \otimes \frac{v_k^* \left(1 - A A_k^{-1}\right)}{v_k^* A \left(1 - A_k^{-1} A\right) u_k}.$$

The new directions

$$p_k = \left(1 - P_k\right) u_k,$$
$$p_k^* = v_k^* A \left(1 - P_k\right) A^{-1}$$

are then orthogonal to the residuals:

$$v_i^* r_k = p_i^* r_k = 0,$$
$$r_k^* u_j = r_k^* p_j = 0,$$

which themselves satisfy

$$r_k = A\left(1 - P_k\right) A^{-1} r_j,$$
$$r_k^* = r_j^* \left(1 - P_k\right)$$

where $i, j < k$.

The biconjugate gradient method now makes a special choice and uses the setting

$$u_k = M^{-1} r_k,$$
$$v_k^* = r_k^* M^{-1}.$$

With this particular choice, explicit evaluations of $P_k$ and $A^{-1}$ are avoided, and the algorithm takes the form stated above.

Properties

  • If $A = A^*$ is self-adjoint, $x_0^* = x_0$ and $b^* = b$, then $r_k = r_k^*$, $p_k = p_k^*$, and the conjugate gradient method produces the same sequence $x_k = x_k^*$ at half the computational cost.
  • The sequences produced by the algorithm are biorthogonal, i.e., $p_i^* A p_j = r_i^* M^{-1} r_j = 0$ for $i \neq j$ (see the numerical check after this list).
  • If $P_{j'}$ is a polynomial with $\deg\left(P_{j'}\right) + j < k$, then $r_k^* P_{j'}\left(M^{-1} A\right) u_j = 0$; the algorithm thus produces projections onto the Krylov subspace.
  • If $P_{i'}$ is a polynomial with $i + \deg\left(P_{i'}\right) < k$, then $v_i^* P_{i'}\left(A M^{-1}\right) r_k = 0$.
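As a sanity check on the biorthogonality property above, the following NumPy sketch runs a few steps of the unpreconditioned recurrences ($M = I$, real data, shadow residual initialized as $\hat{r}_0 = r_0$; all of these are choices made here for illustration) and inspects the off-diagonal entries of the matrices $[\hat{r}_i^{\,T} r_j]$ and $[\hat{p}_i^{\,T} A p_j]$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, steps = 8, 5
A = 4 * np.eye(n) + rng.standard_normal((n, n))   # nonsymmetric test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
rs = r.copy()                    # shadow residual, chosen equal to r_0 here
p, ps = r.copy(), rs.copy()
R, Rs, P, Ps = [r], [rs], [p], [ps]

for _ in range(steps):           # unpreconditioned BiCG recurrences
    Ap, Atps = A @ p, A.T @ ps
    alpha = (rs @ r) / (ps @ Ap)
    x = x + alpha * p
    r_new, rs_new = r - alpha * Ap, rs - alpha * Atps
    beta = (rs_new @ r_new) / (rs @ r)
    p, ps = r_new + beta * p, rs_new + beta * ps
    r, rs = r_new, rs_new
    R.append(r); Rs.append(rs); P.append(p); Ps.append(ps)

G_r = np.array(Rs) @ np.array(R).T        # entries  rs_i^T r_j
G_p = np.array(Ps) @ A @ np.array(P).T    # entries  ps_i^T A p_j
off = lambda G: np.max(np.abs(G - np.diag(np.diag(G))))
print(off(G_r), off(G_p))                 # both should be at round-off level
```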
