Mirror descent

In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.

It generalizes algorithms such as gradient descent and multiplicative weights.

History

Mirror descent was originally proposed by Nemirovski and Yudin in 1983.[1]

Motivation

In gradient descent with the sequence of learning rates $(\eta_n)_{n\geq 0}$ applied to a differentiable function $F$, one starts with a guess $\mathbf{x}_0$ for a local minimum of $F$ and considers the sequence $\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots$ such that

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n), \quad n \geq 0.$$

This can be reformulated by noting that

$$\mathbf{x}_{n+1} = \arg\min_{\mathbf{x}} \left( F(\mathbf{x}_n) + \nabla F(\mathbf{x}_n)^{T}(\mathbf{x} - \mathbf{x}_n) + \frac{1}{2\eta_n}\|\mathbf{x} - \mathbf{x}_n\|^{2} \right).$$

In other words, $\mathbf{x}_{n+1}$ minimizes the first-order approximation to $F$ at $\mathbf{x}_n$ with an added proximity term $\|\mathbf{x} - \mathbf{x}_n\|^{2}$.
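To see that the two formulations agree, set the gradient of the minimized expression to zero:

$$\nabla F(\mathbf{x}_n) + \frac{1}{\eta_n}(\mathbf{x} - \mathbf{x}_n) = 0 \quad\Longleftrightarrow\quad \mathbf{x} = \mathbf{x}_n - \eta_n \nabla F(\mathbf{x}_n),$$

which recovers the gradient descent update above.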

This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances yields other algorithms, such as Hedge, which may be better suited to optimization over particular geometries.[2][3]
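As a concrete illustration, replacing the squared Euclidean term by the Bregman distance generated by the negative entropy $h(\mathbf{x}) = \sum_i x_i \log x_i$ on the probability simplex (the Kullback–Leibler divergence) gives the update

$$x_{n+1,i} = \frac{x_{n,i}\,\exp\!\big(-\eta_n\,[\nabla F(\mathbf{x}_n)]_i\big)}{\sum_j x_{n,j}\,\exp\!\big(-\eta_n\,[\nabla F(\mathbf{x}_n)]_j\big)},$$

i.e., the exponentiated-gradient form of the Hedge / multiplicative-weights update.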

Formulation

We are given a convex function $f$ to optimize over a convex set $K \subset \mathbb{R}^n$, and a norm $\|\cdot\|$ on $\mathbb{R}^n$.

We are also given a differentiable convex function $h\colon \mathbb{R}^n \to \mathbb{R}$ that is $\alpha$-strongly convex with respect to the given norm. This is called the distance-generating function, and its gradient $\nabla h\colon \mathbb{R}^n \to \mathbb{R}^n$ is known as the mirror map.
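For instance, $h(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{x}\|_2^2$ is 1-strongly convex with respect to the Euclidean norm, its mirror map is the identity, and with this choice mirror descent reduces to projected gradient descent.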

Starting from an initial point $x_0 \in K$, each iteration of mirror descent performs the following steps (a code sketch of one iteration follows the list):

  • Map to the dual space: $\theta_t \leftarrow \nabla h(x_t)$
  • Update in the dual space using a gradient step: $\theta_{t+1} \leftarrow \theta_t - \eta_t \nabla f(x_t)$
  • Map back to the primal space: $x'_{t+1} \leftarrow (\nabla h)^{-1}(\theta_{t+1})$
  • Project back to the feasible region $K$: $x_{t+1} \leftarrow \arg\min_{x \in K} D_h(x \,\|\, x'_{t+1})$, where $D_h$ is the Bregman divergence associated with $h$.
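
A minimal sketch of these steps in Python, assuming the negative-entropy distance-generating function on the probability simplex (so the mirror map is $\nabla h(x)_i = 1 + \log x_i$, its inverse is $\theta_i \mapsto \exp(\theta_i - 1)$, and the Bregman projection onto the simplex is a renormalization); the function name and the toy objective are illustrative assumptions, not part of the source:

import numpy as np

def mirror_descent_simplex(grad_f, x0, step_sizes):
    """Mirror descent over the probability simplex with the negative-entropy
    distance-generating function h(x) = sum_i x_i * log(x_i)."""
    x = np.asarray(x0, dtype=float)
    for eta in step_sizes:
        theta = 1.0 + np.log(x)          # map to the dual space: grad h(x_t)
        theta = theta - eta * grad_f(x)  # gradient step in the dual space
        x_prime = np.exp(theta - 1.0)    # map back: (grad h)^{-1}(theta_{t+1})
        x = x_prime / x_prime.sum()      # Bregman (KL) projection onto the simplex
    return x

# Illustrative use: minimize f(x) = 0.5 * ||x - c||^2 over the simplex.
c = np.array([0.2, 0.5, 0.3])
x_min = mirror_descent_simplex(grad_f=lambda x: x - c,
                               x0=np.ones(3) / 3,
                               step_sizes=[0.5] * 200)

Composing the four steps shows that each iteration multiplies the current point coordinatewise by $\exp(-\eta_t \nabla f(x_t))$ and renormalizes, matching the multiplicative-weights update mentioned in the Motivation section.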

Extensions

Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).[4]

References

  1. Nemirovsky, Arkadi; Yudin, David (1983). Problem Complexity and Method Efficiency in Optimization. John Wiley & Sons.
  2. Nemirovski, Arkadi (2012). Tutorial: Mirror Descent Algorithms for Large-Scale Deterministic and Stochastic Convex Optimization. https://www2.isye.gatech.edu/~nemirovs/COLT2012Tut.pdf
  3. "Mirror descent algorithm". tlienart.github.io. Retrieved 2022-07-10.
  4. Fang, Huang; Harvey, Nicholas J. A.; Portella, Victor S.; Friedlander, Michael P. (2021-09-03). "Online mirror descent and dual averaging: keeping pace in the dynamic case". arXiv:2006.02585 [cs.LG].