Stress majorization

Geometric placement based on ideal distances

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of $n$ $m$-dimensional data items, a configuration $X$ of $n$ points in $r (\ll m)$-dimensional space is sought that minimizes the so-called stress function $\sigma(X)$. Usually $r$ is 2 or 3, i.e. the $(n \times r)$ matrix $X$ lists points in 2- or 3-dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function $\sigma$ is a cost or loss function that measures the squared differences between ideal ($m$-dimensional) distances and actual distances in $r$-dimensional space. It is defined as:

$$\sigma(X) = \sum_{i<j\leq n} w_{ij}\,(d_{ij}(X) - \delta_{ij})^2$$

where $w_{ij} \geq 0$ is a weight for the measurement between a pair of points $(i, j)$, $d_{ij}(X)$ is the Euclidean distance between $i$ and $j$, and $\delta_{ij}$ is the ideal distance between the points (their separation) in the $m$-dimensional data space. Note that $w_{ij}$ can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).

A configuration $X$ which minimizes $\sigma(X)$ gives a plot in which points that are close together correspond to points that are also close together in the original $m$-dimensional data space.
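As an illustration, the stress of a given configuration can be computed directly; the following is a minimal NumPy sketch (the names stress, X, delta and W are illustrative, not taken from any particular library):

```python
import numpy as np

def stress(X, delta, W):
    """Weighted stress sigma(X) of an (n x r) configuration X, given the
    (n x n) ideal-distance matrix delta and the (n x n) weight matrix W."""
    # Pairwise Euclidean distances d_ij(X) between the rows of X
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sum over the upper triangle only, i.e. pairs with i < j
    iu = np.triu_indices(len(X), k=1)
    return np.sum(W[iu] * (D[iu] - delta[iu]) ** 2)
```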

There are many ways that $\sigma(X)$ could be minimized. For example, Kruskal[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds $\sigma$ from above and touches the surface of $\sigma$ at a point $Z$, called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").

The SMACOF algorithm

The stress function $\sigma$ can be expanded as follows:

$$\sigma(X) = \sum_{i<j\leq n} w_{ij}\,(d_{ij}(X) - \delta_{ij})^2 = \sum_{i<j} w_{ij}\delta_{ij}^2 + \sum_{i<j} w_{ij}d_{ij}^2(X) - 2\sum_{i<j} w_{ij}\delta_{ij}d_{ij}(X)$$

Note that the first term is a constant $C$ and the second term is quadratic in $X$ (i.e. for the Hessian matrix $V$ the second term is equivalent to $\operatorname{tr}\,X'VX$) and therefore relatively easily solved. The third term is bounded by:

$$\sum_{i<j} w_{ij}\delta_{ij}d_{ij}(X) = \operatorname{tr}\,X'B(X)X \geq \operatorname{tr}\,X'B(Z)Z$$

where $B(Z)$ has elements:

$$b_{ij} = -\frac{w_{ij}\delta_{ij}}{d_{ij}(Z)} \quad \text{for } d_{ij}(Z) \neq 0,\ i \neq j$$

and $b_{ij} = 0$ for $d_{ij}(Z) = 0,\ i \neq j$,

and $b_{ii} = -\sum_{j=1, j\neq i}^{n} b_{ij}$.

The proof of this inequality follows from the Cauchy–Schwarz inequality; see Borg and Groenen[3] (pp. 152–153).
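In the same NumPy setting as above, the matrices appearing in these terms can be formed as follows. The construction of $V$ (with $v_{ij} = -w_{ij}$ for $i \neq j$ and $v_{ii} = \sum_{j \neq i} w_{ij}$) is not spelled out in the text above and is taken here from the standard SMACOF derivation, so it should be read as an assumption; the weight matrix is assumed symmetric with a zero diagonal:

```python
import numpy as np

def matrix_V(W):
    """Matrix V from the quadratic term tr X'VX (standard SMACOF form,
    assuming W is symmetric with a zero diagonal)."""
    V = -W.copy()
    np.fill_diagonal(V, W.sum(axis=1))     # v_ii = sum_{j != i} w_ij
    return V

def matrix_B(Z, delta, W):
    """Matrix B(Z) with the elements b_ij defined above."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        # b_ij = -w_ij * delta_ij / d_ij(Z) when d_ij(Z) != 0, else 0
        B = -np.where(D > 0, W * delta / D, 0.0)
    np.fill_diagonal(B, -B.sum(axis=1))    # b_ii = -sum_{j != i} b_ij
    return B
```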

Thus, we have a simple quadratic function $\tau(X,Z)$ that majorizes stress:

$$\sigma(X) = C + \operatorname{tr}\,X'VX - 2\operatorname{tr}\,X'B(X)X$$
$$\leq C + \operatorname{tr}\,X'VX - 2\operatorname{tr}\,X'B(Z)Z = \tau(X,Z)$$


The iterative minimization procedure is then:

  • at the $k$-th step we set $Z \leftarrow X^{k-1}$
  • $X^{k} \leftarrow \arg\min_{X}\,\tau(X,Z)$
  • stop if $\sigma(X^{k-1}) - \sigma(X^{k}) < \epsilon$, otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw[2]).
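The update $X^{k} \leftarrow \arg\min_{X}\,\tau(X,Z)$ has a closed form: setting the gradient of $\tau$ with respect to $X$ to zero gives $VX = B(Z)Z$, i.e. $X = V^{+}B(Z)Z$ (the Guttman transform), where $V^{+}$ is the Moore–Penrose pseudoinverse (needed because $V$ is singular). The loop below is a minimal sketch of the procedure, reusing the illustrative stress, matrix_V and matrix_B helpers from the earlier sketches:

```python
import numpy as np

def smacof(delta, W, r=2, eps=1e-6, max_iter=300, seed=0):
    """Iterative stress majorization (a sketch of the scheme above)."""
    n = len(delta)
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, r))          # random initial configuration
    V_pinv = np.linalg.pinv(matrix_V(W))     # V is singular, so use a pseudoinverse
    prev = stress(X, delta, W)
    for _ in range(max_iter):
        Z = X                                # supporting point Z <- X^(k-1)
        X = V_pinv @ matrix_B(Z, delta, W) @ Z   # X^k = V^+ B(Z) Z minimizes tau(X, Z)
        cur = stress(X, delta, W)
        if prev - cur < eps:                 # stop when the decrease in stress is small
            break
        prev = cur
    return X
```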

Use in graph drawing

Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.[4][5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the $\delta_{ij}$ are usually set to the graph-theoretic distances between nodes $i$ and $j$ and the weights $w_{ij}$ are taken to be $\delta_{ij}^{-\alpha}$. Here, $\alpha$ is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for $\alpha = 2$.[6]
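A minimal example of this setup, reusing the illustrative smacof helper sketched above (SciPy is used here only to compute all-pairs graph distances), might look as follows:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Adjacency matrix of a small undirected graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

delta = shortest_path(A, unweighted=True)        # graph-theoretic distances delta_ij
with np.errstate(divide="ignore"):
    W = np.where(delta > 0, delta ** -2.0, 0.0)  # weights w_ij = delta_ij^(-alpha), alpha = 2

layout = smacof(delta, W, r=2)                   # 2-D node positions for drawing
```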

References

  1. Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29 (1): 1–27, doi:10.1007/BF02289565.
  2. de Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romie, G.; et al. (eds.), Recent Developments in Statistics, pp. 133–145.
  3. Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: Theory and Applications, New York: Springer-Verlag.
  4. Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computational Statistics, 16 (3): 435–450, CiteSeerX 10.1.1.28.9372, doi:10.1007/s001800100077.
  5. Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of the 12th International Symposium on Graph Drawing (GD'04), Lecture Notes in Computer Science, vol. 3383, Springer-Verlag, pp. 239–250.
  6. Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction, 4 (3): 197–229, doi:10.1145/264645.264657.