Function File: Y = cmdscale (D)
Function File: [Y, e] = cmdscale (D)

Classical multidimensional scaling of a matrix.

Takes an n by n distance (or difference, similarity, or dissimilarity) matrix D. Returns Y, a matrix of n points with coordinates in p-dimensional space whose pairwise distances approximate those distances (or differences, similarities, or dissimilarities). Also returns the eigenvalues e of B = -1/2 * J * (D.^2) * J, where J = eye(n) - ones(n,n)/n. p, the number of columns of Y, is equal to the number of positive real eigenvalues of B.
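The double-centering formula above can be tried out directly in base Octave. The following is a minimal sketch, not the package's implementation: the sample points X, the symmetrization step, and the tolerance tol used to decide which eigenvalues count as positive are all assumptions made for illustration.

```octave
% Four corners of a unit square, used to build a Euclidean distance matrix.
X = [0 0; 1 0; 1 1; 0 1];
n = rows (X);

% Pairwise Euclidean distances, computed without any package functions.
D = zeros (n);
for i = 1:n
  for j = 1:n
    D(i,j) = norm (X(i,:) - X(j,:));
  endfor
endfor

% Double centering as in the formula above.
J = eye (n) - ones (n, n) / n;
B = -1/2 * J * (D.^2) * J;

% B is symmetric in exact arithmetic; symmetrize to suppress rounding noise.
B = (B + B') / 2;
e = sort (eig (B), "descend");

% p is the number of eigenvalues that are positive beyond a tolerance
% (the tolerance is an assumption of this sketch).
tol = 1e-10 * max (abs (e));
p = sum (e > tol);
```

For this square, B has two eigenvalues equal to 1 and two equal to 0 (up to rounding), so p is 2, matching the dimension of the plane the points lie in.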

D can be a full or sparse matrix or a vector of length n*(n-1)/2 containing the upper triangular elements (like the output of the pdist function). It must be symmetric with non-negative entries whose values are further restricted by the type of matrix being represented:

* If D is a distance, dissimilarity, or difference matrix, then it must have zero entries along the main diagonal. In this case the distances between the points in Y equal or approximate the distances given by D.

* If D is a similarity matrix, the elements must all be less than or equal to one, with ones along the main diagonal. In this case the distances between the points in Y equal or approximate the distances given by D = sqrt(ones(n,n)-D).
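The input forms described above can be illustrated with a short sketch in base Octave. It assumes the condensed vector lists pairs in the order (1,2), (1,3), ..., (1,n), (2,3), ..., which is the ordering pdist is understood to use; the variable names and sample values are illustrative only.

```octave
% Condensed vector for n = 4 points, ordered (1,2),(1,3),(1,4),(2,3),(2,4),(3,4).
v = [1 2 3 1 2 1];
n = 4;

% Fill the strict lower triangle column by column, which visits the pairs
% in the same order, then mirror to obtain a symmetric matrix.
D = zeros (n);
D(tril (true (n), -1)) = v;
D = D + D';

% A similarity matrix: symmetric, entries at most 1, ones on the diagonal.
S = [1.0 0.8 0.5;
     0.8 1.0 0.6;
     0.5 0.6 1.0];

% The distances that Y is meant to reproduce for a similarity input.
Ds = sqrt (ones (3, 3) - S);
```

Here D is the distance matrix of four points spaced one unit apart on a line, and Ds has zeros on its diagonal, as required of a distance matrix.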

D is a Euclidean matrix if and only if B is positive semi-definite. In that case, Y is an exact representation of the distances given in D. If D is non-Euclidean, Y only approximates the distances given in D. The approximation used by cmdscale minimizes the statistical loss function known as strain.
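Whether a given D is Euclidean can be checked by looking at the smallest eigenvalue of B. The sketch below uses anonymous helper functions whose names (J_of, B_of, min_e) are hypothetical, and a tolerance tol chosen for illustration to absorb rounding error.

```octave
% Illustrative helpers: centering matrix, B, and the smallest eigenvalue of B.
J_of  = @(n) eye (n) - ones (n, n) / n;
B_of  = @(D) -1/2 * J_of (rows (D)) * (D.^2) * J_of (rows (D));
min_e = @(D) min (eig ((B_of (D) + B_of (D)') / 2));

% Euclidean: three points on a line at positions 0, 1, 2.
D1 = [0 1 2; 1 0 1; 2 1 0];

% Non-Euclidean: violates the triangle inequality (1 + 1 < 3).
D2 = [0 1 3; 1 0 1; 3 1 0];

% B is treated as positive semi-definite if no eigenvalue is below -tol.
tol = 1e-10;
is_euclidean_1 = min_e (D1) >= -tol;   % true
is_euclidean_2 = min_e (D2) >= -tol;   % false, smallest eigenvalue is -5/6
```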

The returned Y is an n by p matrix giving possible coordinates of the points in p-dimensional space (p < n). The columns correspond to the positive eigenvalues of B in descending order. A translation, rotation, or reflection of the coordinates given by Y will satisfy the same distance matrix up to the limits of machine precision.
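One standard way to obtain such coordinates is to scale the eigenvectors of B by the square roots of the positive eigenvalues. The sketch below follows that construction in base Octave; it is an illustration under stated assumptions (the sample matrix D, the positivity tolerance, and the 30 degree rotation), not necessarily how cmdscale is implemented internally.

```octave
% 3-4-5 right triangle: a Euclidean distance matrix for three points.
D = [0 3 4; 3 0 5; 4 5 0];
n = rows (D);

J = eye (n) - ones (n, n) / n;
B = -1/2 * J * (D.^2) * J;
[V, L] = eig ((B + B') / 2);

% Sort eigenpairs in descending order and keep the positive ones.
[e, idx] = sort (diag (L), "descend");
V = V(:, idx);
p = sum (e > 1e-10 * max (abs (e)));
Y = V(:, 1:p) * diag (sqrt (e(1:p)));

% Recompute pairwise distances from Y; they should match D.
G = Y * Y';
g = diag (G);
Dy = sqrt (max (g + g' - 2 * G, 0));

% Rotating Y (here by 30 degrees) leaves the pairwise distances unchanged.
t = pi / 6;
R = [cos(t) -sin(t); sin(t) cos(t)];
Yr = Y * R;
```

Since the three points are not collinear, p is 2 and Y is a 3 by 2 matrix. Because eigenvectors are only defined up to sign, the coordinates obtained this way may differ from another run or another implementation by a reflection.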

For any k <= p, if the largest k positive eigenvalues of B are significantly greater in absolute magnitude than its other eigenvalues, the first k columns of Y provide a k-dimensional reduction of Y which approximates the distances given by D. The optional return e can be used to consider various values of k, or to evaluate the accuracy of specific dimension reductions (e.g., k = 2).
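To judge a candidate k, one common goodness-of-fit ratio divides the sum of the k largest eigenvalues by the sum of the absolute values of all eigenvalues. The sketch below computes the eigenvalues directly in base Octave as a stand-in for the optional output e; the sample data X and the choice of ratio are assumptions for illustration.

```octave
% Five points in 3-D that lie almost in the x-y plane (illustrative data).
X = [0 0 0; 4 0 0.1; 4 3 0; 0 3 0.1; 2 1.5 -0.1];
n = rows (X);

% Euclidean distance matrix from the Gram matrix of X.
G = X * X';
g = diag (G);
D = sqrt (max (g + g' - 2 * G, 0));

% Eigenvalues of B, standing in for the optional output e.
J = eye (n) - ones (n, n) / n;
B = -1/2 * J * (D.^2) * J;
e = sort (eig ((B + B') / 2), "descend");

% Share of the total absolute eigenvalue mass captured by the first k.
fit = cumsum (e) / sum (abs (e));
k = 2;
printf ("fit for k = %d: %.4f\n", k, fit(k));
```

For this data the two largest eigenvalues are 16 and 9 while the third is 0.028, so the ratio for k = 2 is roughly 0.999, suggesting that the first two columns of Y would capture nearly all of the structure in D.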

Reference: Ingwer Borg and Patrick J.F. Groenen (2005), Modern Multidimensional Scaling, Second Edition, Springer, ISBN: 978-0-387-25150-9 (Print) 978-0-387-28981-6 (Online)

See also: pdist.

Package: statistics