Classical multidimensional scaling of a matrix.
Takes an n by n distance (or difference, similarity, or
dissimilarity) matrix D. Returns Y, a matrix of n points
with coordinates in p dimensional space which approximate those
distances (or differences, similarities, or dissimilarities). Also returns
the eigenvalues e of B = -1/2 * J * (D.^2) * J, where
J = eye(n) - ones(n,n)/n. p, the number of columns of Y, is equal to the
number of positive real eigenvalues of B.
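The double-centering step described above can be sketched in core Octave as follows. This is a minimal illustration, not the code of cmdscale itself: the sample points X, the tolerance tol, and the symmetrization before eig are assumptions made for the example.

```octave
% Four corners of a unit square; their pairwise distances are Euclidean.
X = [0 0; 1 0; 1 1; 0 1];
n = rows (X);

% Pairwise Euclidean distance matrix, built with core functions only.
D = zeros (n);
for i = 1:n
  for j = 1:n
    D(i,j) = norm (X(i,:) - X(j,:));
  end
end

% Double centering, using the formulas given in the text.
J = eye (n) - ones (n, n) / n;
B = -1/2 * J * (D.^2) * J;

% B is symmetric in exact arithmetic; symmetrize to suppress rounding noise.
[V, L] = eig ((B + B') / 2);
[e, idx] = sort (diag (L), "descend");
V = V(:, idx);

% Keep the positive eigenvalues (the tolerance is an illustrative choice).
tol = 1e-10 * max (abs (e));
pos = e > tol;
p = sum (pos);

% Coordinates: eigenvectors scaled by the square roots of their eigenvalues.
Y = V(:, pos) * diag (sqrt (e(pos)));
```

For this input, B has two positive eigenvalues, so Y has two columns and the distances between its rows reproduce D up to rounding.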
D can be a full or sparse matrix or a vector of length n*(n-1)/2
containing the upper triangular elements (like the output of the pdist
function). It must be symmetric with non-negative entries whose values are
further restricted by the type of matrix being represented:
* If D is either a distance, dissimilarity, or difference matrix, then it must have zero entries along the main diagonal. In this case the distances between the points Y equal or approximate the distances given by D.
* If D is a similarity matrix, the elements must all be less than or equal to one, with ones along the main diagonal. In this case the distances between the points Y equal or approximate the distances given by D = sqrt(ones(n,n)-D).
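As a small illustration of the input forms above, the following sketch converts a similarity matrix to the distances it represents and then builds the equivalent vector form. The matrix S is made up for the example, and the element order of the vector is assumed to match that of pdist, i.e. (1,2), (1,3), ..., (2,3), ...

```octave
% Hypothetical similarity matrix: symmetric, entries <= 1, ones on the diagonal.
S = [1.00 0.75 0.00;
     0.75 1.00 0.75;
     0.00 0.75 1.00];
n = rows (S);

% Distances represented by the similarities, as given in the text.
D = sqrt (ones (n, n) - S);

% Vector form of length n*(n-1)/2. For a symmetric matrix, reading the
% strict lower triangle column by column visits the same values as reading
% the strict upper triangle row by row.
v = D(tril (true (n), -1))';
```

Here D has zeros on its diagonal and describes three collinear points, with the middle point 0.5 away from each end.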
D is a Euclidean matrix if and only if B is positive
semi-definite. When this is the case, Y is an exact representation
of the distances given in D. If D is non-Euclidean, Y only
approximates the distances given in D. The approximation used by
cmdscale minimizes the statistical loss function known as strain.
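The Euclidean condition can be checked numerically by looking at the smallest eigenvalue of B. The sketch below uses two made-up 3 by 3 matrices and an illustrative tolerance; it relies on core Octave only.

```octave
% A 3-4-5 right triangle: realizable in the plane, hence Euclidean.
D_euc = [0 3 4; 3 0 5; 4 5 0];
% Violates the triangle inequality (3 > 1 + 1), hence non-Euclidean.
D_non = [0 1 3; 1 0 1; 3 1 0];

n = 3;
J = eye (n) - ones (n, n) / n;

B_euc = -1/2 * J * (D_euc.^2) * J;
B_non = -1/2 * J * (D_non.^2) * J;

% Symmetrize before eig so that rounding cannot produce complex eigenvalues.
e_euc = eig ((B_euc + B_euc') / 2);
e_non = eig ((B_non + B_non') / 2);

% Positive semi-definite up to a relative tolerance (an assumption here).
tol = 1e-10;
is_euclidean = [min(e_euc) >= -tol * max(abs (e_euc)), ...
                min(e_non) >= -tol * max(abs (e_non))];
```

The first matrix passes the test and the second fails it, because its B has a clearly negative eigenvalue.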
The returned Y is an n by p matrix showing possible
coordinates of the points in p dimensional space (p < n). The columns
correspond to the positive eigenvalues of B in descending order. A
translation, rotation, or reflection of the coordinates given by Y will
satisfy the same distance matrix up to the limits of machine precision.
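The invariance noted above can be verified directly. In this sketch the coordinates Y, the rotation angle, the reflection, and the shift are arbitrary choices for illustration, and dist is a hypothetical helper defined here, not a library function.

```octave
% Example coordinates, one point per row.
Y = [0 0; 3 0; 0 4];

theta = pi / 6;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % rotation
F = [1 0; 0 -1];                                      % reflection
t = [10 -2];                                          % translation

% Rotate, reflect, then shift every row by t (broadcasting).
Y2 = Y * R * F + t;

% Hypothetical helper: all pairwise Euclidean distances between rows of P.
dist = @(P) sqrt (sum ((permute (P, [1 3 2]) - permute (P, [3 1 2])).^2, 3));

% Largest difference between the two distance matrices.
err = max (max (abs (dist (Y) - dist (Y2))));
```

The value of err is on the order of machine precision, so Y and Y2 are equally valid outputs for the same distance matrix.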
For any k <= p, if the largest k positive eigenvalues of B are
significantly greater in absolute magnitude than its other eigenvalues,
the first k columns of Y provide a k-dimensional reduction of Y which
approximates the distances given by D. The optional return e can be used
to consider various values of k, or to evaluate the accuracy of specific
dimension reductions (e.g., k = 2).
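One way to use e when choosing k is sketched below. It assumes the statistics package is installed, that the function is called as [Y, e] = cmdscale (D), and that e may come back in any order (hence the explicit sort). The sample points X and the "explained" ratio are illustrative choices, not part of the function.

```octave
pkg load statistics

% Five points that lie almost in a plane (the third coordinate is small).
X = [0 0 0; 4 0 0.1; 4 3 0; 0 3 0.1; 2 1.5 -0.1];

% Vector-form distances, as accepted by cmdscale.
D = pdist (X);
[Y, e] = cmdscale (D);

% Share of the positive eigenvalue mass captured by the first k of them.
e = sort (e, "descend");
pos = e(e > 0);
explained = cumsum (pos) / sum (pos);

% Accuracy of the k = 2 reduction: worst-case error in any pairwise distance.
k = 2;
err = max (abs (pdist (Y(:, 1:k)) - D));
```

If explained(k) is close to one and err is small relative to the entries of D, the first k columns of Y are a reasonable low-dimensional summary.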
Reference: Ingwer Borg and Patrick J.F. Groenen (2005), Modern Multidimensional Scaling, Second Edition, Springer, ISBN: 978-0-387-25150-9 (Print) 978-0-387-28981-6 (Online)
See also: pdist.
Package: statistics