Function File: y = pdist (x)
Function File: y = pdist (x, metric)
Function File: y = pdist (x, metric, metricarg, …)

Return the pairwise distances between the rows of x.

x is the nxd matrix representing n row vectors of size d.

The output is a dissimilarity matrix formatted as a row vector y, (n-1)*n/2 long, where the distances are in the order [(1, 2) (1, 3) … (2, 3) … (n-1, n)]. You can use the squareform function to display the distances between the vectors arranged into an nxn matrix.
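As a minimal sketch of the output layout described above, the following computes the default Euclidean distances for four sample points and expands them with squareform. The sample matrix and variable names are illustrative assumptions, not part of the package.

```octave
pkg load statistics   % pdist and squareform are provided by the statistics package

% Four sample points in the plane, one per row (n = 4, d = 2).
x = [0 0; 3 4; 6 8; 0 5];

% Euclidean distances by default. y is a row vector with n*(n-1)/2 = 6
% entries, ordered (1,2) (1,3) (1,4) (2,3) (2,4) (3,4).
y = pdist (x);

% Expand into a 4x4 symmetric matrix with zeros on the diagonal;
% D(i,j) is the distance between rows i and j of x.
D = squareform (y);
```

Here y(1) is the distance between rows 1 and 2, and the same value appears at D(1,2) and D(2,1).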

metric is an optional argument specifying how the distance is computed. It defaults to "euclidean" and can be any of the predefined names listed below, or a user-defined function. A user-defined function takes two arguments x and y plus any number of optional arguments, where x is a row vector and y is a matrix having the same number of columns as x. It must return a column vector in which row i is the distance between x and row i of y. Any additional arguments after metric are passed on to it as metric (x, y, metricarg1, metricarg2 …).
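A user-defined metric following this calling convention might look like the sketch below. The weighted city block function and the weight vector w are hypothetical, chosen only to show how an extra argument is forwarded; the sketch assumes the installed version of the package accepts function handles as described here.

```octave
pkg load statistics

% Hypothetical user-defined metric: weighted city block distance.
% xi is a 1xd row vector, yj is an mxd matrix, and w is a 1xd weight
% vector forwarded by pdist as an extra argument. The result is an
% mx1 column vector, one distance per row of yj.
weighted_cityblock = @(xi, yj, w) sum (abs (yj - xi) .* w, 2);

x = [1 2; 4 6; 7 8];
w = [1 0.5];

% w is passed through as metricarg1.
y = pdist (x, weighted_cityblock, w);
```

Called directly, weighted_cityblock (x(1,:), x, w) returns the distances from the first row to every row of x, including a zero for the row itself.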

Predefined distance functions are:

"euclidean"

Euclidean distance (default).

"squaredeuclidean"

Squared Euclidean distance. It omits the square root from the calculation of the Euclidean distance. It does not satisfy the triangle inequality.

"seuclidean"

Standardized Euclidean distance. Each coordinate in the sum of squares is inversely weighted by the sample variance of that coordinate.

"mahalanobis"

Mahalanobis distance: see the function mahalanobis.

"cityblock"

City Block metric, aka Manhattan distance.

"minkowski"

Minkowski metric. Accepts a numeric parameter p: with p=1 it is the same as the cityblock metric, and with p=2 (default) it is the same as the euclidean metric.

"cosine"

One minus the cosine of the included angle between rows, seen as vectors.

"correlation"

One minus the sample correlation between points (treated as sequences of values).

"spearman"

One minus the sample Spearman’s rank correlation between observations, treated as sequences of values.

"hamming"

Hamming distance: the fraction of coordinates that differ.

"jaccard"

One minus the Jaccard coefficient: the fraction of nonzero coordinates that differ.

"chebychev"

Chebychev distance: the maximum absolute coordinate difference.
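Some of the relationships among the predefined metrics can be checked with a short sketch such as the following. The sample matrices are illustrative assumptions; the metric names and the Minkowski parameter are the ones listed above.

```octave
pkg load statistics

x = [1 0 2; 1 3 0; 0 3 2];

d_city  = pdist (x, "cityblock");
d_mink1 = pdist (x, "minkowski", 1);    % p = 1: expected to agree with d_city

d_eucl  = pdist (x, "euclidean");
d_mink2 = pdist (x, "minkowski");       % p defaults to 2: expected to agree with d_eucl
d_sq    = pdist (x, "squaredeuclidean"); % expected to agree with d_eucl .^ 2

d_cheb  = pdist (x, "chebychev");       % largest absolute coordinate difference per pair
d_ham   = pdist (x, "hamming");         % fraction of differing coordinates per pair

% Three collinear points: the squared Euclidean distance between the
% end points exceeds the sum of the two shorter squared distances,
% so the triangle inequality does not hold.
c = [0 0; 1 0; 2 0];
s = pdist (c, "squaredeuclidean");      % pairs ordered (1,2) (1,3) (2,3)
violates = s(2) > s(1) + s(3);
```

Every pair of rows in x differs in exactly two of three coordinates, so each entry of d_ham should be 2/3.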

See also: linkage, mahalanobis, squareform, pdist2.

Package: statistics