## pdist

Pairwise distance between pairs of objects

`D = pdist(X)`

`D = pdist(X,distance)`

`D = pdist(X)` computes the
Euclidean distance between pairs of objects in *m*-by-*n* data
matrix `X`. Rows of `X` correspond
to observations, and columns correspond to variables. `D` is
a row vector of length *m*(*m*–1)/2,
corresponding to pairs of observations in `X`. The
distances are arranged in the order (2,1), (3,1), ..., (*m*,1),
(3,2), ..., (*m*,2), ..., (*m*,*m*–1). `D` is
commonly used as a dissimilarity matrix in clustering or multidimensional
scaling.

To save space and computation time, `D` is
formatted as a vector. However, you can convert this vector into a
square matrix using the `squareform` function
so that element *i*, *j* in the
matrix, where *i* < *j*, corresponds
to the distance between objects *i* and *j* in
the original data set.
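As a small illustration of the vector and matrix forms, the following sketch uses a hand-picked 4-by-2 data matrix (the values are an assumption chosen so that the distances are easy to check) and converts between the two representations with `squareform`:

```matlab
% Small data set: 4 observations (rows), 2 variables (columns).
X = [0 0; 3 4; 6 8; 0 1];

D = pdist(X);        % row vector with m*(m-1)/2 = 4*3/2 = 6 entries
Z = squareform(D);   % 4-by-4 symmetric matrix with zeros on the diagonal

% Z(i,j) is the distance between rows i and j of X.
d13 = Z(1,3);        % distance between observation 1 and observation 3

% squareform also converts the square matrix back to vector form.
D2 = squareform(Z);
```

The vector form stores each pair once, which is why it is roughly half the size of the square matrix.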

`D = pdist(X,distance)` computes
the distance between objects in the data matrix `X` using the method
specified by `distance`, which can be any of the following:

Metric | Description |
---|---|
`'euclidean'` | Euclidean distance (default). |
`'seuclidean'` | Standardized Euclidean distance. Each coordinate difference between rows in `X` is scaled by dividing by the corresponding element of the standard deviation `S`. To specify `S`, use `D = pdist(X,'seuclidean',S)`. |
`'cityblock'` | City block metric. |
`'minkowski'` | Minkowski distance. The default exponent is 2. To specify a different exponent, use … |
`'chebychev'` | Chebychev distance (maximum coordinate difference). |
`'mahalanobis'` | Mahalanobis distance, using the sample covariance of … |
`'cosine'` | One minus the cosine of the included angle between points (treated as vectors). |
`'correlation'` | One minus the sample correlation between points (treated as sequences of values). |
`'spearman'` | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
`'hamming'` | Hamming distance, which is the percentage of coordinates that differ. |
`'jaccard'` | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ. |
custom distance function | A distance function specified using `@`. A distance function must be of the form `d2 = distfun(XI,XJ)`, taking as arguments a 1-by-… |
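The following sketch shows how a few of the metric names above might be passed to `pdist`, together with a custom function handle. The data values and the handle name `mycity` are hypothetical, and the handle assumes that `XI` is a single row of `X` while `XJ` holds one or more rows of `X`, so that it returns one distance per row of `XJ`:

```matlab
X = [1 2 3; 4 6 8; 0 1 0; 2 2 2];

Deuc  = pdist(X);                 % default metric is 'euclidean'
Dcity = pdist(X,'cityblock');     % sum of absolute coordinate differences
Dcheb = pdist(X,'chebychev');     % maximum absolute coordinate difference
Dmink = pdist(X,'minkowski');     % default exponent is 2, so this matches Deuc

% Custom distance of the form d2 = distfun(XI,XJ).
% Assumption: XI is 1-by-n and XJ has one row per comparison point.
% This hypothetical handle reimplements the city block metric.
mycity  = @(XI,XJ) sum(abs(bsxfun(@minus,XJ,XI)),2);
Dcustom = pdist(X,mycity);
```

Comparing `Dcustom` with `Dcity` is a quick way to check that a custom handle has the expected shape and semantics before using it for a metric that has no built-in name.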

The output `D` is arranged in the order (2,1), (3,1), ...,
(*m*,1), (3,2), ..., (*m*,2), ..., (*m*,*m*–1),
that is, the lower-left triangle of the full *m*-by-*m* distance
matrix in column order. To get the distance between the *i*th
and *j*th observations (*i* < *j*),
either use the formula `D((i-1)*(m-i/2)+j-i)`,
or use the helper function `Z = squareform(D)`, which
returns an *m*-by-*m* symmetric
matrix whose (*i*,*j*) entry
is the distance between observation *i* and observation *j*.
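To make the index formula concrete, the sketch below checks it against `squareform` for every pair *i* < *j*. The helper name `idx` and the choice of a 6-by-3 random matrix are assumptions made for illustration:

```matlab
X = rand(6,3);              % m = 6 observations, 3 variables
m = size(X,1);
D = pdist(X);
Z = squareform(D);

% Position in D of the distance between observations i and j, with i < j.
idx = @(i,j) (i-1)*(m-i/2) + j - i;

% Compare the formula with the square-matrix entry for every pair.
maxErr = 0;
for i = 1:m-1
    for j = i+1:m
        maxErr = max(maxErr, abs(D(idx(i,j)) - Z(i,j)));
    end
end
```

After the loop, `maxErr` should be zero, since both lookups read the same stored value. The formula avoids building the full matrix when only a few distances are needed.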

Given an *m*-by-*n* data matrix `X`,
which is treated as *m* (1-by-*n*)
row vectors $x_1, x_2, \ldots, x_m$,
the various distances between the vectors $x_s$ and $x_t$ are
defined as follows:

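Euclidean distance

$$d_{st}^2 = (x_s - x_t)(x_s - x_t)'$$

Notice that the Euclidean distance is a special case of the Minkowski metric, where `p` = 2.

Standardized Euclidean distance

$$d_{st}^2 = (x_s - x_t)\,V^{-1}\,(x_s - x_t)'$$

where `V` is the *n*-by-*n* diagonal matrix whose *j*th diagonal element is `S`(*j*)$^2$, where `S` is the vector of standard deviations.

Mahalanobis distance

$$d_{st}^2 = (x_s - x_t)\,C^{-1}\,(x_s - x_t)'$$

where `C` is the covariance matrix.

City block metric

$$d_{st} = \sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right|$$

Notice that the city block distance is a special case of the Minkowski metric, where `p` = 1.

Minkowski metric

$$d_{st} = \left( \sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right|^{p} \right)^{1/p}$$

Notice that for the special case of `p` = 1, the Minkowski metric gives the city block metric, for the special case of `p` = 2, the Minkowski metric gives the Euclidean distance, and for the special case of `p` = ∞, the Minkowski metric gives the Chebychev distance.

Chebychev distance

$$d_{st} = \max_{j} \left| x_{sj} - x_{tj} \right|$$

Notice that the Chebychev distance is a special case of the Minkowski metric, where `p` = ∞.

Cosine distance

$$d_{st} = 1 - \frac{x_s x_t'}{\sqrt{(x_s x_s')(x_t x_t')}}$$

Correlation distance

$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(x_t - \bar{x}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(x_t - \bar{x}_t)(x_t - \bar{x}_t)'}}$$

where

$$\bar{x}_s = \frac{1}{n}\sum_{j=1}^{n} x_{sj}$$

and

$$\bar{x}_t = \frac{1}{n}\sum_{j=1}^{n} x_{tj}$$

Hamming distance

$$d_{st} = \frac{\#\left(x_{sj} \neq x_{tj}\right)}{n}$$

Jaccard distance

$$d_{st} = \frac{\#\left[\left(x_{sj} \neq x_{tj}\right) \cap \left(\left(x_{sj} \neq 0\right) \cup \left(x_{tj} \neq 0\right)\right)\right]}{\#\left[\left(x_{sj} \neq 0\right) \cup \left(x_{tj} \neq 0\right)\right]}$$

Spearman distance

$$d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'}\,\sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}}$$

where

$r_{sj}$ is the rank of $x_{sj}$ taken over $x_{1j}, x_{2j}, \ldots, x_{mj}$, as computed by `tiedrank`.

$r_s$ and $r_t$ are the coordinate-wise rank vectors of $x_s$ and $x_t$, i.e., $r_s = (r_{s1}, r_{s2}, \ldots, r_{sn})$.

$\bar{r}_s = \frac{1}{n}\sum_{j=1}^{n} r_{sj}$ and $\bar{r}_t = \frac{1}{n}\sum_{j=1}^{n} r_{tj}$.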
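As a rough check on some of these definitions, the sketch below evaluates the Euclidean, city block, Chebychev, cosine, Hamming, and Jaccard distances by hand for two row vectors and compares them with the values returned by `pdist`. The vectors `xs` and `xt` are arbitrary values chosen for illustration:

```matlab
xs = [1 0 2 0 5];
xt = [3 0 0 1 5];
X  = [xs; xt];                 % two observations, so pdist returns one value
n  = numel(xs);

dEuc  = sqrt((xs - xt)*(xs - xt)');              % Euclidean
dCity = sum(abs(xs - xt));                        % city block
dCheb = max(abs(xs - xt));                        % Chebychev
dCos  = 1 - (xs*xt')/sqrt((xs*xs')*(xt*xt'));     % cosine
dHam  = sum(xs ~= xt)/n;                          % Hamming
nz    = (xs ~= 0) | (xt ~= 0);                    % either coordinate nonzero
dJac  = sum((xs ~= xt) & nz)/sum(nz);             % Jaccard

byHand  = [dEuc dCity dCheb dCos dHam dJac];
byPdist = [pdist(X,'euclidean') pdist(X,'cityblock') pdist(X,'chebychev') ...
           pdist(X,'cosine')    pdist(X,'hamming')   pdist(X,'jaccard')];
```

For these vectors the hand-computed and built-in values should agree to within floating-point rounding.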

Generate random data and find the unweighted Euclidean distance, and then find the weighted distance using two different methods:

    % Compute the ordinary Euclidean distance.
    X = randn(100, 5);
    D = pdist(X,'euclidean');  % euclidean distance

    % Compute the Euclidean distance with each coordinate
    % difference scaled by the standard deviation.
    Dstd = pdist(X,'seuclidean');

    % Use a function handle to compute a distance that weights
    % each coordinate contribution differently.
    Wgts = [.1 .3 .3 .2 .1];  % coordinate weights
    weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
    Dwgt = pdist(X, @(Xi,Xj) weuc(Xi,Xj,Wgts));
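One way to sanity-check a weighted distance like this is to note that weighting the squared coordinate differences by `Wgts` is equivalent to scaling each column of `X` by the square root of its weight and then taking the ordinary Euclidean distance. The sketch below, which uses a smaller random matrix and the hypothetical names `Xscaled`, `Dcheck`, and `maxDiff`, compares the two approaches:

```matlab
X    = randn(10, 5);
Wgts = [.1 .3 .3 .2 .1];                          % coordinate weights
weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
Dwgt = pdist(X, @(Xi,Xj) weuc(Xi,Xj,Wgts));

% Scale column j by sqrt(Wgts(j)); the ordinary Euclidean distance on the
% scaled data then equals the weighted distance on the original data.
Xscaled = bsxfun(@times, X, sqrt(Wgts));
Dcheck  = pdist(Xscaled,'euclidean');

maxDiff = max(abs(Dwgt - Dcheck));                % expected to be near zero
```

Scaling the data first lets the built-in metric do the work and avoids a custom function handle, while the handle form is more flexible for distances that cannot be expressed as a rescaling.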

`cluster` | `clusterdata` | `cmdscale` | `cophenet` | `dendrogram` | `inconsistent` | `linkage` | `pdist2` | `silhouette` | `squareform`
