# Python – Calculate weighted pairwise distance matrix in Python

matrix, numpy, python, scikit-learn, scipy

I am trying to find the fastest way to perform the following pairwise distance calculation in Python. I want to use the distances to rank a `list_of_objects` by their similarity.

Each item in the `list_of_objects` is characterised by four measurements a, b, c, d, which are made on very different scales e.g.:

``````object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]
``````

The aim is to get a pairwise distance matrix of the objects in `list_of_objects`. However, I want to be able to specify the 'relative importance' of each measurement in my distance calculation via a weights vector with one weight per measurement, e.g.:

``````weights = [1, 1, 1, 1]
``````

would indicate that all measurements are equally weighted. In this case I want each measurement to contribute equally to the distance between objects, regardless of the measurement scale. Alternatively:

``````weights = [1, 1, 1, 10]
``````

would indicate that I want measurement d to contribute 10x more than the other measurements to the distance between objects.

My current algorithm looks like this:

1. Calculate a pairwise distance matrix for each measurement
2. Normalise each distance matrix so that the maximum is 1
3. Multiply each distance matrix by the appropriate weight from `weights`
4. Sum the distance matrices to generate a single pairwise matrix
5. Use the matrix from 4 to provide a ranked list of pairs of objects from `list_of_objects`

This works fine, and gives me a weighted version of the city-block distance between objects.
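
A minimal sketch of steps 1 to 4, assuming non-negative weights and at least two objects. The function names `weighted_distance_stepwise` and `weighted_distance_fast` are made up for illustration. The first follows the steps literally; the second relies on the observation that the largest pairwise difference within one measurement is that measurement's range, so rescaling the columns up front should give the same matrix from a single `pdist` call.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def weighted_distance_stepwise(objects, weights):
    """Steps 1-4 taken literally: one set of pairwise distances per measurement."""
    X = np.asarray(objects, dtype=float)
    n = X.shape[0]
    total = np.zeros(n * (n - 1) // 2)  # condensed (upper-triangle) form
    for k, weight in enumerate(weights):
        d = pdist(X[:, [k]], metric="cityblock")  # step 1: |x_ik - x_jk| per pair
        d_max = d.max()
        if d_max > 0:  # step 2: scale so the maximum is 1
            d = d / d_max
        total += weight * d  # steps 3 and 4: weight, then sum
    return squareform(total)


def weighted_distance_fast(objects, weights):
    """Same matrix from one pdist call, by rescaling the columns first."""
    X = np.asarray(objects, dtype=float)
    w = np.asarray(weights, dtype=float)  # assumed non-negative
    # The largest pairwise difference in a column equals its range (max - min).
    col_range = X.max(axis=0) - X.min(axis=0)
    col_range[col_range == 0] = 1.0  # a constant column contributes zero either way
    return squareform(pdist(X * (w / col_range), metric="cityblock"))


list_of_objects = [
    [0.2, 4.5, 198, 0.003],
    [0.3, 2.0, 999, 0.001],
    [0.1, 9.2, 321, 0.023],
]
D_stepwise = weighted_distance_stepwise(list_of_objects, [1, 1, 1, 10])
D_fast = weighted_distance_fast(list_of_objects, [1, 1, 1, 10])
```

The single-call version avoids building one distance matrix per measurement, which is where most of the time goes when there are many objects.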

I have two questions:

1. Without changing the algorithm, what is the fastest implementation in SciPy, NumPy or scikit-learn for the initial distance matrix calculations?

2. Is there an existing multi-dimensional distance approach that does all of this for me?

For question 2, I have looked but couldn't find anything with a built-in step that handles 'relative importance' in the way that I want.

Other suggestions welcome. Happy to clarify if I've missed details.

#### Best Solution

`scipy.spatial.distance` is the module you'll want to have a look at. It has a lot of different norms that can be easily applied.

I'd recommend using the weighted Minkowski metric.
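
For reference, here is the definition written out in my own notation (the symbols are assumptions for illustration: $u$ and $v$ are two objects, $w$ is the weights vector, $p$ is the order of the norm). As far as I can tell, this is what SciPy's legacy `wminkowski` function computes:

$$
d_w(u, v) = \left( \sum_i \left| w_i \, (u_i - v_i) \right|^p \right)^{1/p}
$$

With $p = 1$ this is a weighted city-block distance, and with $p = 2$ a weighted Euclidean distance. The `w` argument of SciPy's `minkowski` metric is defined slightly differently, as $\left( \sum_i w_i \, |u_i - v_i|^p \right)^{1/p}$, so weights $w$ in the formula above correspond to $w^p$ there.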

You can calculate the pairwise distances with the `pdist` function from this module.

For example:

``````import numpy as np
from scipy.spatial.distance import pdist, squareform

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]

# make a 3x4 array from the list of objects
X = np.array(list_of_objects)

# wminkowski is no longer available in recent SciPy releases;
# 'minkowski' with w=weights**p computes the same distance
weights = np.array([1.0, 1.0, 1.0, 10.0])
p = 2

# calculate pairwise distances, using the weighted Minkowski norm
distances = pdist(X, 'minkowski', p=p, w=weights**p)

# make a square matrix from the result
distances_as_2d_matrix = squareform(distances)

print(distances)
print(distances_as_2d_matrix)
``````

This will print

``````[ 801.00390786  123.0899671   678.0382942 ]
[[   0.          801.00390786  123.0899671 ]
 [ 801.00390786    0.          678.0382942 ]
 [ 123.0899671   678.0382942     0.        ]]
``````
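
To get from these distances to the ranked list of pairs the question asks for, something along these lines should work. It is only a sketch: `rank_pairs` is a hypothetical helper, not part of SciPy. It relies on `pdist` returning the condensed distances in the same order as `itertools.combinations(range(n), 2)`, and it passes the weights as `w**p` to the `minkowski` metric, which reproduces the weighted distances printed above.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist


def rank_pairs(objects, **pdist_kwargs):
    """Return (i, j, distance) tuples, most similar pair first."""
    X = np.asarray(objects, dtype=float)
    condensed = pdist(X, **pdist_kwargs)
    # pdist orders pairs as (0, 1), (0, 2), ..., (n-2, n-1)
    pairs = list(combinations(range(X.shape[0]), 2))
    order = np.argsort(condensed, kind="stable")
    return [(pairs[k][0], pairs[k][1], float(condensed[k])) for k in order]


list_of_objects = [
    [0.2, 4.5, 198, 0.003],
    [0.3, 2.0, 999, 0.001],
    [0.1, 9.2, 321, 0.023],
]
weights = np.array([1.0, 1.0, 1.0, 10.0])
p = 2
ranked = rank_pairs(list_of_objects, metric="minkowski", p=p, w=weights**p)

for i, j, dist in ranked:
    print(f"object_{i + 1} vs object_{j + 1}: {dist:.4f}")
```

Note that without any rescaling the ranking here is dominated by the third measurement, because its values are orders of magnitude larger than the others.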