Clustering in Python (SciPy) with space and time variables

My dataset has the format [x-coordinate, y-coordinate, hour], where hour is an integer from 0 to 23.

How can I cluster this data when I need a Euclidean distance metric for the coordinates but a different one for the hours (d(23, 0) is 23 under the Euclidean metric, even though the two hours are only one hour apart)? Is it possible to cluster data with a different distance metric for each feature in SciPy? How?

Thank you


You'll need to define your own metric that handles "time" in an appropriate way, i.e. one that treats the hour as circular so that 23 and 0 are one hour apart. According to the docs for scipy.spatial.distance.pdist, you can pass your own function:

Y = pdist(X, f)

Computes the distance between all pairs of vectors in X using the user supplied 2-arity function f. [...] For example, Euclidean distance between the vectors could be computed as follows:

dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))
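Applied to the [x, y, hour] layout from the question, a custom metric could look like the sketch below. This is only one possible choice, not the one "right" metric: the function name space_time_distance, the time_weight parameter, and the way the two parts are combined (square root of the sum of squares) are all assumptions made for illustration. The hour difference is wrapped around a 24-hour clock so that d(23, 0) comes out as 1.

```python
import numpy as np
from scipy.spatial.distance import pdist

def space_time_distance(u, v, time_weight=1.0):
    """Hypothetical metric: Euclidean on (x, y), circular on hour.

    time_weight is an assumed scaling factor that says how many
    coordinate units one hour is worth; tune it for your data.
    """
    # Plain Euclidean distance on the two spatial coordinates.
    spatial = np.hypot(u[0] - v[0], u[1] - v[1])
    # Circular distance on a 24-hour clock: 23 -> 0 is 1, not 23.
    dh = abs(u[2] - v[2]) % 24
    circular = min(dh, 24 - dh)
    # Assumed combination rule: treat the two parts as orthogonal axes.
    return np.sqrt(spatial ** 2 + (time_weight * circular) ** 2)

# Three sample rows of [x, y, hour].
X = np.array([
    [0.0, 0.0, 23],
    [0.0, 0.0, 0],
    [3.0, 4.0, 0],
])

# Condensed distance matrix, ordered as d(0,1), d(0,2), d(1,2).
dm = pdist(X, space_time_distance)
print(dm)
```

Because x, y and hour are in different units, the relative weight of time against space strongly affects the resulting clusters, so time_weight (or some equivalent rescaling) is worth choosing deliberately.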

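The metric can be passed to the SciPy clustering functions that accept a metric keyword. For example, using linkage (the metric is only applied when the first argument is a matrix of raw observations; if you pass a condensed distance matrix from pdist, it is used as-is and metric is ignored):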

scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean')
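Putting the pieces together, here is a minimal end-to-end sketch, assuming the data is an array of [x, y, hour] rows. The metric function, the sample points, the choice of method='average' and the cut into two clusters are all illustrative assumptions. Note that the SciPy docs state that the 'centroid', 'median' and 'ward' methods are only correctly defined for Euclidean distances, so a custom metric is better paired with 'single', 'complete', 'average' or 'weighted'.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def space_time_distance(u, v):
    """Hypothetical metric: Euclidean on (x, y), circular on hour."""
    spatial = np.hypot(u[0] - v[0], u[1] - v[1])
    dh = abs(u[2] - v[2]) % 24
    circular = min(dh, 24 - dh)  # wrap around midnight
    return np.sqrt(spatial ** 2 + circular ** 2)

# Made-up sample data: one group around midnight near the origin,
# another group around noon near (10, 10).
X = np.array([
    [0.0, 0.0, 23],
    [0.1, 0.0, 0],
    [0.0, 0.1, 1],
    [10.0, 10.0, 12],
    [10.1, 10.0, 13],
])

# Option 1: precompute the condensed distance matrix and pass it in.
condensed = pdist(X, space_time_distance)
Z = linkage(condensed, method='average')

# Option 2 (equivalent): pass raw observations plus the callable metric.
Z_direct = linkage(X, method='average', metric=space_time_distance)

# Cut the tree into at most two flat clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

With this metric the rows at hours 23, 0 and 1 end up in the same cluster, which would not happen if the hour were treated as an ordinary Euclidean coordinate with equal weight.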