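I am new to Python and need to implement a clustering algorithm. For that, I need to calculate the distances between the given input points.

Consider the following input data -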

     [[1,2,8],
     [7,4,2],
     [9,1,7],
     [0,1,5],
     [6,4,3]]

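What I want to achieve is to calculate the distance from [1,2,8] to every other point and find the point at the minimum distance.

I then have to repeat this for every other point.

I tried to implement this with a for loop, but I am sure SciPy/NumPy must have a function that gets this result efficiently.

I looked online, but the "pdist" command did not do what I need.

Can someone guide me?

TIA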

Originally posted by Adhish Thite; translation licensed under CC BY-SA 4.0

python python-2.7 python-3.x numpy array-broadcasting

2 answers

Community wiki

Posted on
2022-11-17

✓ Accepted answer

Using np.linalg.norm combined with broadcasting (NumPy outer subtraction), you can do:

 np.linalg.norm(a - a[:,None], axis=-1)

a[:,None] inserts a new axis into a, so a - a[:,None] performs a row-by-row subtraction thanks to broadcasting. np.linalg.norm then computes np.sqrt(np.sum(np.square(...))) over the last axis:

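import numpy as np

a = np.array([[1,2,8],
              [7,4,2],
              [9,1,7],
              [0,1,5],
              [6,4,3]])

# Pairwise Euclidean distances; entry [i, j] is the distance between rows i and j
np.linalg.norm(a - a[:,None], axis=-1)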
#array([[ 0.        ,  8.71779789,  8.1240384 ,  3.31662479,  7.34846923],
#       [ 8.71779789,  0.        ,  6.164414  ,  8.18535277,  1.41421356],
#       [ 8.1240384 ,  6.164414  ,  0.        ,  9.21954446,  5.83095189],
#       [ 3.31662479,  8.18535277,  9.21954446,  0.        ,  7.        ],
#       [ 7.34846923,  1.41421356,  5.83095189,  7.        ,  0.        ]])

For example, elements [0,1] and [0,2] correspond to:

 np.sqrt(np.sum((a[0] - a[1]) ** 2))
# 8.717797887081348

np.sqrt(np.sum((a[0] - a[2]) ** 2))
# 8.1240384046359608

respectively.
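To go from this distance matrix to the nearest point for each row, which is what the question asks for, one option is to mask the diagonal (each point's zero distance to itself) and take the row-wise argmin. This is only a minimal sketch, assuming Euclidean distance and that ties may go to the lowest index; the names `diff`, `dist`, `nearest_idx` and `nearest_dist` are illustrative and not from the answer above:

```python
import numpy as np

a = np.array([[1, 2, 8],
              [7, 4, 2],
              [9, 1, 7],
              [0, 1, 5],
              [6, 4, 3]])

# a[:, None] has shape (5, 1, 3); broadcasting it against a (5, 3)
# yields all pairwise row differences with shape (5, 5, 3)
diff = a - a[:, None]
dist = np.linalg.norm(diff, axis=-1)   # shape (5, 5)

# Ignore each point's zero distance to itself
np.fill_diagonal(dist, np.inf)

nearest_idx = dist.argmin(axis=1)      # index of the closest other row
nearest_dist = dist.min(axis=1)        # distance to that row

print(nearest_idx)     # [3 4 4 0 1]
print(a[nearest_idx])  # the closest point for each input row
```

Note that the intermediate `diff` array holds n * n * 3 values, so memory grows quadratically with the number of points.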

Originally posted by Psidom; translation licensed under CC BY-SA 3.0

Community wiki

Posted on
2022-11-17

Here's one approach using SciPy's cdist -

from scipy.spatial.distance import cdist

def closest_rows(a):
    # Get squared euclidean distances as 2D array (the square root is not
    # needed, since it does not change which row is closest)
    dists = cdist(a, a, 'sqeuclidean')

    # Fill diagonals with something greater than all elements as we intend
    # to get argmin indices later on and then index into input array with those
    # indices to get the closest rows
    dists.ravel()[::dists.shape[1]+1] = dists.max()+1
    return a[dists.argmin(1)]

Sample run -

 In [72]: a
Out[72]:
array([[1, 2, 8],
       [7, 4, 2],
       [9, 1, 7],
       [0, 1, 5],
       [6, 4, 3]])

In [73]: closest_rows(a)
Out[73]:
array([[0, 1, 5],
       [6, 4, 3],
       [6, 4, 3],
       [1, 2, 8],
       [7, 4, 2]])
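The strided assignment in `closest_rows` works because, in a C-contiguous n x n array, the diagonal entries sit at flat positions 0, n+1, 2(n+1), and so on, and `ravel()` returns a view of such an array, so the write lands in `dists` itself. A small sketch to check this on a toy array (the 4 x 4 arrays `m` and `m2` are made up for illustration):

```python
import numpy as np

m = np.zeros((4, 4))
n = m.shape[1]

# ravel() gives a view of a contiguous array, so stepping by n + 1
# through the flat view writes exactly the diagonal of m
m.ravel()[::n + 1] = 9.0

# np.fill_diagonal does the same thing and may read more clearly
m2 = np.zeros((4, 4))
np.fill_diagonal(m2, 9.0)

print(m)
print(np.array_equal(m, m2))  # True
```

If the array were not contiguous, `ravel()` would return a copy and the assignment would be lost, so `np.fill_diagonal` is the safer choice in general.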

Runtime test

Other working approach -

import numpy as np

def norm_app(a): # @Psidom's soln
    dist = np.linalg.norm(a - a[:,None], axis=-1)
    # Mask each point's zero distance to itself so nanargmin skips it
    dist[np.arange(dist.shape[0]), np.arange(dist.shape[0])] = np.nan
    return a[np.nanargmin(dist, axis=0)]

Timings for 10,000 points -

 In [79]: a = np.random.randint(0,9,(10000,3))

In [80]: %timeit norm_app(a) # @Psidom's soln
1 loop, best of 3: 3.83 s per loop

In [81]: %timeit closest_rows(a)
1 loop, best of 3: 392 ms per loop
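For anyone who wants to rerun this comparison outside IPython, here is a rough sketch using the standard `timeit` module. It assumes a smaller input (500 points instead of 10,000, to keep the broadcasted intermediate small) and uses `np.fill_diagonal` for the masking step; the variable names and the repeat count are arbitrary choices, and the timings will differ by machine and library version:

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist


def norm_app(a):
    # Broadcasting approach: builds an (n, n, 3) intermediate array
    dist = np.linalg.norm(a - a[:, None], axis=-1)
    np.fill_diagonal(dist, np.nan)
    return a[np.nanargmin(dist, axis=0)]


def closest_rows(a):
    # cdist approach on squared Euclidean distances
    dists = cdist(a, a, 'sqeuclidean')
    np.fill_diagonal(dists, dists.max() + 1)
    return a[dists.argmin(1)]


rng = np.random.default_rng(0)
a = rng.integers(0, 9, size=(500, 3))

# Sanity check: both approaches should pick the same closest rows
same = np.array_equal(norm_app(a), closest_rows(a))

repeats = 5
t_norm = timeit.timeit(lambda: norm_app(a), number=repeats) / repeats
t_cdist = timeit.timeit(lambda: closest_rows(a), number=repeats) / repeats

print(f"same result:  {same}")
print(f"norm_app:     {t_norm * 1e3:.2f} ms per call")
print(f"closest_rows: {t_cdist * 1e3:.2f} ms per call")
```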

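Further performance boost

The eucl_dist package (disclaimer: I am its author) contains various methods for computing Euclidean distances that are more efficient than SciPy's cdist, especially for large arrays.

Making use of it, we get a more performant version, like so -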

 from eucl_dist.cpu_dist import dist
def closest_rows_v2(a):
    dists = dist(a,a, matmul="gemm", method="ext")
    dists.ravel()[::dists.shape[1]+1] = dists.max()+1
    return a[dists.argmin(1)]

Timings -

 In [162]: a = np.random.randint(0,9,(10000,3))

In [163]: %timeit closest_rows(a)
1 loop, best of 3: 394 ms per loop

In [164]: %timeit closest_rows_v2(a)
1 loop, best of 3: 229 ms per loop

Originally posted by Divakar; translation licensed under CC BY-SA 3.0
