Analyzing the internal functions of hierarchical.py
Starting from AgglomerativeClustering, the tree is built by this call:

    self.children_, self.n_components_, self.n_leaves_, parents = \
        memory.cache(tree_builder)(X, connectivity,
                                   n_clusters=n_clusters, **kwargs)

memory.cache wraps tree_builder, which receives X and connectivity as input (line 739). tree_builder is selected according to self.linkage (line 712):

    tree_builder = _TREE_BUILDERS[self.linkage]

Here linkage='ward' by default, so tree_builder is ward_tree (line 86).
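
The lookup by linkage name is a plain dictionary dispatch. A minimal sketch of the pattern follows; the two builder functions and the names TREE_BUILDERS and pick_builder are hypothetical stand-ins, not the library's own code.

```python
# Hypothetical stand-ins for the real tree-building functions.
def ward_tree(X, connectivity=None, n_clusters=None):
    return "ward"


def complete_tree(X, connectivity=None, n_clusters=None):
    return "complete"


# Illustrative dispatch table playing the role of _TREE_BUILDERS.
TREE_BUILDERS = {"ward": ward_tree, "complete": complete_tree}


def pick_builder(linkage="ward"):
    """Return the builder registered under the given linkage name."""
    try:
        return TREE_BUILDERS[linkage]
    except KeyError:
        raise ValueError(f"Unknown linkage type {linkage!r}") from None


builder = pick_builder()   # the default linkage is 'ward'
print(builder.__name__)    # prints "ward_tree"
```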

The connectivity parameter is an $n\times n$ matrix that restricts the clustering: only connected samples can be merged, so samples with no connection between them cannot be joined directly.

    connectivity, n_components = _fix_connectivity(X, connectivity, affinity='euclidean')
    # normalize the connectivity matrix so it is never empty; by default all entries are True
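
To make the effect of connectivity concrete, here is a small pure-Python sketch. It assumes the matrix is a list of lists; symmetrize and candidate_pairs are hypothetical helpers written for this note, not part of scikit-learn.

```python
def symmetrize(conn):
    """Treat i-j as connected if either conn[i][j] or conn[j][i] is set."""
    n = len(conn)
    return [[bool(conn[i][j] or conn[j][i]) for j in range(n)]
            for i in range(n)]


def candidate_pairs(conn):
    """List the pairs (i, j), i < j, that are allowed to merge."""
    n = len(conn)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if conn[i][j]]


# A chain 0-1-2-3, given only as the upper triangle.
conn = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
sym = symmetrize(conn)
pairs = candidate_pairs(sym)
print(pairs)  # [(0, 1), (1, 2), (2, 3)]
```

Samples 0 and 3 never appear as a pair here, so they can only end up in the same cluster through intermediate merges.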

Next, the number of nodes is computed:

    if n_clusters is None:
        n_nodes = 2 * n_samples - 1
        # a full binary tree with n leaves has 2*n - 1 nodes in total
    else:
        n_nodes = 2 * n_samples - n_clusters
        # stop once n_clusters clusters remain; without early stopping,
        # merging continues until a single cluster is left
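
The node count can be checked with a few lines of arithmetic. count_nodes below is a hypothetical helper that simply repeats the branch above; every merge adds one node, so the number of merges is n_nodes - n_samples.

```python
def count_nodes(n_samples, n_clusters=None):
    """Total number of tree nodes (leaves plus merge nodes)."""
    if n_clusters is None:
        return 2 * n_samples - 1
    return 2 * n_samples - n_clusters


full = count_nodes(5)          # full tree: 5 leaves + 4 merges
early = count_nodes(5, 2)      # stop at 2 clusters: 5 leaves + 3 merges
print(full, early)             # 9 8
print(early - 5)               # 3 merges performed
```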

A heap is built from inertia, and entries are then popped one at a time to build the cluster tree:

    heapify(inertia)

Inside the loop (beginning at line 239):

    for k in range(n_samples, n_nodes):
        # identify the merge
        while True:
            inert, i, j = heappop(inertia)
            if used_node[i] and used_node[j]:
                break
        parent[i], parent[j] = k, k
        children.append((i, j))
        # merge i and j and record the pair in children

After the merge, the new node is pushed onto the heap (line 274):

    [heappush(inertia, (ini[idx], k, coord_col[idx])) for idx in range(n_additions)]

This ends one iteration.
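
The whole loop can be imitated in pure Python. This is a simplified sketch, not the library's implementation: it assumes 1-D points, treats every pair as connected, and uses a textbook Ward merge cost; build_tree and ward_cost are names made up for this note. As in the original code, used_node[i] being True means node i is still available for merging.

```python
from heapq import heapify, heappop, heappush


def ward_cost(a, b):
    """Ward merge cost for two 1-D clusters given as (size, mean) pairs."""
    (na, ma), (nb, mb) = a, b
    return na * nb / (na + nb) * (ma - mb) ** 2


def build_tree(points, n_clusters=1):
    n_samples = len(points)
    n_nodes = 2 * n_samples - n_clusters
    clusters = {i: (1, float(x)) for i, x in enumerate(points)}  # active nodes
    used_node = [True] * n_nodes      # True: not merged yet
    parent = list(range(n_nodes))
    children = []

    # Initial heap: one entry per pair of samples.
    inertia = [(ward_cost(clusters[i], clusters[j]), i, j)
               for i in range(n_samples) for j in range(i + 1, n_samples)]
    heapify(inertia)

    for k in range(n_samples, n_nodes):
        # Pop until both nodes of the entry are still available.
        while True:
            inert, i, j = heappop(inertia)
            if used_node[i] and used_node[j]:
                break
        parent[i], parent[j] = k, k
        children.append((i, j))
        used_node[i] = used_node[j] = False

        (ni, mi), (nj, mj) = clusters.pop(i), clusters.pop(j)
        merged = (ni + nj, (ni * mi + nj * mj) / (ni + nj))
        # Push the cost of merging the new node k with each remaining node.
        for other, stats in clusters.items():
            heappush(inertia, (ward_cost(merged, stats), k, other))
        clusters[k] = merged
    return children, parent


children, parent = build_tree([0.0, 1.0, 5.0, 6.0, 20.0])
print(children)  # [(0, 1), (2, 3), (6, 5), (7, 4)]
```

Stale heap entries are never removed eagerly; they are skipped when popped, which is why the inner while loop is needed.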