Analyzing the internal functions of hierarchical.py

In AgglomerativeClustering.fit, the cluster tree is built by:

self.children_, self.n_components_, self.n_leaves_, parents = \
        memory.cache(tree_builder)(X, connectivity,
                                   n_clusters=n_clusters, **kwargs)

memory.cache wraps tree_builder so that its result for the given X and connectivity is cached (line 739). tree_builder itself is looked up from self.linkage (line 712):

tree_builder = _TREE_BUILDERS[self.linkage]
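The lookup above and the memory.cache call combine a dispatch table with caching. Below is a minimal sketch of that pattern, assuming toy builders: `toy_ward`, `toy_average`, `TREE_BUILDERS` and `cache` are hypothetical names, and the in-memory `cache` function only stands in for joblib's `Memory.cache`, which stores results on disk.

```python
CALLS = {"ward": 0, "average": 0}
_CACHE = {}

def cache(func):
    # toy memoizer standing in for memory.cache: results are stored per
    # (function, arguments), so re-wrapping the same function still hits them
    def wrapper(*args, **kwargs):
        key = (func.__name__, args, tuple(sorted(kwargs.items())))
        if key not in _CACHE:
            _CACHE[key] = func(*args, **kwargs)
        return _CACHE[key]
    return wrapper

def toy_ward(points, n_clusters):
    # hypothetical stand-in for ward_tree: counts calls, returns a summary
    CALLS["ward"] += 1
    return ("ward", len(points), n_clusters)

def toy_average(points, n_clusters):
    # hypothetical stand-in for another linkage builder
    CALLS["average"] += 1
    return ("average", len(points), n_clusters)

# dispatch table keyed by the linkage name, in the spirit of _TREE_BUILDERS
TREE_BUILDERS = {"ward": toy_ward, "average": toy_average}

def fit(points, linkage="ward", n_clusters=2):
    tree_builder = TREE_BUILDERS[linkage]
    return cache(tree_builder)(points, n_clusters=n_clusters)

fit((0.0, 1.0, 5.0))
fit((0.0, 1.0, 5.0))  # same arguments: served from the cache
print(CALLS)  # {'ward': 1, 'average': 0}
```

Because the cache key includes the arguments, changing X, connectivity or n_clusters triggers a fresh tree build, while refitting with identical inputs does not.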

Since linkage='ward' by default, tree_builder is ward_tree (line 86).

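The connectivity parameter is an $n\times n$ matrix ($n$ being the number of samples) that restricts the clustering: only connected samples can be merged directly, so samples without a connection can end up in the same cluster only through a chain of connected merges.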
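To see how the matrix restricts merging, here is a small pure-Python sketch. A dense list of lists stands in for the sparse matrix sklearn uses, and `candidate_pairs` is a hypothetical helper: only connected pairs become merge candidates, each listed once.

```python
def candidate_pairs(connectivity):
    """Return the pairs (i, j) with j < i that are allowed to merge."""
    pairs = []
    for i, row in enumerate(connectivity):
        for j, connected in enumerate(row):
            # keep a single triangle so each pair appears only once
            if connected and j < i:
                pairs.append((i, j))
    return pairs

# 4 samples on a chain 0 - 1 - 2 - 3: each sample is connected only
# to its direct neighbours
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(candidate_pairs(chain))  # [(1, 0), (2, 1), (3, 2)]
```

Samples 0 and 3 never form a candidate pair, yet they can still land in the same cluster once the clusters between them have been merged.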

connectivity, n_components = _fix_connectivity(X, connectivity, affinity='euclidean')
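# validate the matrix: check its shape, make it symmetric, convert it to
# LIL format and, if it has several connected components, link the closest
# points between components so the tree can still be completed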
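Besides checking the shape, `_fix_connectivity` symmetrizes the matrix and, when the graph falls apart into several connected components, links each pair of components through their closest points. The following is a pure-Python sketch of that idea, assuming a dense 0/1 matrix and 1-D points; `fix_connectivity` is a hypothetical simplification, whereas sklearn works on sparse matrices and uses `scipy.sparse.csgraph.connected_components`.

```python
def fix_connectivity(points, connectivity):
    n = len(points)
    # make the matrix symmetric: i ~ j whenever j ~ i
    conn = [[bool(connectivity[i][j] or connectivity[j][i]) for j in range(n)]
            for i in range(n)]
    # label the connected components with a depth-first search
    labels = [-1] * n
    n_components = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = n_components
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if conn[u][v] and labels[v] == -1:
                    labels[v] = n_components
                    stack.append(v)
        n_components += 1
    # link every pair of components through their closest points
    for a in range(n_components):
        for b in range(a):
            _, i, j = min((abs(points[i] - points[j]), i, j)
                          for i in range(n) if labels[i] == a
                          for j in range(n) if labels[j] == b)
            conn[i][j] = conn[j][i] = True
    # like sklearn, report the number of components found before linking
    return conn, n_components

conn, n_components = fix_connectivity([0.0, 1.0, 10.0, 11.0],
                                      [[0, 1, 0, 0], [0, 0, 0, 0],
                                       [0, 0, 0, 0], [0, 0, 1, 0]])
print(n_components, conn[1][2])  # 2 True
```

Here {0, 1} and {2, 3} are separate components; the closest cross pair is (1.0, 10.0), so samples 1 and 2 get linked and the tree can be built all the way up.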

Next, the number of nodes in the tree is set:

if n_clusters is None:
    # a full binary tree over n_samples leaves has 2*n_samples - 1 nodes
    n_nodes = 2 * n_samples - 1
else:
    # stop once n_clusters clusters remain; without this limit the merging
    # would continue until everything is in a single cluster
    n_nodes = 2 * n_samples - n_clusters
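Both formulas follow from the fact that each merge creates exactly one new node and reduces the number of clusters by one. A quick check that simulates the merges with a hypothetical counter instead of real data:

```python
def count_nodes(n_samples, n_clusters=None):
    """Simulate the merges and count leaves plus internal nodes."""
    target = 1 if n_clusters is None else n_clusters
    clusters = n_samples  # every sample starts as its own cluster
    nodes = n_samples     # the leaves
    while clusters > target:
        clusters -= 1     # two clusters become one ...
        nodes += 1        # ... and the merge adds one parent node
    return nodes

for n in range(1, 8):
    assert count_nodes(n) == 2 * n - 1
    for k in range(1, n + 1):
        assert count_nodes(n, k) == 2 * n - k
print(count_nodes(5), count_nodes(5, 2))  # 9 8
```

This is also why the main loop runs `for k in range(n_samples, n_nodes)`: it performs n_nodes - n_samples merges, and k is the index of the node each merge creates.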

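The candidate merges are stored in a heap as (inertia, i, j) tuples, where the inertia is the increase in the within-cluster sum of squares that merging nodes i and j would cause; popping the smallest entry again and again builds the cluster tree: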

heapify(inertia)
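For Ward linkage, the increase in the total within-cluster sum of squares when merging clusters A and B, of sizes n_A and n_B with means μ_A and μ_B, is n_A·n_B/(n_A + n_B)·‖μ_A − μ_B‖². The sketch below assumes this standard Ward criterion and uses 1-D points with hypothetical helper names. It checks the formula against the brute-force definition in `sse` and shows that `heapq` pops the cheapest merge first. Unlike ward_tree, it considers every pair, with no connectivity restriction.

```python
from heapq import heapify, heappop

def sse(cluster):
    """Within-cluster sum of squared deviations from the mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)

def ward_inertia(a, b):
    """Increase in total SSE caused by merging clusters a and b."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    return n_a * n_b / (n_a + n_b) * (mean_a - mean_b) ** 2

clusters = {0: [0.0], 1: [1.0], 2: [4.0], 3: [10.0]}
# heap entries are (inertia, i, j) tuples; heapq compares tuples by their
# first element, so the smallest inertia is popped first
inertia = [(ward_inertia(clusters[i], clusters[j]), i, j)
           for i in clusters for j in clusters if j < i]
heapify(inertia)
print(heappop(inertia))  # (0.5, 1, 0)
```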

The main merge loop starts at line 239:

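for k in range(n_samples, n_nodes):
    # identify the merge: pop the pair with the smallest inertia,
    # skipping stale entries that involve an already-merged node
    while True:
        inert, i, j = heappop(inertia)
        if used_node[i] and used_node[j]:
            break
    # merge i and j: both become children of the new node k
    parent[i], parent[j] = k, k
    children.append((i, j))
    # i and j can no longer take part in a merge on their own
    used_node[i] = used_node[j] = False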
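The `while True` loop is a lazy-deletion pattern: entries that mention merged nodes stay in the heap and are simply discarded when popped, which is cheaper than removing them eagerly. Note that `used_node[i]` being True means node i is still available. A minimal sketch with hand-made, hypothetical entries:

```python
from heapq import heapify, heappop

# (inertia, i, j) candidates; nodes 0 and 1 were already merged into
# node 4, so the entries that still mention them are stale
inertia = [(0.5, 1, 0), (2.0, 3, 1), (3.0, 4, 2), (6.0, 3, 2)]
heapify(inertia)
used_node = [False, False, True, True, True]

def pop_valid(heap, used_node):
    # discard stale pairs until both nodes are still available
    while True:
        inert, i, j = heappop(heap)
        if used_node[i] and used_node[j]:
            return inert, i, j

print(pop_valid(inertia, used_node))  # (3.0, 4, 2)
```

The two stale entries are thrown away on the way to the first valid pair, leaving only (6.0, 3, 2) in the heap.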

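After the merge, the Ward inertia between the new node k and each of its connected neighbours is computed, and these new candidate pairs are pushed onto the heap (line 274):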

# ini[idx] is the Ward inertia between the new node k and its neighbour coord_col[idx]
[heappush(inertia, (ini[idx], k, coord_col[idx])) for idx in range(n_additions)]

This completes one iteration; the loop repeats until n_nodes nodes have been created.
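Putting the steps together, here is a compact pure-Python sketch of the whole loop for 1-D points with full connectivity, so every pair is a candidate. It follows the structure described above: a heap of `(inertia, i, j)` tuples, `used_node`, `parent` and `children`, with new pairs pushed after each merge. `toy_ward_tree` is hypothetical, and it keeps each cluster's points in a dict instead of the moment arrays ward_tree updates.

```python
from heapq import heapify, heappop, heappush

def ward_inertia(a, b):
    """Increase in SSE when merging clusters a and b (Ward criterion)."""
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * (sum(a) / n_a - sum(b) / n_b) ** 2

def toy_ward_tree(points, n_clusters=None):
    n_samples = len(points)
    n_nodes = 2 * n_samples - (1 if n_clusters is None else n_clusters)
    members = {i: [x] for i, x in enumerate(points)}  # active node -> points
    used_node = [True] * n_nodes  # True: the node can still be merged
    parent = list(range(n_nodes))
    children = []
    inertia = [(ward_inertia(members[i], members[j]), i, j)
               for i in range(n_samples) for j in range(i)]
    heapify(inertia)
    for k in range(n_samples, n_nodes):
        # identify the merge, skipping stale entries
        while True:
            inert, i, j = heappop(inertia)
            if used_node[i] and used_node[j]:
                break
        parent[i], parent[j] = k, k
        children.append((i, j))
        used_node[i] = used_node[j] = False
        members[k] = members.pop(i) + members.pop(j)
        # push the inertia between the new node and every other active node
        for other in members:
            if other != k:
                heappush(inertia,
                         (ward_inertia(members[k], members[other]), k, other))
    return children, parent

children, parent = toy_ward_tree([0.0, 1.0, 10.0, 12.0])
print(children)  # [(1, 0), (3, 2), (5, 4)]
```

The two tight pairs are merged first (nodes 4 and 5), and the last iteration joins them into the root, node 6.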


Lycheeee