I am performing hierarchical clustering with Python.

    from scipy.cluster.hierarchy import dendrogram, linkage
    from matplotlib import pyplot as plt

    # Build the linkage matrix with complete linkage (Euclidean distance by default).
    linked = linkage(dataset, 'complete')

    # Label each leaf with its row index in the dataset.
    labelList = list(range(len(dataset)))

    fig = plt.figure(figsize=(10, 7))
    fig.patch.set_facecolor('white')

    dendrogram(linked,
               orientation='top',
               labels=labelList,
               distance_sort='descending',
               show_leaf_counts=True)
    plt.show()

Here is the dendrogram I get.

[figure: HCA dendrogram]

There are two clusters. I am now trying to get the indices of each cluster by passing n_clusters=2 to AgglomerativeClustering.

    from sklearn.cluster import AgglomerativeClustering

    # Agglomerative clustering with ward linkage on Euclidean distances;
    # fit_predict returns one cluster label per sample.
    cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean', linkage='ward')
    output = cluster.fit_predict(dataset)

    output
    array([0, 0, 0, 0, 0, 0, 1, 0, 1, 1])

These two clusters are different from those in the dendrogram. I currently note the indices of the clusters manually from the dendrogram.
Is there a way to do that automatically? Why does AgglomerativeClustering yield different results than the dendrogram?
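
One way that seems to cover the "automatically" part is scipy's fcluster, which cuts the linkage matrix into flat clusters instead of reading indices off the plot. A minimal sketch, assuming linked from the code above:

    from scipy.cluster.hierarchy import fcluster

    # Cut the hierarchy into (at most) two flat clusters; labels are 1-based.
    labels = fcluster(linked, t=2, criterion='maxclust')

    # Group the sample indices by cluster label.
    indices = {k: [i for i, lab in enumerate(labels) if lab == k] for k in set(labels)}
    print(indices)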

EDIT: There must be a way to match the output of the two functions, dendrogram and AgglomerativeClustering.
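
If so, the mismatch likely comes from the linkage methods: the dendrogram above was built with complete linkage, while AgglomerativeClustering was run with ward. A minimal sketch, assuming dataset and linked from the code above, of making the two sides agree by using the same linkage:

    from sklearn.cluster import AgglomerativeClustering
    from scipy.cluster.hierarchy import fcluster

    # Use the SAME linkage as the dendrogram ('complete'), not 'ward'.
    # (Newer scikit-learn versions name the 'affinity' parameter 'metric'.)
    cluster = AgglomerativeClustering(n_clusters=2, affinity='euclidean',
                                      linkage='complete')
    sk_labels = cluster.fit_predict(dataset)
    scipy_labels = fcluster(linked, t=2, criterion='maxclust')

    # Label numbering may differ (0/1 vs. 1/2), but the two-group
    # partition of the samples should be identical.
    print(sk_labels)
    print(scipy_labels)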

3 Comments

  • Welcome to Stack Overflow! What is your question?
  • I am also looking for an answer to this question. It's a very good question.
  • Sorry, my bad: it matched; my indices were wrong.

1 Answer


To choose clusters from a dendrogram, you cut it with a horizontal line and take the groups it separates; the usual heuristic is to cut across the tallest vertical lines that no merge crosses. I don't see a cut line in your plot, but you have two clusters, shown as orange and green. Read the link below for more info; a sketch of the cut follows the link.

https://ml2021.medium.com/clustering-with-python-hierarchical-clustering-a60688396945
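
To make the cut concrete, here is a minimal sketch, assuming the linked matrix from the question; cut_height is a hypothetical value you read off your own dendrogram (any height that crosses exactly two vertical lines):

    from scipy.cluster.hierarchy import dendrogram, fcluster
    from matplotlib import pyplot as plt

    cut_height = 1.5  # hypothetical: read this off your own dendrogram

    dendrogram(linked)
    plt.axhline(y=cut_height, color='r', linestyle='--')  # the horizontal cut line
    plt.show()

    # Everything that merges below the cut height lands in the same flat cluster.
    labels = fcluster(linked, t=cut_height, criterion='distance')
    print(labels)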


4 Comments

Thank you for your answer. I understand the article in the link you provided, and I also see two clusters. But the AgglomerativeClustering function does not output the two clusters we see on the dendrogram. Why is that the case?
You choose the number of clusters when you use agglomerative clustering, like this: model = AgglomerativeClustering(n_clusters=2). Or make it whatever you want.
Affinity Propagation does not require the number of clusters to be determined or estimated before running the algorithm. For this purpose, the two important parameters are the preference, which controls how many exemplars (or prototypes) are used, and the damping factor, which damps the responsibility and availability messages to avoid numerical oscillations when updating them. Maybe this is what you're after? (A sketch follows these comments.)
So, how do we use AgglomerativeClustering's parameters so that we get the same leaves (as colored) in the dendrogram? I think that's the question.
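
Regarding the Affinity Propagation suggestion, a minimal sketch, assuming the dataset from the question; the damping value is illustrative, not tuned:

    from sklearn.cluster import AffinityPropagation

    # preference=None defaults to the median input similarity; lower values
    # produce fewer exemplars. damping must lie in [0.5, 1).
    ap = AffinityPropagation(damping=0.9, preference=None, random_state=0)
    ap_labels = ap.fit_predict(dataset)

    print(ap_labels)                    # one cluster index per sample
    print(ap.cluster_centers_indices_)  # indices of the chosen exemplars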
