$\begingroup$

I am following the example code in the linkage documentation:

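from scipy.cluster.hierarchy import dendrogram, linkage
from matplotlib import pyplot as plt
# X is not shown in the question; this is the sample data from the SciPy
# linkage documentation example (an assumption, but it reproduces the Z below)
X = [[i] for i in [2, 8, 0, 4, 1, 9, 9, 0]]
Z = linkage(X, 'single')  # single-linkage (nearest-neighbour) clustering
fig = plt.figure(figsize=(25, 10))
dn = dendrogram(Z)
plt.show()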

[Image: dendrogram of Z produced by the code above]

The linkage matrix Z is:

array([[ 2,  7,  0,  2],  # This becomes object #8
       [ 5,  6,  0,  2],  # This becomes object #9
       [ 0,  4,  1,  2],  # This becomes object #10
       [ 8, 10,  1,  4],  # Merge #8 and #10
       [ 1,  9,  1,  3],
       [ 3, 11,  2,  5],
       [12, 13,  4,  8]])
  • The merging of #2 and #7 creates #8
  • The merging of #0 and #4 creates #10
  • #8 clearly merges with #10 after the merging of #0 and #4
  • In contrast, the dendrogram shows the merging of #0, #4, and #8 in one merge operation

I'm new to clustering. Am I simply misunderstanding how linkages and dendrograms work, or is this unintended behaviour?

$\endgroup$

1 Answer

$\begingroup$

The distances in your example take only a few discrete values, so several merges are tied at the same distance and occur simultaneously.

You wrote: "#8 clearly merges with #10 after the merging of #0 and #4"

The third column of $Z$ is the distance at which each merge takes place. Note that it is 1 for three of the rows, meaning that several merges occur at $\text{distance}=1$. $Z$ is listed in order of increasing distance, but rows that share the same distance describe merges at the same height of the dendrogram: each row is still a pairwise merge, yet the rows are effectively simultaneous rather than separate steps at different distances.
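To see the ties directly, here is a minimal sketch that groups the rows of $Z$ by their third column. It uses the linkage matrix copied from the question as a plain Python list, so it does not need SciPy; the name `merges_by_height` is illustrative only, not part of any library.

```python
from collections import defaultdict

# Linkage matrix copied from the question.
# Columns: [cluster a, cluster b, merge distance, size of the new cluster]
Z = [
    [2, 7, 0, 2],
    [5, 6, 0, 2],
    [0, 4, 1, 2],
    [8, 10, 1, 4],
    [1, 9, 1, 3],
    [3, 11, 2, 5],
    [12, 13, 4, 8],
]
n = len(Z) + 1  # a linkage matrix over n observations has n - 1 rows

# Hypothetical helper structure: merge distance -> list of (a, b, new cluster index)
merges_by_height = defaultdict(list)
for i, (a, b, dist, size) in enumerate(Z):
    # Row i of Z creates the cluster with index n + i.
    merges_by_height[dist].append((a, b, n + i))

for height in sorted(merges_by_height):
    print(height, merges_by_height[height])
```

Two rows share distance 0 and three rows share distance 1, so each of those groups is drawn at a single height in the dendrogram.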

We start by considering the closest pairs, meaning a distance of 0. Two merges meet this criterion, so they happen at the same time:

  • Two merges at a single-linkage distance of 0:
    • $2$ and $7$ merge, creating cluster $8=\{2,7\}$
    • $5$ and $6$ merge, creating cluster $9=\{5,6\}$

The next-closest pairs all share the same distance of 1, so they happen at the same point on the dendrogram:

  • Three merges at a single-linkage distance of 1:
    • $0$ and $4$ merge, creating cluster $10=\{0,4\}$
    • clusters $8$ and $10$ merge, creating cluster $11=\{2,7,0,4\}$
    • $1$ and cluster $9$ merge, ...

That is why, in the dendrogram, $0$ and $4$ appear to come together and join $8=\{2,7\}$ in a single step: both merges happen at a height of 1.
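The same conclusion can be checked by replaying the merges. The sketch below, again using the linkage matrix from the question as a plain list, rebuilds the membership of every cluster and lists the clusters that exist once all merges up to a given distance have been applied. The names `members` and `clusters_at` are hypothetical helpers written for this illustration, not SciPy functions.

```python
# Linkage matrix copied from the question.
# Columns: [cluster a, cluster b, merge distance, size of the new cluster]
Z = [
    [2, 7, 0, 2],
    [5, 6, 0, 2],
    [0, 4, 1, 2],
    [8, 10, 1, 4],
    [1, 9, 1, 3],
    [3, 11, 2, 5],
    [12, 13, 4, 8],
]
n = len(Z) + 1  # number of original observations

# members[k] is the set of original observations contained in cluster k.
members = {i: {i} for i in range(n)}
for i, (a, b, dist, size) in enumerate(Z):
    members[n + i] = members[a] | members[b]
    assert len(members[n + i]) == size  # fourth column is the new cluster's size


def clusters_at(height):
    """Clusters remaining after applying every merge with distance <= height."""
    alive = set(range(n))
    for i, (a, b, dist, size) in enumerate(Z):
        if dist <= height:
            alive -= {a, b}
            alive.add(n + i)
    return sorted(sorted(members[k]) for k in alive)


print(clusters_at(0))  # [[0], [1], [2, 7], [3], [4], [5, 6]]
print(clusters_at(1))  # [[0, 2, 4, 7], [1, 5, 6], [3]]
```

Between heights 0 and 1 there is no intermediate state to draw: by the time the distance reaches 1, observations 0 and 4 are already in the same cluster as 2 and 7.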

$\endgroup$
1
  • 1
    $\begingroup$ Thank you, Muhammed. I was under the mistaken impression that mergers were always sequential and pairwise, but your explanation of the fact that mergers between cluster pairs with identical distances are considered common makes a lot of sense. $\endgroup$ Commented Jul 15 at 20:45
