1

Graph theory and Data mining are two fields of computer science I'm still new at, so excuse my basic understanding.

I have been asked to plot a Dendrogram of a hierarchically clustered graph. The input I have been given is the following : a list of all the edges of this graph.

So far I have been able to draw the graph from the input.

The next step would be clustering the graph, then plotting the Dendrogram from that Clustered graph.

My question is : Can someone give me a step by step guide to follow ? what input/output is required/returned during both steps of the process. (Clustering, getting the Dendrogram)

Note :

So far I have been using graph-tool to draw the graphs, I also ran a test code I found on the internet from the Scipy.clustering.hierarchy package, and it seems to have all the needed functions.

2 Answers 2

3

You are correct the 'Scipy.clustering.hierarchy package' is correct here is some python pseudo code to show you the general idea.

from your statement 'So far I have been able to draw the graph from the input.' I am assuming that you have a good start with getting the input data into python ect ..

start python Clustering pseudo code

I needed at least these python packages

import scipy.cluster.hierarchy  as sch
import numpy as np
import sys

you need a 'distance metric', if your input data was 'strings' then you would use something like this

from Levenshtein import jaro

get the Matrix Dimension from the labels for the distance matrix

distanceMatrixDimension= len(p_names)

get coordinates of the upper triangle
upper_triangle=np.triu_indices(distanceMatrixDimension,1)

get the distances distances=np.apply_along_axis(dis,0,upper_triangle)

start the clustering algorithm from the 'Scipy.clustering.hierarchy' package get the linkage matrix Z here 'average' is the method of the linkage Z=sch.linkage(distances,'average')

get a bound for the metrics dimension values generated from the data max_dist=distances.max()

0.2.max_dist acts like a threshold value, experiment with different values fclster=sch.fcluster(Z,0.2*max_dist,'distance')

end python Clustering pseudo code

Z is a linked hierarchical agglomeration clustering of your data another way to say this is it is a (hierarchical)'tree' with a root node that branches down to your leaf nodes that are generally a record or row in the input data you want to cluster

the Deprogram is just a visualization of this tree. There are a number of ways to do this, you can get plot variables from the dimensions of Z. The best way to do this is with matlab or octave. generally you use dendrogram from scipy to plot the 'dendrogram'

import matplotlib.pyplot as plt

then

dendrogram(Z, color_threshold=1, labels=[....],show_leaf_counts=True)

Sign up to request clarification or add additional context in comments.

2 Comments

threshold value ranges from which 2 values ?
1

Here is the code for dendrogram in python

# Import necessary libraries
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
import matplotlib.pyplot as plt

# Sample data: Replace this with your actual grain data
# For example, let's create a DataFrame with some sample grain data
data = {
    'Variety': ['Wheat', 'Barley', 'Oats', 'Rye', 'Corn'],
    'Feature1': [1.0, 2.0, 1.5, 2.5, 3.0],
    'Feature2': [2.0, 1.0, 1.5, 2.0, 3.5]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Extract the features and varieties
samples = df[['Feature1', 'Feature2']].values
varieties = df['Variety'].values

# Calculate the linkage: mergings
mergings = linkage(samples, method='complete')

# Plot the dendrogram, using varieties as labels
plt.figure(figsize=(10, 5))
dendrogram(mergings,
           labels=varieties,
           leaf_rotation=90,
           leaf_font_size=10,
)
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('Varieties')
plt.ylabel('Distance')
plt.show()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.