
I am trying to create a clustered heatmap (with a dendrogram) using plotly in Python. The example on the Plotly website does not scale well, and most of the alternatives I have found are in R or JavaScript. I want a heatmap with a dendrogram on the left side only, showing the clusters along the y axis (from the hierarchical clustering). A really good-looking example is this one: https://chart-studio.plotly.com/~jackp/6748. My goal is something like this, but with only the left-side dendrogram. If someone can implement this in Python, I will be really grateful!

Let the data be X = np.random.randint(0, 10, size=(120, 10))

    It may seem like it sometimes, but StackOverflow is not a code-ordering service. Rather, you present what you've tried and the rest of us help out with whatever challenges you experience. Though since you're fairly new in the game and we're all very nice in here I'll see if I can work something together to at least get you started. Commented Mar 9, 2021 at 21:40

3 Answers


The following suggestion draws on elements from both Dendrograms in Python and chart-studio.plotly.com/~jackp. This particular plot uses your data X = np.random.randint(0, 10, size=(120, 10)). In my opinion, the linked approaches share one weakness: the datasets and data munging procedures are a bit messy. So I decided to build the following figure on a pandas dataframe, df = pd.DataFrame(X), to hopefully make everything a bit clearer.

Plot

[Figure: output of the code below]

Complete code

import plotly.graph_objects as go
import plotly.figure_factory as ff

import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

X = np.random.randint(0, 10, size=(120, 10))
df = pd.DataFrame(X)

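# Initialize the figure with an upper dendrogram, then hide its traces
# (only the side dendrogram should be visible)
fig = ff.create_dendrogram(df.values, orientation='bottom')
fig.for_each_trace(lambda trace: trace.update(visible=False))

# Move the hidden upper dendrogram to the secondary y axis
for i in range(len(fig['data'])):
    fig['data'][i]['yaxis'] = 'y2'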

# Create Side Dendrogram and draw it on the secondary x axis
dendro_side = ff.create_dendrogram(X, orientation='right')
for i in range(len(dendro_side['data'])):
    dendro_side['data'][i]['xaxis'] = 'x2'

# Add Side Dendrogram Data to Figure
for data in dendro_side['data']:
    fig.add_trace(data)

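# Create Heatmap
# Note: heat_data is the pairwise distance matrix between the rows of df,
# reordered by the dendrogram leaves; it is not the original data X
dendro_leaves = dendro_side['layout']['yaxis']['ticktext']
dendro_leaves = list(map(int, dendro_leaves))
data_dist = pdist(df.values)
heat_data = squareform(data_dist)
heat_data = heat_data[dendro_leaves,:]
heat_data = heat_data[:,dendro_leaves]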

heatmap = [
    go.Heatmap(
        x = dendro_leaves,
        y = dendro_leaves,
        z = heat_data,
        colorscale = 'Blues'
    )
]

heatmap[0]['x'] = fig['layout']['xaxis']['tickvals']
heatmap[0]['y'] = dendro_side['layout']['yaxis']['tickvals']

# Add Heatmap Data to Figure
for data in heatmap:
    fig.add_trace(data)

# Edit Layout
fig.update_layout({'width':800, 'height':800,
                         'showlegend':False, 'hovermode': 'closest',
                         })
# Edit xaxis
fig.update_layout(xaxis={'domain': [.15, 1],
                                  'mirror': False,
                                  'showgrid': False,
                                  'showline': False,
                                  'zeroline': False,
                                  'ticks':""})
# Edit xaxis2
fig.update_layout(xaxis2={'domain': [0, .15],
                                   'mirror': False,
                                   'showgrid': False,
                                   'showline': False,
                                   'zeroline': False,
                                   'showticklabels': False,
                                   'ticks':""})

# Edit yaxis
fig.update_layout(yaxis={'domain': [0, 1],
                                  'mirror': False,
                                  'showgrid': False,
                                  'showline': False,
                                  'zeroline': False,
                                  'showticklabels': False,
                                  'ticks': ""
                        })
# Edit yaxis2
fig.update_layout(yaxis2={'domain':[.825, .975],
                                   'mirror': False,
                                   'showgrid': False,
                                   'showline': False,
                                   'zeroline': False,
                                   'showticklabels': False,
                                   'ticks':""})

fig.update_layout(paper_bgcolor="rgba(0,0,0,0)",
                  plot_bgcolor="rgba(0,0,0,0)",
                  xaxis_tickfont = dict(color = 'rgba(0,0,0,0)'))

fig.show()
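
A side note on what gets plotted: the heatmap in the code above is built from squareform(pdist(df.values)), so it shows pairwise distances between rows. If the goal is to show the original values ordered by the clustering, the rows of X can be permuted by the leaf order instead. The following is only a minimal sketch of that reordering step, not part of the original answer. It assumes complete linkage on Euclidean distances and a seeded generator for reproducibility; the names Z, row_order and X_ordered are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

# Same shape as the data in the question, seeded so the result is reproducible
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(120, 10))

# Assumption: complete linkage on Euclidean distances between rows
Z = linkage(pdist(X), method="complete")

# Row indices in the order the leaves appear in the dendrogram
row_order = leaves_list(Z)

# Original data with rows permuted to match the dendrogram
X_ordered = X[row_order, :]
```

X_ordered has the same shape as X and could be passed as z to go.Heatmap in place of the distance matrix.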

5 Comments

Very extensive and clear answer, thank you! By the way, the paper_bgcolor changes the color of the option menu. Do you know how I can preserve the old color (which was basically no color)? If I remove the paper_bgcolor, the graph changes a bit.
@Petar Thank you for accepting my answer. That color thing is really weird though. I thought it would be fixed by setting plot_bgcolor='#E5ECF6'. But no... I'll have to check this in a bit more detail when I find the time.
Alright. This is a really good answer, unlike much of what I found on the internet. Thank you for spending time on it! Regarding the color thing, I will also look into it tomorrow, it must be something small to fix! Good night! :)
might be a dumb comment, but this seems to be plotting the distance matrix, and not the original data ordered by dendrogram. Is that the case?
For some reason, I get some sort of intermediate steps in plotting, like 10s of partial dendrograms, until the final one appears. Any thought why that is?
  1. The simplest solution is to use the dash_bio.Clustergram function from the dash_bio package.
import numpy as np
import dash_bio as dashbio

X = np.random.randint(0, 10, size=(120, 10))

# Clustergram returns a Plotly figure; cluster='row' clusters the rows only
fig = dashbio.Clustergram(
    data=X,
    # row_labels=rows,
    # column_labels=columns,
    cluster='row',
    color_threshold={
        'row': 250,
        'col': 700
    },
    height=800,
    width=700,
    color_map= [
        [0.0, '#636EFA'],
        [0.25, '#AB63FA'],
        [0.5, '#FFFFFF'],
        [0.75, '#E763FA'],
        [1.0, '#EF553B']
    ]
)
fig.show()

[Figure: output of the code above]

  2. A more laborious solution is to combine plotly.figure_factory.create_dendrogram with plotly.graph_objects.Heatmap, as in the Plotly documentation. The example there is not a dendrogram heatmap of the data but a heatmap of pairwise distances; the same two functions can still be used to build a dendrogram heatmap of the data itself.


You can also use seaborn's clustermap: https://seaborn.pydata.org/generated/seaborn.clustermap.html
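
For reference, a minimal sketch of the seaborn route with the data from the question. Note that clustermap produces a static matplotlib figure, not an interactive plotly one. Setting col_cluster=False keeps only the row (left-side) dendrogram; the colormap, figure size and seed are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Same shape as the data in the question, seeded for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(120, 10)))

# Cluster rows only, so that only the left-side dendrogram is drawn
g = sns.clustermap(df, col_cluster=False, cmap="Blues", figsize=(6, 10))

# Row indices in the order they appear in the clustered heatmap
row_order = g.dendrogram_row.reordered_ind
```

The returned grid object can be saved with g.savefig("clustermap.png"), where the file name is just an example.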

1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
