
I have a dictionary of clusters, where each cluster contains a list of labels.

The dictionary looks like this:
demo_dict = {0: [b'3.0',b'3.0', b'3.0', b'5.0',b'5.0',b'5.0', b'6.0', b'6.0'],
 1: [b'2.0', b'2.0', b'3.0', b'7.0',b'7.0'],
 2: [b'1.0', b'4.0', b'8.0', b'7.0',b'7.0']}

To draw the required plot, I am using the following code:

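import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

comp = demo_dict
df = pd.DataFrame.from_dict(comp, orient='index')
df.index.rename('Clusters', inplace=True)

# one row per (cluster, position) pair, with the label as the value
stacked = df.stack().reset_index()
stacked.rename(columns={'level_1': 'Lable', 0: 'Labels'}, inplace=True)

sns.scatterplot(data=stacked, x='Clusters', y='Labels')
plt.show()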

However, the code above does not draw all the points. It only shows which clusters contain which labels, but I want every point of every cluster to be visible.


Is there something I am missing in this code to draw all the points? Note: I have also tried stripplot and swarmplot.

  • You could try the approaches from "How to make jitterplot on matplolib python" to add jitter to the points after they have been drawn by sns.scatterplot. It all depends on how many points fall together and what exactly you want to show. You might need a smaller dot size or some alpha transparency. Commented Mar 28, 2021 at 10:52

2 Answers


With groupby you can group using two columns. The counts can then be displayed via a heatmap:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

demo_dict = {}
for i in range(40):
    demo_dict[i] = np.random.choice([b'1.0', b'2.0', b'3.0', b'4.0', b'5.0', b'6.0', b'7.0', b'8.0'],
                                    np.random.randint(10, 30))
df = pd.DataFrame.from_dict(demo_dict, orient='index')
df.index.rename('Clusters', inplace=True)

stacked = df.stack().reset_index()
stacked.rename(columns={'level_1': 'Lable', 0: 'Labels'}, inplace=True)

grouped = stacked.groupby(['Labels', 'Clusters']).agg('count').unstack()

fig = plt.figure(figsize=(15, 4))
ax = sns.heatmap(data=grouped, annot=True, cmap='rocket_r', cbar_kws={'pad': 0.01})
ax.set_xlabel('')
ax.tick_params(axis='y', labelrotation=0)
plt.tight_layout()
plt.show()

heatmap of groupby counts
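
To see why a plain scatterplot hides points, it can help to count how many entries share each (cluster, label) position. The following is a minimal sketch using only the standard library and the question's demo_dict; the names position_counts, total_points and distinct_positions are made up for illustration.

```python
from collections import Counter

demo_dict = {0: [b'3.0', b'3.0', b'3.0', b'5.0', b'5.0', b'5.0', b'6.0', b'6.0'],
             1: [b'2.0', b'2.0', b'3.0', b'7.0', b'7.0'],
             2: [b'1.0', b'4.0', b'8.0', b'7.0', b'7.0']}

# count how often each (cluster, label) pair occurs;
# every repeated pair is drawn at exactly the same spot
position_counts = Counter((cluster, label)
                          for cluster, labels in demo_dict.items()
                          for label in labels)

total_points = sum(position_counts.values())  # dots that are drawn
distinct_positions = len(position_counts)     # dots that can be told apart
print(total_points, distinct_positions)
```

Whenever the two numbers differ, some dots lie exactly on top of each other, so the plot needs counts, sizes, or jitter to reveal them.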

An alternative is to show the counts as sizes in a scatterplot:

grouped = stacked.groupby(['Labels', 'Clusters']).agg('count').reset_index()
fig = plt.figure(figsize=(15, 4))
ax = sns.scatterplot(data=grouped, x='Clusters', y='Labels', size='Lable', color='orchid')
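legend = ax.legend_
# Matplotlib 3.7 renamed Legend.legendHandles to legend_handles; support both
handles = legend.legend_handles if hasattr(legend, 'legend_handles') else legend.legendHandles
for h in handles:
    h.set_color('orchid')  # the default color in the sizes legend is black
ax.margins(x=0.01)  # less whitespace
# set the legend outside
ax.legend(handles=handles, title='Counts:', bbox_to_anchor=(1.01, 1.02), loc='upper left')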

scatterplot with sizes

You could also try the approach from "How to make jitterplot on matplolib python", optionally using different jitter offsets in the x and y directions. With your data it could look as follows:

def jitter_dots(dots):
    # copy the dot positions, shift each one randomly, and write them back
    offsets = dots.get_offsets().copy()
    offsets[:, 0] += np.random.uniform(-0.3, 0.3, offsets.shape[0])  # jitter in x
    offsets[:, 1] += np.random.uniform(-0.3, 0.3, offsets.shape[0])  # jitter in y
    dots.set_offsets(offsets)

ax = sns.scatterplot(data=stacked, x='Clusters', y='Labels')
jitter_dots(ax.collections[0])

scatterplot with jitter

Here is how it could look with 8 different colors, alternating per cluster:

ax = sns.scatterplot(data=stacked, x='Clusters', y='Labels',
                     hue=stacked['Clusters'] % 8, palette='Dark2', legend=False)
jitter_dots(ax.collections[0])
ax.margins(x=0.02)
sns.despine()

scatterplot with colors per column
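
The jitter could also be applied to the data itself before plotting, instead of to the dots after they have been drawn. The following is a rough sketch under a few assumptions: the labels are converted to floats (float() accepts bytes such as b'3.0'), and the helper name jittered_points, the spread of 0.3 and the fixed seed are illustrative choices rather than anything from the question.

```python
import numpy as np

demo_dict = {0: [b'3.0', b'3.0', b'3.0', b'5.0', b'5.0', b'5.0', b'6.0', b'6.0'],
             1: [b'2.0', b'2.0', b'3.0', b'7.0', b'7.0'],
             2: [b'1.0', b'4.0', b'8.0', b'7.0', b'7.0']}

def jittered_points(cluster_dict, spread=0.3, seed=0):
    """Return (x, y) arrays with one randomly shifted point per label occurrence."""
    rng = np.random.default_rng(seed)  # fixed seed so the plot is reproducible
    xs, ys = [], []
    for cluster, labels in cluster_dict.items():
        for label in labels:
            xs.append(cluster)
            ys.append(float(label))  # bytes like b'3.0' convert directly
    x = np.array(xs, dtype=float)
    y = np.array(ys, dtype=float)
    # shift every point by at most `spread` in both directions
    x += rng.uniform(-spread, spread, x.size)
    y += rng.uniform(-spread, spread, y.size)
    return x, y

x, y = jittered_points(demo_dict)
```

The resulting arrays can be passed to plt.scatter(x, y) or to sns.scatterplot(x=x, y=y). Keeping the spread below 0.5 means the dots of neighboring clusters and labels do not mix.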


4 Comments

I appreciate your answer. Regarding your first suggestion, I am not looking for a histogram. And regarding the second one, I don't need to show the counts explicitly. I need a simple scatter plot that shows each cluster with as many dots as it has labels.
It seems to be working the way I want. Could you also tell me how to change the color of the dots for each cluster?
Yes, I have added hue='Clusters' and now it gives me exactly what I want. Thanks, man.
Nice trick at the end with the modulo hues.

If I understand correctly, you can use a swarmplot (or the similar stripplot):

sns.swarmplot(data=stacked, x='Clusters', y='Labels')

swarm plot
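
If the swarm gets too crowded with more labels per cluster, a stripplot with wider jitter, smaller markers and some transparency might be worth trying. This is only a sketch: it assumes the labels are converted to floats, and the helper name strip_all_points as well as the jitter, size and alpha values are illustrative guesses, not tuned settings.

```python
import pandas as pd
import seaborn as sns

demo_dict = {0: [b'3.0', b'3.0', b'3.0', b'5.0', b'5.0', b'5.0', b'6.0', b'6.0'],
             1: [b'2.0', b'2.0', b'3.0', b'7.0', b'7.0'],
             2: [b'1.0', b'4.0', b'8.0', b'7.0', b'7.0']}

# one row per label occurrence, with the bytes labels converted to floats
rows = [(cluster, float(label))
        for cluster, labels in demo_dict.items()
        for label in labels]
points = pd.DataFrame(rows, columns=['Clusters', 'Labels'])

def strip_all_points(data):
    # jitter spreads the dots sideways within each cluster;
    # small, semi-transparent markers keep overlapping dots distinguishable
    return sns.stripplot(data=data, x='Clusters', y='Labels',
                         jitter=0.35, size=3, alpha=0.6)

ax = strip_all_points(points)
```

Calling plt.show() afterwards (with matplotlib.pyplot imported as plt) displays the figure. The jitter here only acts along the cluster axis, so identical labels within a cluster end up side by side rather than stacked.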

1 Comment

I posted only the demo dict here. In reality, I have a dictionary where each cluster contains at least 40 labels. I have already tried swarmplot and stripplot there, but they still did not show all the points.
