I am looking for a way to draw a descriptive scatter plot of a pandas.DataFrame similar to this:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   type    1000 non-null   object
 1   value   1000 non-null   int64
 2   count   1000 non-null   int64
dtypes: int64(2), object(1)
memory usage: 23.6+ KB

With pandas.DataFrame.plot or seaborn.scatterplot, the points for each type are all placed on one vertical line and overlap each other. To mitigate this, I want to introduce at least some jitter in the x-direction, but I don't know how.

My plots so far:

import pandas as pd
import matplotlib.pyplot as plt
import random

df = pd.DataFrame({
    'type': [random.choice(['t1', 't2', 't3']) for _ in range(1000)],
    'value': [random.randint(0, 500) for _ in range(1000)],
    'count': [random.randint(0,250) for _ in range(1000)],
    })

df.plot(kind='scatter', x='type', y='value', c='count', cmap='Blues')
plt.show()

[Figure: scatterplot using pandas]

import seaborn as sns

sns.scatterplot(x='type', y='value', data=df, hue='count')
plt.show()

[Figure: scatterplot using seaborn]

2 Answers

I managed to jitter the types by encoding them as numeric values and jittering those instead. However, this requires at least one more column in the DataFrame.

import pandas as pd
import matplotlib.pyplot as plt
import random

df = pd.DataFrame({
    'type': [random.choice(['t1', 't2', 't3']) for _ in range(1000)],
    'value': [random.randint(0, 500) for _ in range(1000)],
    'count': [random.randint(0,250) for _ in range(1000)],
    })

def jitter(x):
    # shift x by a random offset drawn uniformly from [-0.25, 0.25]
    return x + random.uniform(-.25, .25)

# numeric position on the x-axis for each type
type_ids = {'t1': 1, 't2': 2, 't3': 3}

df['type_id'] = df['type'].map(type_ids)
df['jitter_type'] = df['type_id'].apply(jitter)

df.plot(kind='scatter', x='jitter_type', y='value', c='count', cmap='Blues')
# put the original type names back on the x-axis ticks
plt.xticks([1, 2, 3], ['t1', 't2', 't3'])
plt.show()

[Figure: jittered scatterplot]
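
As a possible variation on the approach above, here is a minimal sketch that derives the numeric codes with pandas.factorize instead of a hand-written dictionary and draws all jitter offsets in one NumPy call. The helper names add_jitter and plot_jittered, the width and seed parameters, and the seeded sample data are assumptions made for illustration; they are not part of pandas, seaborn, or the original answer. Note that the codes start at 0 here rather than 1.

```python
import numpy as np
import pandas as pd


def add_jitter(df, column, width=0.25, seed=None):
    """Return jittered numeric x positions and the matching tick labels.

    Hypothetical helper for illustration, not a pandas or seaborn function.
    """
    rng = np.random.default_rng(seed)
    # sort=True assigns the codes 0, 1, 2, ... in sorted label order
    codes, labels = pd.factorize(df[column], sort=True)
    # one uniform offset per row, drawn in a single vectorized call
    offsets = rng.uniform(-width, width, size=len(df))
    return codes + offsets, list(labels)


def plot_jittered(df, x, labels):
    """Scatter the jittered positions and restore the category names."""
    # imported here so that add_jitter itself does not need matplotlib
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    points = ax.scatter(x, df['value'], c=df['count'], cmap='Blues')
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    fig.colorbar(points, ax=ax, label='count')
    plt.show()


# sample data shaped like the DataFrame in the question (seeded for repeatability)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'type': rng.choice(['t1', 't2', 't3'], size=1000),
    'value': rng.integers(0, 501, size=1000),
    'count': rng.integers(0, 251, size=1000),
})

x, labels = add_jitter(df, 'type', width=0.25, seed=1)
```

Calling plot_jittered(df, x, labels) afterwards should draw the figure. Since the jittered positions are returned as an array, no extra column has to be added to the DataFrame.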

The problem with your approach is that seaborn's scatterplot lacks specific functionality that makes sense in the context of categorical data, e.g., jitter. Hence, seaborn provides "scatterplots for categorical data": stripplot and swarmplot. But seaborn creates an ... interesting figure legend. We have to get rid of this and replace it with a colorbar:

#fake data generation
import pandas as pd
import numpy as np

np.random.seed(123)
ndf = 1000
df = pd.DataFrame({
    'Type': [np.random.choice(['t1', 't2', 't3']) for _ in range(ndf)],
    'Val': [np.random.randint(0, 700) for _ in range(ndf)],
    'Cou': [np.random.randint(0, 500) for _ in range(ndf)],
    })
    
#now the actual plotting  
import seaborn as sns
from matplotlib import colors, cm
import matplotlib.pyplot as plt
    
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

#preparation for the colorbars
pal = "coolwarm"
normpal = colors.Normalize(df.Cou.min(), df.Cou.max())

#stripplot display
sns.stripplot(x="Type", y="Val", data=df, hue="Cou", palette=pal, ax=ax1, jitter=0.2)
ax1.get_legend().remove()
ax1.set_title("stripplot")
fig.colorbar(cm.ScalarMappable(cmap=pal, norm=normpal), ax=ax1)

#swarmplot display
sns.swarmplot(x="Type", y="Val", data=df, hue="Cou", palette=pal, ax=ax2)
ax2.get_legend().remove()
ax2.set_title("swarmplot")
fig.colorbar(cm.ScalarMappable(cmap=pal, norm=normpal), ax=ax2)

plt.tight_layout()
plt.show()

[Figure: sample output showing the stripplot and swarmplot panels side by side]

3 Comments

"The problem with your approach is that by definition seaborn's scatterplot is for numerical data." I wouldn't say this. The categorical plotting functions in seaborn explicitly treat all data as categorical, but the inverse is not true; scatterplot (by virtue of the categorical support in matplotlib) handles categorical variables perfectly fine. But it does currently lack some features (like jitter) that make sense only or primarily in the context of categorical data.
Not much to add. True dat. Edited the description based on your input.
Exactly what I was looking for. With the help of this answer I was able to label the colorbar (matplotlib.axes.Axes.set_ylabel).
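
The colorbar labelling mentioned in the last comment might look like the following minimal sketch. It assumes a standalone colorbar built the same way as in the answer; the 0 to 500 range and the label text "Cou" are illustrative values borrowed from the answer's sample data, not something the commenter specified.

```python
import matplotlib.pyplot as plt
from matplotlib import cm, colors

fig, ax = plt.subplots()

# same construction as in the answer: a colorbar without a plotted artist
norm = colors.Normalize(vmin=0, vmax=500)
cbar = fig.colorbar(cm.ScalarMappable(cmap="coolwarm", norm=norm), ax=ax)

# the colorbar lives in its own Axes, so Axes.set_ylabel labels its long side
cbar.ax.set_ylabel("Cou")
```

For a vertical colorbar, cbar.set_label("Cou") should have the same effect.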
