Plot categorical scatterplot in seaborn or matplotlib

Question

I have the following DataFrame:

         it   A   B   C   D
0        10  aa  mn  cd  kk
1       100  ab  cd  wc  ll
2      1000  wc  cd  mn  sf
3     10000  ll  ll  kk  mn
4    100000  wc  kk  mn  cd
5   1000000  aa  ll  we  sf
6  10000000  ss  aa  ss  kk

It is created as follows:

import random

import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]
# "it" holds the x values (10, 100, ..., 10**7); A-D hold random labels from options
df = pd.DataFrame(columns=["it", "A", "B", "C", "D"])
for i, it in enumerate([1, 2, 3, 4, 5, 6, 7]):
    row = [10**it, random.choice(options), random.choice(options),
           random.choice(options), random.choice(options)]
    df.loc[i] = row

The goal is to create a scatter plot whose y-axis shows the unique values from the DataFrame in sorted order (e.g. the order given in options) and whose x-axis corresponds to column it. Depending on whether a value comes from column A, B, C, or D, I want to color the dots differently and show a legend, so I know which column each dot comes from.

How do I do it in seaborn or matplotlib?

The way I am currently doing it in matplotlib is:

import matplotlib.pyplot as plt

iters = list(range(df.shape[0]))
# sort is a helper of mine (definition not shown)
x, y = sort(iters, df["A"])
plt.scatter(x, y, color="red")
x, y = sort(iters, df["B"])
plt.scatter(x, y, color="blue")
...

but that only sorts the labels within each column separately; it does not sort the y-axis as a whole.
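One plain-matplotlib way around this (a sketch, not something proposed in the thread) is to stop handing strings to matplotlib and map every label to a fixed integer position in one shared list, then put the labels back as tick labels. It assumes alphabetical order for the y-axis (replace sorted(options) with options to keep the given order), a log-scaled x-axis because it spans 10 to 10**7, and an arbitrary color per column:

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]

# Sample data shaped like the question's: x values in "it", labels in A-D
df = pd.DataFrame(
    [[10**it] + random.choices(options, k=4) for it in range(1, 8)],
    columns=["it", "A", "B", "C", "D"],
)

# One shared y order for all columns (assumption: alphabetical)
labels = sorted(options)
pos = {label: i for i, label in enumerate(labels)}

fig, ax = plt.subplots()
for col, color in zip(["A", "B", "C", "D"], ["red", "blue", "green", "orange"]):
    # Plot integer positions rather than strings, so every column uses the same axis
    ax.scatter(df["it"], df[col].map(pos), color=color, label=col)

# Turn the integer positions back into the label names
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xscale("log")  # "it" has large gaps (10, 100, ..., 10**7)
ax.set_xlabel("it")
ax.legend(title="column")
```

Because the positions come from one dictionary, the y order no longer depends on which column happened to be plotted first.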

Quang Hoang · Accepted Answer · 2020-11-15 06:43:38Z


Let's try stacking the data, converting it to an ordered categorical with the given order, then sorting and plotting:

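import pandas as pd
import seaborn as sns

# Long format: one entry per (row, column) pair
s = df.stack()

# Ordered categorical, so sorting follows the order of options
s = pd.Series(pd.Categorical(s, categories=options, ordered=True),
              index=s.index)

# After reset_index, level_0 is the row label and level_1 the original column name
sns.scatterplot(data=s.sort_values().reset_index(name='value'),
                x='level_0', y='value', hue='level_1'
               )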
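The key step above is that sorting an ordered categorical follows the position of each value in categories, not the alphabet. Matplotlib places string labels on a categorical axis in the order it first sees them, which is presumably why the data is sorted before plotting. A minimal check of the sorting behaviour, using a few made-up labels:

```python
import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]

# A few labels in no particular order (illustrative values only)
s = pd.Series(pd.Categorical(["wc", "aa", "ab", "sf", "cd"],
                             categories=options, ordered=True))

# Sorted by their position in options (ab, cd, ..., aa, ..., wc, ..., sf)
print(s.sort_values().tolist())  # ['ab', 'cd', 'aa', 'wc', 'sf']
```

Plain string sorting would give ['aa', 'ab', 'cd', 'sf', 'wc'] instead.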


Update: if you have an x-value column (here xvalue) and only care about some of the columns, e.g. ['A','B','C','D'], use melt instead of stack:

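import pandas as pd
import seaborn as sns

# Keep xvalue as the x coordinate; unpivot only the label columns
s = df.melt(id_vars='xvalue',
            value_vars=['A','B','C','D'],
            value_name='value',
            var_name='column')
# Convert the value column (not the whole frame) to an ordered categorical
s['value'] = pd.Categorical(s['value'], categories=options, ordered=True)

sns.scatterplot(data=s.sort_values('value'),
                x='xvalue', y='value', hue='column'
               )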
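In the question the x values live in a column called it rather than xvalue. The sketch below runs the same melt on a frame built like the question's (random sample labels, so the exact rows vary) and checks that the result has one row per (it, column) pair, sorted in options order:

```python
import random

import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]

# Sample data shaped like the question's: an "it" column plus label columns A-D
df = pd.DataFrame(
    [[10**it] + random.choices(options, k=4) for it in range(1, 8)],
    columns=["it", "A", "B", "C", "D"],
)

# Long format: one row per (it, column, value); "it" is carried along as the x value
long = df.melt(id_vars="it", value_vars=["A", "B", "C", "D"],
               var_name="column", value_name="value")
long["value"] = pd.Categorical(long["value"], categories=options, ordered=True)
long = long.sort_values("value")

print(long.head())
```

To plot it, the long frame can be passed to the sns.scatterplot call above with x='it'. Since it ranges from 10 to 10**7, calling set_xscale('log') on the axes that sns.scatterplot returns (a presentation choice, not part of the answer) keeps the points from bunching up at the left edge.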

answered Nov 15, 2020 at 3:31 by Quang Hoang · edited Nov 15, 2020 at 6:43


7 Comments

YohanRoth Over a year ago

But that does not look sorted! The x-axis is supposed to be the y-axis, but that's not a big issue. The problem is that the x-axis is not sorted, at least in the matplotlib plot.

Quang Hoang Over a year ago

@YohanRoth I missed the options part. See updated answer.

YohanRoth Over a year ago

So your solution does not work in my case, but that's my fault: I did not specify the problem correctly. In addition to columns A, B, C, and D, I have another column that specifies the x-axis values, which are not just row indices (they have large gaps, e.g. 1, 10, 1000, 10000). Could you show me how to modify the answer to accommodate this? Anyway, I will accept it! I updated the question.

YohanRoth Over a year ago

I get ValueError: Length of values (3) does not match length of index (28) for s['value'] = pd.Categorical(s, categories=options, ordered=True)

YohanRoth Over a year ago

s.shape is (28, 3), but the pd.Categorical result has shape (3,).
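The shapes in the last comment are consistent with pd.Categorical receiving the whole melted DataFrame instead of its value column: iterating over a DataFrame yields its column labels, so there is one entry per column (3) rather than one per row (28). Passing s['value'], as the answer's code does, gives one entry per row. A small illustration on a made-up four-row frame with the same three columns:

```python
import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]

# Tiny long-format frame with the same columns as the melted data (illustrative values)
s = pd.DataFrame({"xvalue": [10, 100, 1000, 10000],
                  "column": ["A", "B", "C", "D"],
                  "value": ["wc", "aa", "ab", "sf"]})

# Iterating a DataFrame yields its column labels, not its rows
print(list(s))  # ['xvalue', 'column', 'value']

# Passing the column gives one categorical entry per row, matching len(s)
s["value"] = pd.Categorical(s["value"], categories=options, ordered=True)
print(len(s["value"]), len(s))  # 4 4
```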
