I have the following dataframe:

         it   A   B   C   D
0        10  aa  mn  cd  kk
1       100  ab  cd  wc  ll
2      1000  wc  cd  mn  sf
3     10000  ll  ll  kk  mn
4    100000  wc  kk  mn  cd
5   1000000  aa  ll  we  sf
6  10000000  ss  aa  ss  kk

created as:

import random
import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]
# "it" holds the x-axis values; A-D hold randomly chosen labels
df = pd.DataFrame(columns=["it", "A", "B", "C", "D"])
for i, it in enumerate([1, 2, 3, 4, 5, 6, 7]):
    # 10**it gives 10, 100, ..., 10000000, matching the table above
    row = [10**it] + [random.choice(options) for _ in range(4)]
    df.loc[i] = row

The goal is to create a scatterplot where the y-axis shows the unique values from the dataframe in sorted order (e.g. options) and the x-axis corresponds to column it. Depending on whether a value belongs to column A, B, C, or D, I want to color its dot differently and add a legend, so that I know which column each dot comes from.

How do I do it in seaborn or matplotlib?

The way I am doing it in matplotlib is

iters = list(range(df.shape[0]))
x, y = sort(iters, df["A"])
plt.scatter(x, y, color="red")
x, y = sort(iters, df["B"])
plt.scatter(x, y, color="blue")
...

but that sorts only the labels within each separate column, not the y-axis as a whole.
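One way to get a single ordering on the y-axis in plain matplotlib is to map every label to its position in one shared, sorted list before plotting, and then relabel the ticks. The following is only a minimal sketch: the three-row sample is copied from the table above, the `position` dictionary and the green/orange colors are illustrative choices, and "sorted order" is assumed to mean alphabetical order of options.

```python
import pandas as pd
import matplotlib.pyplot as plt

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]

# First three rows of the dataframe shown in the question.
df = pd.DataFrame({
    "it": [10, 100, 1000],
    "A": ["aa", "ab", "wc"],
    "B": ["mn", "cd", "cd"],
    "C": ["cd", "wc", "mn"],
    "D": ["kk", "ll", "sf"],
})

# One shared label -> y position mapping (assumption: alphabetical order).
order = sorted(options)
position = {label: k for k, label in enumerate(order)}

fig, ax = plt.subplots()
for column, color in zip(["A", "B", "C", "D"], ["red", "blue", "green", "orange"]):
    # Every column uses the same mapping, so the y-axis is consistent.
    ax.scatter(df["it"], df[column].map(position), color=color, label=column)

# Show the labels instead of the integer positions.
ax.set_yticks(range(len(order)))
ax.set_yticklabels(order)
ax.set_xscale("log")  # the "it" values span several orders of magnitude
ax.legend(title="column")
```

Call plt.show() afterwards to display the figure. Because the mapping is built once from the full list, labels that never occur in a given column still keep their slot on the axis.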

1 Answer

Let's try stacking the data, converting it to an ordered categorical with the given order, sorting, and plotting:

s = df.stack() 

s = pd.Series(pd.Categorical(s, categories=options, ordered=True),
              index=s.index)

sns.scatterplot(data=s.sort_values().reset_index(name='value'),
                x='level_0', y='value', hue='level_1'
               )


Update: if you have a column xvalue and only care about some columns ['A','B','C','D'], use melt instead of stack:

s = df.melt(id_vars='xvalue', 
            value_vars=['A','B','C','D'],
            value_name='value',
            var_name='column')
s['value'] = pd.Categorical(s['value'], categories=options, ordered=True)

sns.scatterplot(data=s.sort_values('value'),
                x='xvalue', y='value', hue='column'
               )
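For reference, here is a self-contained sketch of the melt approach applied to the question's data. It assumes the x-axis column is named it as in the question (rather than xvalue), uses the first three rows of the table as sample data, and assumes that alphabetical order is the ordering wanted on the y-axis. The log scale on the x-axis is an addition to cope with the large gaps between x values.

```python
import pandas as pd
import seaborn as sns

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]

# First three rows of the dataframe shown in the question.
df = pd.DataFrame({
    "it": [10, 100, 1000],
    "A": ["aa", "ab", "wc"],
    "B": ["mn", "cd", "cd"],
    "C": ["cd", "wc", "mn"],
    "D": ["kk", "ll", "sf"],
})

# Long format: one row per (it, column, value) triple.
long_df = df.melt(id_vars="it",
                  value_vars=["A", "B", "C", "D"],
                  var_name="column",
                  value_name="value")

# Ordered categorical so that sort_values follows the chosen label order
# (assumption: alphabetical; pass categories=options to keep the list order).
long_df["value"] = pd.Categorical(long_df["value"],
                                  categories=sorted(options),
                                  ordered=True)

ax = sns.scatterplot(data=long_df.sort_values("value"),
                     x="it", y="value", hue="column")
ax.set_xscale("log")  # x values are powers of ten
```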

7 Comments

But that does not look sorted! The x-axis is supposed to be the y-axis, but that's not a big issue. The problem is that the x-axis is not sorted, at least in the matplotlib one.
@YohanRoth I missed the options part. See updated answer.
So your solution does not work in my case. It's my fault, I did not specify the problem correctly. In addition to the A, B, C, D columns I have another column that specifies x-axis values that are not just row iters (they have large gaps, like 1, 10, 1000, 10000, etc.). Could you show me how to modify the answer to accommodate this? Anyway, I will accept it! I updated the question.
I get ValueError: Length of values (3) does not match length of index (28) for s['value'] = pd.Categorical(s, categories=options, ordered=True)
s.shape is (28, 3), while the pd.Categorical has shape (3,).
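The shapes reported in the last two comments are consistent with the whole melted frame s being passed to pd.Categorical instead of its value column: s has three columns (xvalue, column, value), which would explain a categorical of length 3 against 28 rows. This is an inference from the quoted line, not something confirmed in the thread. A minimal sketch of the intended assignment, using three sample rows from the question and the column name xvalue from the answer:

```python
import pandas as pd

options = ["ab", "cd", "bb", "aa", "we", "ss", "kk", "mn", "re", "wc", "ll", "sf"]

# Sample rows from the question; "xvalue" is the column name used in the answer.
df = pd.DataFrame({
    "xvalue": [10, 100, 1000],
    "A": ["aa", "ab", "wc"],
    "B": ["mn", "cd", "cd"],
    "C": ["cd", "wc", "mn"],
    "D": ["kk", "ll", "sf"],
})

s = df.melt(id_vars="xvalue",
            value_vars=["A", "B", "C", "D"],
            value_name="value",
            var_name="column")

# Pass the "value" column (one entry per row), not the whole frame s.
s["value"] = pd.Categorical(s["value"], categories=options, ordered=True)
```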