2

Suppose I have a df like this:

    stringOfInterest        trend                  
0         C                    up
1         D                  down
2         E                  down
3         C,O                  up
4         C,P                  up

I want to plot this df as a bar graph using pandas. To obtain the proper grouped bar plots, I would like to group the data by the column df["trend"] and then count the occurrence of df["stringOfInterest"] for each letter. As can be seen, some of this strings contain multiple letters separated by a ",".

Using

df.groupby("trend").stringOfInterest.value_counts().unstack(0)

produces the expected result:

trend                  down    up
stringOfInterest                        
-                       7.0   8.0
C                       3.0  11.0
C,O                     NaN   2.0
C,P                     1.0   1.0
D                       1.0   2.0
E                      15.0  14.0
E,T                     1.0   NaN

However, I would like to count the occurrence of individual characters (C,E,D,...). On the original df this can be achieved like this:

s = df.stringOfInterest.str.split(",", expand = True).stack()
s.value_counts()

This typically generates something like this:

C     3
E     2
D     1
O     1
P     1
T     1

Unfortunately, this cannot be used here after the groupby() in combination with unstack().

Maybe I am on the wrong track and some more elegant way would be preferred.

To clarify the plotting: For each letter (stringOfInterest), there must be two bars indicating the number of "up" and "down" trend(s).

1 Answer 1

2

Based on this answer here: Pandas expand rows from list data available in column

Is this something that would help you?

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {'stringOfInterest': {0: 'C', 1: 'D', 2: 'E', 3: 'C,O', 4: 'C,P'},
     'trend': {0: 'up', 1: 'down', 2: 'down', 3: 'up', 4: 'up'}})


df2 = (pd.DataFrame(df.stringOfInterest.str.split(',').tolist(), index=df.trend)
        .stack()
        .reset_index()
        .groupby('trend')[0]
        .value_counts()
        .unstack()
      ).T

df2.plot(kind='bar')
plt.show()

Another approach

We could also zip the columns together and expand.

import pandas as pd
from collections import Counter

data = [(x,i) for x,y in zip(df.trend,df.stringOfInterest.str.split(',')) for i in y]

pd.Series(Counter(data)).plot(kind='bar')
Sign up to request clarification or add additional context in comments.

2 Comments

The second is much more readable! Thanks for both. Any estimate on the performance?
@Fourier The 2nd I just came up with. However, I haven't confirmed they produce the result you are asking for.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.