2

Afternoon All,

I have been trying to resolve this for awhile, any help would be appreciated.

Here is my dataframe:

Channel state       rfq_qty
A        Done       10
B        Tied Done  10
C        Done       10
C        Done       10
C        Done       10
C        Tied Done  10
B        Done       10
B        Done       10

I would like to:

  1. Group by channel, then state
  2. Sum the rfq_qty for each channel
  3. Count the occurences of each 'done' string in state ('Done' is treated the same as 'Tied Done' i.e. anything with 'done' in it)
  4. Display the channels rfq_qty as a percentage of the total number of rfq_qty (80)
Channel state   rfq_qty Percentage
A         1       10    0.125
B         3       30    0.375
C         4       40    0.5

I have attempted this with the following:

df_Done = df[
                (
                    df['state']=='Done'
                ) 
                | 
                (
                    df['state'] == 'Tied Done'
                )
            ][['Channel','state','rfq_qty']]

df_Done['Percentage_Qty']= df_Done['rfq_qty']/df_Done['rfq_qty'].sum()
df_Done['Done_Trades']= df_Done['state'].count()

display(
        df_Done[
                (df_Done['Channel'] != 0)
               ].groupby(['Channel'])['Channel','Count of Done','rfq_qty','Percentage_Qty'].sum().sort_values(['rfq_qty'], ascending=False)
       )

Works but looks convoluted. Any improvements?

2 Answers 2

1

I think you can use:

  • first filter by isin and loc
  • groupby and aggregate by agg with tuples of new columns names and functions
  • add Percentage by divide by div and sum
  • last if necessary sort_values by rfq_qty

df_Done = df.loc[df['state'].isin(['Done', 'Tied Done']), ['Channel','state','rfq_qty']]

#if want filter all values contains Done
#df_Done = df[df['state'].str.contains('Done')]

#if necessary filter out Channel == 0
#mask = (df['Channel'] != 0) & df['state'].isin(['Done', 'Tied Done'])
#df_Done = df.loc[mask, ['Channel','state','rfq_qty']]

d = {('rfq_qty', 'sum'), ('Done_Trades','size')}
df = df_Done.groupby('Channel')['rfq_qty'].agg(d).reset_index()
df['Percentage'] = df['rfq_qty'].div(df['rfq_qty'].sum())
df = df.sort_values('rfq_qty')
print (df)
  Channel  Done_Trades  rfq_qty  Percentage
0       A            1       10       0.125
1       B            3       30       0.375
2       C            4       40       0.500
Sign up to request clarification or add additional context in comments.

10 Comments

Hey Jezzrael. Thanks for that. When I try to sirt on the sum column it fails to sort from largest to smallest. df.sort_values(['sum'], ascending=False)
@PeterLucas - just remove , ascending=False
Perfect, case issue on column header. Thanks again!
@jpp - Hmmm, In my opinion if OP use filtering first and then working with filtered df_Done DataFrame then it is no problem.
@jpp - I agree, so added comment df_Done = df[df['state'].str.contains('Done')]
|
0

One way is to use a single df.groupby.agg and rename columns:

import pandas as pd

df = pd.DataFrame({'Channel': ['A', 'B', 'C', 'C', 'C', 'C', 'B', 'B'],
                   'state': ['Done', 'Tied Done', 'Done', 'Done', 'Done', 'Tied Done', 'Done', 'Done'],
                   'rfq_qty': [10, 10, 10, 10, 10, 10, 10, 10]})

agg_funcs = {'state': lambda x: x[x.str.contains('Done')].count(),
             'rfq_qty': ['sum', lambda x: x.sum() / df['rfq_qty'].sum()]}

res = df.groupby('Channel').agg(agg_funcs).reset_index()
res.columns = ['Channel', 'state', 'rfq_qty', 'Percentage']

#   Channel  state  rfq_qty  Percentage
# 0       A      1       10       0.125
# 1       B      3       30       0.375
# 2       C      4       40       0.500

This isn't the most efficient way, since it relies of non-vectorised aggregations, but it may be a good option if it is performant for your use case.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.