Consider a data frame which looks like:

             A        B        C
0   2018-10-13      100       50
1   2018-10-13      200       25
2   2018-10-13      300       10
3   2018-10-13      400        5
4   2018-10-13      500        0
5   2018-10-14      100      100
6   2018-10-14      200       50
7   2018-10-14      300       25
8   2018-10-14      400       10
9   2018-10-14      500        5
10  2018-10-15      100      150
11  2018-10-15      200      100
12  2018-10-15      300       50
13  2018-10-15      400       25
14  2018-10-15      500       10
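
To reproduce the frame above, a minimal sketch could look like the following. The dtype of column A is not stated, so keeping it as plain strings is an assumption.

```python
import pandas as pd

# Rebuild the sample frame: three dates, five B values per date.
# Assumption: column A holds plain strings rather than datetimes.
df = pd.DataFrame({
    "A": ["2018-10-13"] * 5 + ["2018-10-14"] * 5 + ["2018-10-15"] * 5,
    "B": [100, 200, 300, 400, 500] * 3,
    "C": [50, 25, 10, 5, 0,
          100, 50, 25, 10, 5,
          150, 100, 50, 25, 10],
})

print(df)
```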

The transformation I want to perform is:

  1. Group by column A
  2. Then group column B into 3 intervals: [0,100] (say intval-1), [101,200] (say intval-2), and [201,end] (say intval-3). This should generalize to n intervals.
  3. Perform a sum aggregation on column C

So my transformed/pivoted data frame should look like:

             A  intval-1  intval-2  intval-3
0   2018-10-13        50        25        15
1   2018-10-14       100        50        40
2   2018-10-15       150       100        85

An easy way to implement this would be a great help.

Thank you.

  • df.pivot(index='A',columns='B',values='C') Commented Oct 30, 2018 at 15:58
  • pivot won't work because you can't supply an aggregation function. Commented Oct 30, 2018 at 15:59

2 Answers

You can bin column B with pd.cut, then aggregate with pivot_table:

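import numpy as np
import pandas as pd

# Bin edges; pd.cut builds right-closed intervals (0, 100], (100, 200], (200, inf]
bin_lst = [0, 100, 200, np.inf]

# One label per interval, derived from the number of edges
cut_b = pd.cut(df['B'], bins=bin_lst,
               labels=[f'intval-{i}' for i in range(1, len(bin_lst))])

# Replace B with its interval label, then sum C per (A, interval) pair
res = df.assign(B=cut_b)\
        .pivot_table(index='A', columns='B', values='C', aggfunc='sum')

print(res)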

B           intval-1  intval-2  intval-3
A                                       
2018-10-13        50        25        15
2018-10-14       100        50        40
2018-10-15       150       100        85
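
To generalize to n intervals, the same cut-then-pivot idea can be wrapped in a small function. This is only a sketch: `pivot_by_intervals` and its parameter names are hypothetical, not part of pandas. Passing `include_lowest=True` is an assumption made so that a B value of exactly 0 lands in the first interval, as the question's [0,100] suggests, and `observed=False` is passed explicitly so the behaviour does not depend on the pandas version's default for categorical columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question (column A kept as strings, an assumption).
df = pd.DataFrame({
    "A": ["2018-10-13"] * 5 + ["2018-10-14"] * 5 + ["2018-10-15"] * 5,
    "B": [100, 200, 300, 400, 500] * 3,
    "C": [50, 25, 10, 5, 0, 100, 50, 25, 10, 5, 150, 100, 50, 25, 10],
})


def pivot_by_intervals(frame, edges, index="A", by="B", values="C"):
    """Hypothetical helper: bin `by` on `edges`, then sum `values` per bin."""
    # n edges define n - 1 intervals, labelled intval-1 ... intval-(n-1).
    labels = [f"intval-{i}" for i in range(1, len(edges))]
    # include_lowest=True makes the first interval closed on the left.
    binned = pd.cut(frame[by], bins=edges, labels=labels, include_lowest=True)
    return (frame.assign(**{by: binned})
                 .pivot_table(index=index, columns=by, values=values,
                              aggfunc="sum", observed=False))


res3 = pivot_by_intervals(df, [0, 100, 200, np.inf])       # three intervals
res4 = pivot_by_intervals(df, [0, 100, 200, 300, np.inf])  # four intervals
print(res3)
print(res4)
```

Only the list of edges changes between calls; the labels are derived from its length, so adding an interval does not require touching the label list.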

1 Comment

@Wen, yep, I thought I'd go the "extra step" and type out the labels :). But I think it's better to derive them from the length.

Use pd.cut with groupby + unstack:

# Note: this overwrites column B with its interval label
df.B = pd.cut(df.B, bins=[0, 100, 200, np.inf], labels=['intval-1', 'intval-2', 'intval-3'])
df.groupby(['A', 'B']).C.sum().unstack()
Out[35]: 
B           intval-1  intval-2  intval-3
A                                       
2018-10-13        50        25        15
2018-10-14       100        50        40
2018-10-15       150       100        85
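
If the original B values are still needed afterwards, the binned Series can be passed to groupby directly instead of being assigned back to the frame. The sketch below is a variant of the approach above, not the answerer's exact code; the sample frame is rebuilt with A as strings (an assumption), and `observed=False` is passed explicitly so the result does not depend on the pandas version's default for categorical groupers.

```python
import numpy as np
import pandas as pd

# Sample data from the question (column A kept as strings, an assumption).
df = pd.DataFrame({
    "A": ["2018-10-13"] * 5 + ["2018-10-14"] * 5 + ["2018-10-15"] * 5,
    "B": [100, 200, 300, 400, 500] * 3,
    "C": [50, 25, 10, 5, 0, 100, 50, 25, 10, 5, 150, 100, 50, 25, 10],
})

# Bin B into a separate Series; df itself is left untouched.
b_bins = pd.cut(df["B"], bins=[0, 100, 200, np.inf],
                labels=["intval-1", "intval-2", "intval-3"])

# Group by column A and the binned Series, sum C, then move the bins to columns.
res = df.groupby(["A", b_bins], observed=False)["C"].sum().unstack()
print(res)
```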
