
I have a dataframe like this:

bin=[0,5,10]

code sex age
a     1   1
a     1   6
b     1   8
b     2   2
c     2   3
c     1   4 
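Since the intermediate result turned out to be hard to reproduce, here is a minimal sketch that rebuilds the sample frame in runnable form. The column names and values are copied from the table above; `bins` is the question's `bin`, renamed here (an assumption) so it does not shadow Python's built-in `bin()`.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame(
    {
        "code": ["a", "a", "b", "b", "c", "c"],
        "sex": [1, 1, 1, 2, 2, 1],
        "age": [1, 6, 8, 2, 3, 4],
    }
)

# Same edges as the question's `bin`, renamed to avoid shadowing built-in bin()
bins = [0, 5, 10]

print(df)
```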

I summarized this df with:

 df.groupby([df.code,df.sex,pd.cut(df.age,bin)]).size().unstack().stack().fillna(0)
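The `pd.cut(df.age, bin)` key in that line turns each age into a right-closed interval, which then acts as the third grouping key. A small sketch of that step on its own, assuming the same ages and edges as above:

```python
import pandas as pd

ages = pd.Series([1, 6, 8, 2, 3, 4], name="age")
bins = [0, 5, 10]  # the question's `bin`

# Each value lands in a right-closed interval: (0, 5] or (5, 10]
binned = pd.cut(ages, bins)

# The result is a categorical Series whose categories are the two intervals
print(binned.cat.categories)
print([str(interval) for interval in binned])
```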

I get a result like this:

code sex    age
a    1  (0,5] 1
a    1 (5,10] 1
a    2  (0,5] 0
a    2 (5,10] 0
b    1  (0,5] 0
b    1 (5,10] 1
b    2  (0,5] 1
b    2 (5,10] 0
c    1  (0,5] 1
c    1 (5,10] 0
c    2  (0,5] 1
c    2 (5,10] 0
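Whether the zero rows appear can depend on the pandas version and on how the categorical `pd.cut` key is treated, which may explain why this output is hard to reproduce. A version-independent sketch, assuming plain string bin labels (the `labels=` values are hypothetical) and reindexing explicitly over every code × sex × bin combination:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code": ["a", "a", "b", "b", "c", "c"],
        "sex": [1, 1, 1, 2, 2, 1],
        "age": [1, 6, 8, 2, 3, 4],
    }
)
labels = ["(0,5]", "(5,10]"]  # hypothetical string labels for the two bins

# Plain strings instead of Interval categories keep the groupby behaviour simple
age_bin = pd.cut(df["age"], [0, 5, 10], labels=labels).astype(str).rename("age")

# Count rows per observed (code, sex, age) combination
counts = df.groupby([df["code"], df["sex"], age_bin]).size()

# Add the unobserved combinations explicitly, with a count of 0
full_index = pd.MultiIndex.from_product(
    [["a", "b", "c"], [1, 2], labels], names=["code", "sex", "age"]
)
counts = counts.reindex(full_index, fill_value=0)
print(counts)
```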

I would like to transform it into this:

        1     2
        a b c  a b c
 (0,5]  1 0 1  0 1 1
(5,10]  1 1 0  0 0 0

I tried stack() and unstack(), but I got confused trying to reach the dataframe above. How can I transform it, and could someone explain the steps?

  • I can't reproduce your intermediate result based on your code. Commented Oct 26, 2017 at 3:23
  • Have you tried pivot_table instead? Commented Oct 26, 2017 at 3:42

4 Answers


You can do this with a single pivot_table:

In [11]: df
Out[11]:
  code  sex  age
0    a    1    1
1    a    1    6
2    b    1    8
3    b    2    2
4    c    2    3
5    c    1    4

In [12]: df.pivot_table(index=pd.cut(df.age, bin),
                        columns=["sex", "code"],
                        aggfunc="count",
                        fill_value=0)
Out[12]:
        age
sex       1        2
code      a  b  c  a  b  c
age
(0, 5]    1  0  1  0  1  1
(5, 10]   1  1  0  0  0  0
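A runnable sketch of the same `pivot_table` call, checked against the equivalent groupby/unstack chain. It assumes the sample data from the question and hypothetical string bin labels; depending on the pandas version the empty `(2, a)` column may not appear, so the checks only look at observed cells.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code": ["a", "a", "b", "b", "c", "c"],
        "sex": [1, 1, 1, 2, 2, 1],
        "age": [1, 6, 8, 2, 3, 4],
    }
)
labels = ["(0,5]", "(5,10]"]  # hypothetical string labels for the two bins
age_bin = pd.cut(df["age"], [0, 5, 10], labels=labels).astype(str).rename("age_bin")

# One call: count the "age" values in every (age_bin) x (sex, code) cell
wide = df.pivot_table(
    index=age_bin,
    columns=["sex", "code"],
    values="age",
    aggfunc="count",
    fill_value=0,
)

# The same table built with groupby + unstack, for comparison
via_unstack = (
    df.groupby([age_bin, "sex", "code"])
    .size()
    .unstack(["sex", "code"], fill_value=0)
)

print(wide)
```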

1 Comment

Note: any time you see stack/unstack think pivot_table!
df.reset_index().set_index(['sex','code','age']).unstack(-1).T
Out[760]: 
sex           1        2      
code          a  b  c  a  b  c
      age                     
value (0,5]   1  0  1  0  1  1
      (5,10]  1  1  0  0  0  0

Data input (the summarized result from the question, with the counts in a column named value):

Out[762]: 
                 value
code sex age          
a    1   (0,5]       1
         (5,10]      1
     2   (0,5]       0
         (5,10]      0
b    1   (0,5]       0
         (5,10]      1
     2   (0,5]       1
         (5,10]      0
c    1   (0,5]       1
         (5,10]      0
     2   (0,5]       1
         (5,10]      0
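A runnable version of this answer's one-liner, with the "Data input" frame typed in by hand (counts in a column named `value`, bin labels as hypothetical plain strings):

```python
import pandas as pd

# The "Data input" frame: counts indexed by (code, sex, age)
index = pd.MultiIndex.from_product(
    [["a", "b", "c"], [1, 2], ["(0,5]", "(5,10]"]], names=["code", "sex", "age"]
)
long = pd.DataFrame({"value": [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0]}, index=index)

# Move sex/code to the row index, push age into the columns, then transpose
# so that (sex, code) becomes the column MultiIndex
wide = long.reset_index().set_index(["sex", "code", "age"]).unstack(-1).T
print(wide)
```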

Or use crosstab:

pd.crosstab(index=pd.cut(df.age, bin),
            columns=[df.sex, df.code])
Out[768]: 
sex      1        2   
code     a  b  c  b  c
age                   
(0, 5]   1  0  1  1  1
(5, 10]  1  1  0  0  0
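The `crosstab` output above has only five columns because the pair `(2, a)` never occurs in the data. A sketch that restores the sixth column by reindexing against every sex × code pair, assuming the sample data and hypothetical string bin labels:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code": ["a", "a", "b", "b", "c", "c"],
        "sex": [1, 1, 1, 2, 2, 1],
        "age": [1, 6, 8, 2, 3, 4],
    }
)
labels = ["(0,5]", "(5,10]"]  # hypothetical string labels for the two bins
age_bin = pd.cut(df["age"], [0, 5, 10], labels=labels).astype(str).rename("age")

# Frequency table: rows are age bins, columns are observed (sex, code) pairs
table = pd.crosstab(index=age_bin, columns=[df["sex"], df["code"]])

# Reindex against every (sex, code) pair so the unobserved (2, "a") shows up as 0
all_pairs = pd.MultiIndex.from_product([[1, 2], ["a", "b", "c"]], names=["sex", "code"])
table = table.reindex(columns=all_pairs, fill_value=0)
print(table)
```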

3 Comments

It's all about the pivot_table (if you see stack/unstack, think pivot_table)! This doesn't handle the case where the value is > 1 ... not sure if that's possible in the OP's data. Edit: I take it back, that's handled by the .size() in the OP's code!
@AndyHayden I've added the crosstab method.
:D haha, great!

On the dataframe you have given, do

df.set_index(['code','sex']).unstack(['code','sex'])

In the future, please give your data in a form that others can run themselves, e.g. the output of df.to_records() or df.to_json().
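For example, a frame like the one in the question can be shared as a plain dict that anyone can paste back into `pd.DataFrame`. `to_dict("list")` is used here as one more option next to `to_records()` and `to_json()`; the data is the question's sample.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code": ["a", "a", "b", "b", "c", "c"],
        "sex": [1, 1, 1, 2, 2, 1],
        "age": [1, 6, 8, 2, 3, 4],
    }
)

# A copy-pasteable representation: {column: [values, ...]}
shared = df.to_dict("list")
print(shared)

# The reader rebuilds the frame from that dict
rebuilt = pd.DataFrame(shared)

# JSON works too, e.g. for pasting into a question
print(df.to_json())
```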


You are close; you only need to specify the level parameter in unstack and then sort the columns:

df = (df.groupby([df.code, df.sex, pd.cut(df.age, bin)])
        .size()
        .unstack(level=[1, 0])
        .sort_index(axis=1)
        .fillna(0))
print(df)
sex        1              2     
code       a    b    c    b    c
age                             
(0, 5]   1.0  0.0  1.0  1.0  1.0
(5, 10]  1.0  1.0  0.0  0.0  0.0
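The `level` argument decides which key ends up on top: the grouped Series is indexed by (code, sex, age) at positions 0, 1, 2, so `level=[1, 0]` puts sex above code. A sketch comparing both orders, assuming the sample data and hypothetical string bin labels; it passes `fill_value=0` to `unstack` instead of calling `fillna`, which keeps the counts as integers.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "code": ["a", "a", "b", "b", "c", "c"],
        "sex": [1, 1, 1, 2, 2, 1],
        "age": [1, 6, 8, 2, 3, 4],
    }
)
labels = ["(0,5]", "(5,10]"]  # hypothetical string labels for the two bins
age_bin = pd.cut(df["age"], [0, 5, 10], labels=labels).astype(str).rename("age")

# Index levels: 0 = code, 1 = sex, 2 = age
counts = df.groupby([df["code"], df["sex"], age_bin]).size()

# level=[1, 0]: sex on the top column level, code underneath
by_sex = counts.unstack(level=[1, 0], fill_value=0).sort_index(axis=1)

# level=[0, 1]: code on top, sex underneath
by_code = counts.unstack(level=[0, 1], fill_value=0).sort_index(axis=1)

print(by_sex)
print(by_code)
```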
