1

This is a little similar to this answer but my problem is a little different.

My dataframe:

df=pd.DataFrame([{'date': '2017-01-01', 'id': 'google', 'category': 1, 'Score1': 51, 'Score2': 531},
         {'date': '2017-01-01', 'id': 'google', 'category': 2, 'Score1': 592, 'Score2': 152},
         {'date': '2017-01-01', 'id': 'google', 'category': 5, 'Score1': 55, 'Score2': 255},
         {'date': '2017-01-01', 'id': 'yahoo', 'category': 7, 'Score1': 597, 'Score2': 357},
         {'date': '2017-01-01', 'id': 'yahoo', 'category': 8, 'Score1': 58, 'Score2': 58},
         {'date': '2017-01-02', 'id': 'google', 'category': 5, 'Score1': 795, 'Score2': 455},
         {'date': '2017-01-02', 'id': 'google', 'category': 1, 'Score1': 71, 'Score2': 751},
         {'date': '2017-01-02', 'id': 'google', 'category': 2, 'Score1': 792, 'Score2': 352},
         {'date': '2017-01-02', 'id': 'yahoo', 'category': 7, 'Score1': 77, 'Score2': 957},
         {'date': '2017-01-02', 'id': 'yahoo', 'category': 8, 'Score1': 798, 'Score2': 358}
       ])

And looks like this:

         date      id  category  Score1  Score2
0  2017-01-01  google         1      51     531
1  2017-01-01  google         2     592     152
2  2017-01-01  google         5      55     255
3  2017-01-01   yahoo         7     597     357
4  2017-01-01   yahoo         8      58      58
5  2017-01-02  google         5     795     455
6  2017-01-02  google         1      71     751
7  2017-01-02  google         2     792     352
8  2017-01-02   yahoo         7      77     957
9  2017-01-02   yahoo         8     798     358

I need to transform this into a dataframe that looks like this:

date    id  cat1_score1 cat2_score1 cat5_score1 cat7_score1 cat8_score1 cat1_score2 cat2_score2 cat5_score2 cat7_score2 cat8_score2
1/1/17  google    51        592         55           0            0            531       152        255           0         0
1/1/17  yahoo      0        0            0          597          58             0         0          0           357        58
1/2/17  google    71        792        795           0            0            751       352        455           0         0
1/2/17  yahoo      0         0           0          77           798            0         0          0           957        358

The caveat here is that the number of categories can vary from id to id. There could also potentially be a secondary ID column which would need to be considered. I could enumerate over the values in the id column but then how would I transform the dataframe accordingly?

1 Answer 1

3

Let's reshape like this:

df_out = df.set_index(['date', 'id', 'category']).stack().unstack([2, 3]).fillna(0)
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
df_out = df_out.add_prefix('cat').reset_index()
print(df_out)

Output:

         date      id  cat1_Score1  cat1_Score2  cat2_Score1  cat2_Score2  \
0  2017-01-01  google         51.0        531.0        592.0        152.0   
1  2017-01-01   yahoo          0.0          0.0          0.0          0.0   
2  2017-01-02  google         71.0        751.0        792.0        352.0   
3  2017-01-02   yahoo          0.0          0.0          0.0          0.0   

   cat5_Score1  cat5_Score2  cat7_Score1  cat7_Score2  cat8_Score1  \
0         55.0        255.0          0.0          0.0          0.0   
1          0.0          0.0        597.0        357.0         58.0   
2        795.0        455.0          0.0          0.0          0.0   
3          0.0          0.0         77.0        957.0        798.0   

   cat8_Score2  
0          0.0  
1         58.0  
2          0.0  
3        358.0 

Explanation: First set_index on everything but scoring columns, then use stack to align all the scores vertically. Next, use unstack to pivot category and score labels to columns and fillna with zeroes. Use columns map to combine column levels into one level and add_prefix 'cat' to columns. Lastly, reset_index get all columns back into the dataframe from index.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks @Scott - this appears to have removed date and id from the dataframe columns though
You can reset_index a the end to get those back into the dataframe.
@Craig Thank you, and you're welcome. Happy coding!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.