Transpose column in a DataFrame into a binary matrix

Question

Context

Lets say I have a pandas-DataFrame like this:

>>> data.head()
                            values  atTime
date        
2006-07-01 00:00:00+02:00   15.10   0000
2006-07-01 00:15:00+02:00   16.10   0015
2006-07-01 00:30:00+02:00   17.75   0030
2006-07-01 00:45:00+02:00   17.35   0045
2006-07-01 01:00:00+02:00   17.25   0100

atTime represents the hour and minute of the timestamp used as index. I want to transpose the atTime-column to a binary matrix (making it sparse is also an option), which will be used as nominal feature in a machine learning approach.

The desired result should look like:

>>> data.head()
                            values  0000  0015  0030  0045  0000
date        
2006-07-01 00:00:00+02:00   15.10   1     0     0     0     0
2006-07-01 00:15:00+02:00   16.10   0     1     0     0     0
2006-07-01 00:30:00+02:00   17.75   0     0     1     0     0
2006-07-01 00:45:00+02:00   17.35   0     0     0     1     0
2006-07-01 01:00:00+02:00   17.25   0     0     0     0     1

As might be anticipated, this matrix will be much larger when concidering all values in atTime.

My question

I can achieve the desired result with workarounds using apply and using the timestamps in order to create the new columns beforehand.

However, is there a build-in option in pandas (or via numpy, concidering atTime as numpy-array) to achieve the same without a workaround?

cs95 · Accepted Answer · 2020-08-20 17:28:01Z

8

This is a use case for get_dummies:

pd.get_dummies(df, columns=["atTime"])

                           values  atTime_0  atTime_15  atTime_30  atTime_45  atTime_100
date                                                                                    
2006-07-01 00:00:00+02:00   15.10         1          0          0          0           0
2006-07-01 00:15:00+02:00   16.10         0          1          0          0           0
2006-07-01 00:30:00+02:00   17.75         0          0          1          0           0
2006-07-01 00:45:00+02:00   17.35         0          0          0          1           0
2006-07-01 01:00:00+02:00   17.25         0          0          0          0           1

Solution updated with OP's recommendation. Thanks!

edited Aug 20, 2020 at 17:28

answered Jun 19, 2019 at 16:25

cs95

406k106 gold badges745 silver badges798 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Markus Over a year ago

Oh.. I never thougth that something like this would be behind such a name.. Thank you very much for the fast reply!

Markus Over a year ago

Just to add since I needed this again: With the current pandas-package you can run pd.get_dummies(data, columns=["atTime"]) as well with the same result and saving the join :)

Collectives™ on Stack Overflow

Transpose column in a DataFrame into a binary matrix

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related