Let's say I have two DataFrames, as shown below:

DF1:

from datetime import date, timedelta
import pandas as pd
import numpy as np
sdate = date(2019,1,1)   # start date
edate = date(2019,1,7)   # end date

required_dates = pd.date_range(sdate, edate - timedelta(days=1), freq='D')
# initialize list of lists 
data = [['2019-01-01', 1001], ['2019-01-03', 1121] ,['2019-01-02', 1500], 
        ['2019-01-02', 1400],['2019-01-04', 1501],['2019-01-01', 1200],
        ['2019-01-04', 1201],['2019-01-04', 1551],['2019-01-05', 1400]]
# Create the pandas DataFrame 
df1 = pd.DataFrame(data, columns = ['OnlyDate', 'TBID']) 
df1.sort_values(by='OnlyDate',inplace=True)
df1     
     OnlyDate   TBID
0   2019-01-01  1001
5   2019-01-01  1200
2   2019-01-02  1500
3   2019-01-02  1400
1   2019-01-03  1121
4   2019-01-04  1501
6   2019-01-04  1201
7   2019-01-04  1551
8   2019-01-05  1400

DF2:

df2 = pd.DataFrame(columns=sorted(df1['TBID'].unique()), index=required_dates)
df2     
            1001    1121    1200    1201    1400    1500    1501    1551
2019-01-01  NaN     NaN     NaN     NaN     NaN     NaN     NaN      NaN
2019-01-02  NaN     NaN     NaN     NaN     NaN     NaN     NaN      NaN
2019-01-03  NaN     NaN     NaN     NaN     NaN     NaN     NaN      NaN
2019-01-04  NaN     NaN     NaN     NaN     NaN     NaN     NaN      NaN
2019-01-05  NaN     NaN     NaN     NaN     NaN     NaN     NaN      NaN
2019-01-06  NaN     NaN     NaN     NaN     NaN     NaN     NaN      NaN

I want to set 1 (or True) in df3, a copy of df2, wherever df1 contains the matching (OnlyDate, TBID) pair, and 0 everywhere else, to get the output below:

df3 =df2.copy()
for index, row in df1.iterrows():
    df3.loc[row['OnlyDate'],row['TBID']] = 1

df3.fillna(0, inplace=True)
df3 


            1001    1121    1200    1201    1400    1500    1501    1551
2019-01-01   1       0       1       0       0       0       0       0
2019-01-02   0       0       0       0       1       1       0       0
2019-01-03   0       1       0       0       0       0       0       0
2019-01-04   0       0       0       1       0       0       1       1
2019-01-05   0       0       0       0       1       0       0       0
2019-01-06   0       0       0       0       0       0       0       0

Is there a better way to do this?

1 Answer

Use get_dummies and aggregate per date with max for indicators (always 0 or 1), or with sum if you want counts:

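# .max(level=0) was removed in pandas 2.0, so group on the index level instead;
# dtype=int keeps 0/1 output (newer pandas returns booleans by default)
df = pd.get_dummies(df1.set_index('OnlyDate')['TBID'], dtype=int).groupby(level=0).max()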
print (df)
            1001  1121  1200  1201  1400  1500  1501  1551
OnlyDate                                                  
2019-01-01     1     0     1     0     0     0     0     0
2019-01-02     0     0     0     0     1     1     0     0
2019-01-03     0     1     0     0     0     0     0     0
2019-01-04     0     0     0     1     0     0     1     1
2019-01-05     0     0     0     0     1     0     0     0
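
Note that the result above has no row for 2019-01-06, which the desired output in the question includes. One possible way to add it, sketched here under the assumption that the missing dates should be filled with 0, is to convert the index to datetimes and reindex against the full date range. The names `indicators`, `counts` and `df3` below are illustrative only, and the `sum` variant is included to show the count alternative mentioned in the answer:

```python
import pandas as pd

# Same sample data as in the question
data = [['2019-01-01', 1001], ['2019-01-03', 1121], ['2019-01-02', 1500],
        ['2019-01-02', 1400], ['2019-01-04', 1501], ['2019-01-01', 1200],
        ['2019-01-04', 1201], ['2019-01-04', 1551], ['2019-01-05', 1400]]
df1 = pd.DataFrame(data, columns=['OnlyDate', 'TBID'])

# Full date range wanted in the result (2019-01-01 .. 2019-01-06)
required_dates = pd.date_range('2019-01-01', '2019-01-06', freq='D')

dummies = pd.get_dummies(df1.set_index('OnlyDate')['TBID'], dtype=int)

# Indicator per (date, TBID): 1 if the pair occurs at least once
indicators = dummies.groupby(level=0).max()

# Count per (date, TBID): number of occurrences of the pair
counts = dummies.groupby(level=0).sum()

# The index holds date strings; convert so it aligns with required_dates,
# then add the missing dates as all-zero rows (assumed fill value: 0)
indicators.index = pd.to_datetime(indicators.index)
df3 = indicators.reindex(required_dates, fill_value=0)

print(df3)
```

Since every (OnlyDate, TBID) pair in this sample occurs only once, `counts` and `indicators` hold the same values here; they would differ only if a pair were repeated in df1.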