1

I have two dataframes that share the same client_id and date columns but hold different amounts.

I am trying to build a third dataframe that takes the amount from dfA wherever a matching row exists and keeps the 0 from dfB wherever dfA has no row.

dfA:
    client_id  date         amount
0     1        2020-07-11    100
1     1        2020-07-10    90
2     1        2020-07-09    80
3     1        2020-07-12    70
3     1        2020-07-01    86

dfB:
    client_id  date         amount
0     1        2020-07-11    0
1     1        2020-07-10    0
2     1        2020-07-09    0
3     1        2020-07-07    0
4     1        2020-07-06    0
5     1        2020-07-05    0
5     1        2020-07-04    0
3     1        2020-07-03    0
4     1        2020-07-02    0
5     1        2020-07-01    0

I want to get:

dfResult:
    client_id  date         amount
0     1        2020-07-11    100
1     1        2020-07-10    90
2     1        2020-07-09    80
3     1        2020-07-07    70
4     1        2020-07-06    0
5     1        2020-07-05    0
5     1        2020-07-04    0
3     1        2020-07-03    0
4     1        2020-07-02    0
5     1        2020-07-01    86

2 Answers

1

You can concatenate the two dataframes, sort by amount in descending order, and then drop duplicates on client_id and date so that the dfA row is the one kept.

dfResult = (
    pd.concat([dfA, dfB])
    .sort_values(by='amount', ascending=False)
    .drop_duplicates(subset=['client_id', 'date'], keep='first')
    .sort_values(by=['client_id', 'date'], ascending=[True, False])
    .reset_index(drop=True)
)
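A self-contained sketch of the concat approach, using the sample data from the question, may make the behaviour easier to check. It assumes both date columns are real datetimes and that every dfA amount is greater than 0, since the descending sort is what makes the dfA row win. Note that a date present only in dfA (2020-07-12 here) also ends up in the result.

```python
import pandas as pd

# Sample data copied from the question; dates are parsed to datetimes.
dfA = pd.DataFrame({
    "client_id": [1, 1, 1, 1, 1],
    "date": pd.to_datetime(
        ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-12", "2020-07-01"]
    ),
    "amount": [100, 90, 80, 70, 86],
})
dfB = pd.DataFrame({
    "client_id": 1,
    "date": pd.to_datetime(
        ["2020-07-11", "2020-07-10", "2020-07-09", "2020-07-07", "2020-07-06",
         "2020-07-05", "2020-07-04", "2020-07-03", "2020-07-02", "2020-07-01"]
    ),
    "amount": 0,
})

dfResult = (
    pd.concat([dfA, dfB])
    # Largest amount first, so the dfA row precedes the dfB zero for each key.
    .sort_values(by="amount", ascending=False)
    .drop_duplicates(subset=["client_id", "date"], keep="first")
    # Restore the display order: client ascending, newest date first.
    .sort_values(by=["client_id", "date"], ascending=[True, False])
    .reset_index(drop=True)
)
print(dfResult)
```

If only the dates in dfB should survive, a left merge from dfB is probably the safer choice.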

3 Comments

Hello, sorry, I now get an additional error: TypeError: Cannot compare type 'Timestamp' with type 'date'
This is how the dfB dataframe is built:
date_range = pd.date_range(date_begin.date(), date_end.date())
data = pd.DataFrame([], columns=['client_id', 'date', 'amount'])
data['date'] = date_range
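This error usually suggests that one date column holds pandas Timestamps while the other holds plain Python datetime.date objects. A minimal sketch of one possible fix follows. It assumes dfA is the frame carrying datetime.date values (the comment does not say), and the small frames below are made up for illustration.

```python
import datetime as dt

import pandas as pd

# Assumption: dfA stores datetime.date objects (object dtype), while dfB is
# built from pd.date_range and therefore holds Timestamps.
dfA = pd.DataFrame({
    "client_id": [1, 1],
    "date": [dt.date(2020, 7, 11), dt.date(2020, 7, 1)],
    "amount": [100, 86],
})
dfB = pd.DataFrame({
    "client_id": 1,
    "date": pd.date_range("2020-07-01", "2020-07-11"),
    "amount": 0,
})

# Convert both columns to datetimes so sorting and comparing keys works.
dfA["date"] = pd.to_datetime(dfA["date"])
dfB["date"] = pd.to_datetime(dfB["date"])

dfResult = (
    pd.concat([dfA, dfB])
    .sort_values(by="amount", ascending=False)
    .drop_duplicates(subset=["client_id", "date"], keep="first")
    .sort_values(by=["client_id", "date"], ascending=[True, False])
    .reset_index(drop=True)
)
```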
0

Try this:

(
    dfB.date.map(
        dfA.set_index('date')['amount'].to_dict()
    ).fillna(0.0)
)

Or

(
    dfB.merge(
        dfA, on=['client_id', 'date'], suffixes=("_x", ""), how='left'
    ).fillna(0.0).drop(columns=["amount_x"])
)

   client_id        date  amount
0          1  2020-07-11  100.0
1          1  2020-07-10   90.0
2          1  2020-07-09   80.0
3          1  2020-07-07    0.0
4          1  2020-07-06    0.0
5          1  2020-07-05    0.0
5          1  2020-07-04    0.0
3          1  2020-07-03    0.0
4          1  2020-07-02    0.0
5          1  2020-07-01   86.0

1 Comment

This would ignore the client_id which (I guess) is kinda important
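The map variant looks amounts up by date alone, so two clients sharing a date would collide. One way to key the lookup on both columns is sketched below. The two-client data is made up for illustration, and the approach assumes each (client_id, date) pair appears at most once in dfA.

```python
import pandas as pd

# Hypothetical two-client data, invented to show why client_id matters.
dfA = pd.DataFrame({
    "client_id": [1, 2],
    "date": pd.to_datetime(["2020-07-11", "2020-07-11"]),
    "amount": [100, 5],
})
dfB = pd.DataFrame({
    "client_id": [1, 1, 2, 2],
    "date": pd.to_datetime(
        ["2020-07-11", "2020-07-10", "2020-07-11", "2020-07-10"]
    ),
    "amount": 0,
})

# Series of amounts indexed by (client_id, date).
lookup = dfA.set_index(["client_id", "date"])["amount"]

# One key per dfB row, in dfB's row order.
keys = pd.MultiIndex.from_frame(dfB[["client_id", "date"]])

dfResult = dfB.copy()
# Keys missing from dfA come back as NaN and are filled with 0.
dfResult["amount"] = lookup.reindex(keys).fillna(0).to_numpy()
print(dfResult)
```

The merge variant already joins on both client_id and date, so it does not need this change.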
