1

df

           Id   timestamp               data    Date
30424   30665   2020-01-04 19:40:23.827 17.5    2020-01-04
31054   31295   2020-01-05 22:26:39.860 17.0    2020-01-05
32150   32391   2020-01-06 23:00:14.607 18.0    2020-01-06
33236   33477   2020-01-07 22:52:56.757 18.0    2020-01-07
34314   34555   2020-01-08 20:45:48.927 18.0    2020-01-08
35592   35833   2020-01-09 20:56:21.320 18.0    2020-01-09
36528   36769   2020-01-10 20:41:36.323 19.5    2020-01-10
37054   37295   2020-01-11 19:35:50.553 18.5    2020-01-11
37652   37893   2020-01-12 19:28:22.823 17.0    2020-01-12
38828   39069   2020-01-13 23:48:12.533 21.5    2020-01-13
40004   40245   2020-01-14 22:50:56.873 18.5    2020-01-14

df1

    Date        data 
0   2020-01-04  NaN
1   2020-01-07  NaN
2   2020-01-08  19.0
3   2020-01-09  NaN
4   2020-01-11  NaN
5   2020-01-12  NaN
6   2020-01-16  NaN
7   2020-01-17  NaN
8   2020-01-24  18.5

I want to replace the data in df with values in df1['data'] if the df1['data'] value is not NaN.

Expected outcome:

        Id      timestamp               data    Date
30424   30665   2020-01-04 19:40:23.827 17.5    2020-01-04
31054   31295   2020-01-05 22:26:39.860 17.0    2020-01-05
32150   32391   2020-01-06 23:00:14.607 18.0    2020-01-06
33236   33477   2020-01-07 22:52:56.757 18.0    2020-01-07
34314   34555   2020-01-08 20:45:48.927 19.0    2020-01-08  # This row changed
35592   35833   2020-01-09 20:56:21.320 18.0    2020-01-09
36528   36769   2020-01-10 20:41:36.323 19.5    2020-01-10
37054   37295   2020-01-11 19:35:50.553 18.5    2020-01-11
37652   37893   2020-01-12 19:28:22.823 17.0    2020-01-12
38828   39069   2020-01-13 23:48:12.533 21.5    2020-01-13
40004   40245   2020-01-14 22:50:56.873 18.5    2020-01-14

This answer is similar but is not the exactly same situation as my question.

I tried:

pd.merge(df, df1, how='left', on='Date')

which returned:

       Id   timestamp               data_x  Date       data_y
0   30665   2020-01-04 19:40:23.827 17.5    2020-01-04  NaN
1   31295   2020-01-05 22:26:39.860 17.0    2020-01-05  NaN
2   32391   2020-01-06 23:00:14.607 18.0    2020-01-06  NaN
3   33477   2020-01-07 22:52:56.757 18.0    2020-01-07  NaN
4   34555   2020-01-08 20:45:48.927 18.0    2020-01-08  19.0
5   35833   2020-01-09 20:56:21.320 18.0    2020-01-09  NaN
6   36769   2020-01-10 20:41:36.323 19.5    2020-01-10  NaN
7   37295   2020-01-11 19:35:50.553 18.5    2020-01-11  NaN

Update:

Tried:

df['data'] = df['Date'].map(df1.set_index('Date')['data']).fillna(df['Date'])

but it seems like something is wrong with data column:

          Id    timestamp               data            Date
30424   30665   2020-01-04 19:40:23.827 1.578096e+18    2020-01-04
31054   31295   2020-01-05 22:26:39.860 1.578182e+18    2020-01-05
32150   32391   2020-01-06 23:00:14.607 1.578269e+18    2020-01-06
33236   33477   2020-01-07 22:52:56.757 1.578355e+18    2020-01-07
34314   34555   2020-01-08 20:45:48.927 1.900000e+01    2020-01-08
35592   35833   2020-01-09 20:56:21.320 1.578528e+18    2020-01-09
36528   36769   2020-01-10 20:41:36.323 1.578614e+18    2020-01-10

1 Answer 1

1

Use Series.map by Date column first, if no match missing values, so replaced data by original by Series.fillna:

df['data'] = df['Date'].map(df1.set_index('Date')['data']).fillna(df['data'])
print (df)
          Id                timestamp  data        Date
30424  30665  2020-01-04 19:40:23.827  17.5  2020-01-04
31054  31295  2020-01-05 22:26:39.860  17.0  2020-01-05
32150  32391  2020-01-06 23:00:14.607  18.0  2020-01-06
33236  33477  2020-01-07 22:52:56.757  18.0  2020-01-07
34314  34555  2020-01-08 20:45:48.927  19.0  2020-01-08
35592  35833  2020-01-09 20:56:21.320  18.0  2020-01-09
36528  36769  2020-01-10 20:41:36.323  19.5  2020-01-10
37054  37295  2020-01-11 19:35:50.553  18.5  2020-01-11
37652  37893  2020-01-12 19:28:22.823  17.0  2020-01-12
38828  39069  2020-01-13 23:48:12.533  21.5  2020-01-13
40004  40245  2020-01-14 22:50:56.873  18.5  2020-01-14

Details:

print (df['Date'].map(df1.set_index('Date')['data']))
30424     NaN
31054     NaN
32150     NaN
33236     NaN
34314    19.0
35592     NaN
36528     NaN
37054     NaN
37652     NaN
38828     NaN
40004     NaN
Name: Date, dtype: float64
Sign up to request clarification or add additional context in comments.

5 Comments

Do you know what happened to data column in the updated question?
@nilsinelabore - Sure, there is typo, .fillna(df['Date']) need .fillna(df['data'])
@nilsinelabore - Reason is datetimes are converted to native format, unix format (like this), so repalced missing values by datetimes in this format
Why is it necessary to always set the date index (df1.set_index('Date'))? Does this mean that many functions are index based?
@nilsinelabore Reason if need matching values betwen 2 DataFrames same way. Here need matching by Dates. Function map working get values of index Series like keys of dictionary and used for assign new values. If no set_index then df1['data'] has index 0,1,2... and because this values are not in Date column get NaNs in column after (df['Date'].map(df1['data']))

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.