
I have two DataFrames:

df1:

   date               event    group    failure
2018-04-19 02:07:00     1       E1         0
2018-04-19 02:07:00     2       E2         1

df2:

        start_time                   end_time           group      failure
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E1         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E1         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E1         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E1         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E1         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E1         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E1         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E1         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E1         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E1         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E1         1
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E1         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E1         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E1         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E1         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E1         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E1         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E1         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E1         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E1         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E1         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E1         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E2         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E2         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E2         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E2         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E2         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E2         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E2         1
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E2         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E2         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E2         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E2         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E2         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E2         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E2         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E2         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E2         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E2         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E2         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E2         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E2         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E2         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E2         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E2         1
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E2         1

I need to check whether:

  • df1(date) is between df2(start_time) and df2(end_time), and

  • df1(group) = df2(group).

If both conditions hold, df2(failure) should be replaced with df1(failure). The desired outcome looks like:

        start_time                   end_time           group      failure
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E1         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E1         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E1         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E1         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E1         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E1         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E1         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E1         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E1         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E1         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E1         0
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E1         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E1         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E1         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E1         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E1         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E1         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E1         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E1         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E1         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E1         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E1         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E1         1
2018-04-27 06:00:00+01:00   2018-04-27 12:59:59+01:00     E2         1
2018-04-02 00:00:00+01:00   2018-04-02 23:59:59+01:00     E2         1
2018-04-26 19:00:00+01:00   2018-04-26 21:29:59+01:00     E2         1
2018-04-25 06:00:00+01:00   2018-04-25 12:59:59+01:00     E2         1
2018-04-03 06:00:00+01:00   2018-04-03 12:59:59+01:00     E2         1
2018-04-26 06:00:00+01:00   2018-04-26 12:59:59+01:00     E2         1
2018-04-27 19:00:00+01:00   2018-04-27 21:29:59+01:00     E2         1
2018-04-01 00:00:00+01:00   2018-04-01 23:59:59+01:00     E2         1
2018-04-25 19:00:00+01:00   2018-04-25 21:29:59+01:00     E2         1
2018-04-03 19:00:00+01:00   2018-04-03 21:29:59+01:00     E2         1
2018-04-24 19:00:00+01:00   2018-04-24 21:29:59+01:00     E2         1
2018-04-04 06:00:00+01:00   2018-04-04 12:59:59+01:00     E2         1
2018-04-24 06:00:00+01:00   2018-04-24 12:59:59+01:00     E2         1
2018-04-23 19:00:00+01:00   2018-04-23 21:29:59+01:00     E2         1
2018-04-04 19:00:00+01:00   2018-04-04 21:29:59+01:00     E2         1
2018-04-23 06:00:00+01:00   2018-04-23 12:59:59+01:00     E2         1
2018-04-16 00:00:00+01:00   2018-04-22 23:59:59+01:00     E2         1
2018-04-05 06:00:00+01:00   2018-04-05 12:59:59+01:00     E2         1
2018-04-09 00:00:00+01:00   2018-04-15 23:59:59+01:00     E2         1
2018-04-07 00:00:00+01:00   2018-04-08 23:59:59+01:00     E2         1
2018-04-05 19:00:00+01:00   2018-04-05 21:29:59+01:00     E2         1
2018-04-06 19:00:00+01:00   2018-04-06 21:29:59+01:00     E2         1
2018-04-06 06:00:00+01:00   2018-04-06 12:59:59+01:00     E2         1
2018-04-28 00:00:00+01:00   2018-04-29 23:59:59+01:00     E2         1

I have tried using if statements, but I get the error: Can only compare identically-labeled Series objects. Any suggestions? Thank you in advance!
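For reference, the error can be reproduced in isolation: pandas compares two Series element by element only when their index labels are identical, and df1 (2 rows) and df2 (48 rows) can never line up that way. A minimal sketch with a few timestamps taken from the tables above (time zones left out for brevity):

```python
import pandas as pd

# Stand-ins for df1["date"] (2 rows) and df2["start_time"] (3 rows here);
# their indexes (0..1 vs 0..2) differ, just like the real frames.
dates = pd.Series(pd.to_datetime(["2018-04-19 02:07:00", "2018-04-19 02:07:00"]))
starts = pd.Series(pd.to_datetime(["2018-04-01 00:00:00",
                                   "2018-04-16 00:00:00",
                                   "2018-04-23 00:00:00"]))

try:
    dates >= starts  # element-wise comparison requires identically-labeled Series
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # Can only compare identically-labeled Series objects
```

So before comparing dates, each df2 row has to be paired with the df1 row(s) it should be checked against, for example by merging on group.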

  • Try df1[~df1.date.isin(df2.start_time.values)]; at least this gives you an idea of whether it matches the records you are looking for. Commented Oct 19, 2018 at 12:06
  • I don't get any error, but it doesn't change the value! Commented Oct 19, 2018 at 12:09
  • Are both of your columns datetime objects? The error you receive might indicate that one of them is a string and the other one is a datetime. Commented Oct 19, 2018 at 12:11
  • Yes, it doesn't change the value; it only checks whether the values in the date column match anywhere in df2's start_time column. The replace logic still has to be implemented. Commented Oct 19, 2018 at 12:11
  • @DanielR. he is not getting any error :-) Commented Oct 19, 2018 at 12:12
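To illustrate the limitation of the isin suggestion: isin tests exact membership among the start_time values, whereas the question needs a range check against each window. A small sketch using the df1 date and two E1 windows from the tables above (time zones dropped for brevity):

```python
import pandas as pd

# One event date from df1 and two E1 windows from df2
date = pd.Timestamp("2018-04-19 02:07:00")
starts = pd.Series(pd.to_datetime(["2018-04-16 00:00:00", "2018-04-23 06:00:00"]))
ends = pd.Series(pd.to_datetime(["2018-04-22 23:59:59", "2018-04-23 12:59:59"]))

# isin only asks "is this exact timestamp one of the start times?"
exact_match = pd.Series([date]).isin(starts.values)

# the question asks "does the date fall inside each window?"
inside = (starts <= date) & (date <= ends)

print(exact_match.tolist())  # [False]
print(inside.tolist())       # [True, False]
```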

1 Answer


I was able to compare the dates after converting the columns as follows (e1 and e2 here are the question's df1 and df2):

import pandas as pd

# df1's dates are naive, so attach a time zone to them
e1['date'] = pd.to_datetime(e1['date']).dt.tz_localize('US/Eastern')
# df2's times already carry a +01:00 offset, so convert (not localize) them to the same zone
e2['start_time'] = pd.to_datetime(e2['start_time']).dt.tz_convert('US/Eastern')
e2['end_time'] = pd.to_datetime(e2['end_time']).dt.tz_convert('US/Eastern')
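Some conversion is needed because pandas refuses to order tz-naive timestamps against tz-aware ones. The sketch below shows that failure and one alternative fix that differs from the conversion above: dropping df2's +01:00 offset with tz_localize(None) so both sides are naive wall-clock times. That alternative assumes df1's naive dates are already expressed in df2's local +01:00 time.

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(["2018-04-19 02:07:00"]))        # like df1["date"]
aware = pd.Series(pd.to_datetime(["2018-04-16 00:00:00+01:00"]))  # like df2["start_time"]

try:
    aware <= naive  # ordering tz-aware against tz-naive values is rejected
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Alternative (assumption: df1's naive dates are in the same +01:00 local time):
# remove the offset from df2 while keeping its wall-clock time.
aware_as_local = aware.dt.tz_localize(None)
ok = (aware_as_local <= naive).tolist()

print(raised_type_error)  # True
print(ok)                 # [True]
```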

I then merged both tables on group and checked whether date falls between start_time and end_time; where it does, the failure value is replaced.

After the merge, failure_x is the failure column from e2 (df2) and failure_y is the one from e1 (df1):

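import numpy as np
df = e2.merge(e1, on='group', how='left')  # failure_x from e2 (df2), failure_y from e1 (df1)
# take df1's failure where date falls inside [start_time, end_time], otherwise keep df2's
df['failure_x'] = np.where((df['start_time'] <= df['date']) & (df['date'] <= df['end_time']), df['failure_y'], df['failure_x'])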
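Putting the steps together, here is a self-contained sketch of the merge-and-np.where approach on a few rows copied from the question. Assumptions: df2's times are parsed with their +01:00 offset, df1's naive dates are treated as being in that same offset (rather than US/Eastern), and df1 has at most one row per group. With several events per group, the merge would multiply df2's rows and the result would need to be collapsed afterwards.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2018-04-19 02:07:00", "2018-04-19 02:07:00"],
    "event": [1, 2],
    "group": ["E1", "E2"],
    "failure": [0, 1],
})
df2 = pd.DataFrame({
    "start_time": ["2018-04-01 00:00:00+01:00", "2018-04-16 00:00:00+01:00",
                   "2018-04-23 06:00:00+01:00", "2018-04-16 00:00:00+01:00"],
    "end_time": ["2018-04-01 23:59:59+01:00", "2018-04-22 23:59:59+01:00",
                 "2018-04-23 12:59:59+01:00", "2018-04-22 23:59:59+01:00"],
    "group": ["E1", "E1", "E1", "E2"],
    "failure": [1, 1, 1, 1],
})

# Parse df2's offset-aware strings, then give df1's naive dates the same offset
df2["start_time"] = pd.to_datetime(df2["start_time"])
df2["end_time"] = pd.to_datetime(df2["end_time"])
df1["date"] = pd.to_datetime(df1["date"]).dt.tz_localize(df2["start_time"].dt.tz)

# Pair every df2 window with the df1 row of the same group
df = df2.merge(df1[["date", "group", "failure"]], on="group", how="left")

# failure_x comes from df2, failure_y from df1; use df1's value only inside the window
inside = (df["start_time"] <= df["date"]) & (df["date"] <= df["end_time"])
df["failure_x"] = np.where(inside, df["failure_y"], df["failure_x"])

# Tidy up to df2's original columns
result = (df[["start_time", "end_time", "group", "failure_x"]]
          .rename(columns={"failure_x": "failure"}))
print(result)
```

The E1 window from 2018-04-16 to 2018-04-22 ends up with failure 0, while the matching E2 window keeps 1, as in the desired outcome.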

1 Comment

Haven't you mixed up the DataFrame names a bit? I mean, which one should be df, df1 or df2?
