1

I have a data-frame df comprising of an identifier column and 4 columns of dates. Please see below for the head of the data-frame.

   Identifier                    A \
0       12253  1989-11-09 00:00:00   
1       11413  1990-09-03 00:00:00   
2       12337  1977-09-07 00:00:00   
3       10610  1994-08-24 00:00:00   
4        7493  1993-08-22 00:00:00   

                      B                              C  \
0   2012-03-01 00:00:00            2015-04-01 00:00:00   
1   2009-08-16 00:00:00            2015-05-18 00:00:00   
2   1977-09-07 00:00:00            1977-09-07 00:00:00   
3   2009-01-09 00:00:00            2015-03-01 00:00:00   
4   2002-06-03 00:00:00            2015-02-16 00:00:00   

                      D  \
0   2012-01-03 00:00:00   
1   2015-05-18 00:00:00   
2   1977-09-07 00:00:00   
3   2015-03-01 00:00:00   
4   2015-02-16 00:00:00 

I am trying to compare the date columns against each other and where if A>=B>=C>=D then in a new column check True should be returned else False.

So the resulting data-frame would look like:

   Identifier                    A \
0       12253  1989-11-09 00:00:00   
1       11413  1990-09-03 00:00:00   
2       12337  1977-09-07 00:00:00   
3       10610  1994-08-24 00:00:00   
4        7493  1993-08-22 00:00:00   

                      B                              C  \
0   2012-03-01 00:00:00            2015-04-01 00:00:00   
1   2009-08-16 00:00:00            2015-05-18 00:00:00   
2   1977-09-07 00:00:00            1977-09-07 00:00:00   
3   2009-01-09 00:00:00            2015-03-01 00:00:00   
4   2002-06-03 00:00:00            2015-02-16 00:00:00   

                      D       Check\
0   2012-01-03 00:00:00       False
1   2015-05-18 00:00:00       True
2   1977-09-07 00:00:00       True 
3   2015-03-01 00:00:00       True
4   2015-02-16 00:00:00       True

I have tried

df['Check'] = np.where(df['A'] >= df['B']>= df['C']>= df['D'], 'True', 'False')

But can't get this to work. What can I try next?

1
  • Try df['Check'] = df['A'] >= df['B'] >= df['C'] Commented Oct 18, 2018 at 9:50

3 Answers 3

2

Use DataFrame.eval:

df['Check'] = df.eval('A <= B <= C <= D')
print (df)
   Identifier          A          B          C          D  Check
0       12253 1989-11-09 2012-03-01 2015-04-01 2012-01-03  False
1       11413 1990-09-03 2009-08-16 2015-05-18 2015-05-18   True
2       12337 1977-09-07 1977-09-07 1977-09-07 1977-09-07   True
3       10610 1994-08-24 2009-01-09 2015-03-01 2015-03-01   True
4        7493 1993-08-22 2002-06-03 2015-02-16 2015-02-16   True
Sign up to request clarification or add additional context in comments.

12 Comments

@AntonvBR - I think diff shoud be more slowier in my opinion, also it compare datetimes, not integers
@AntonvBR - I get %timeit (df.eval('A <= B <= C <= D')) 3.92 ms ± 487 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) %timeit ((df.diff(axis=1) >= pd.Timedelta(0)).all(1)) 30.7 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
@AntonvBR - so it is data related ? because it is huge difference and I cannot see reason for it.
Removed comments because of overflow. Sorry but why would it be such a big difference?
@AntonvBR - I have no idea, sure. I try test your data.
|
1

Use df.diff(), similar to np.diff() and check if any element is lower than pd.Timedelta(0).

df['check'] = ~df.diff(axis=1).lt(pd.Timedelta(0)).any(1)

Full example:

import pandas as pd
import numpy

np.random.seed(333)

# Random dates from:  https://stackoverflow.com/questions/50559078/
def pp(start, end, n):
    start_u = start.value//10**9
    end_u = end.value//10**9
    return pd.DatetimeIndex((10**9*np.random.randint(start_u, end_u, n)).view('M8[ns]'))

n = 10

df = pd.DataFrame({
    'A': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n),
    'B': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n),
    'C': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n),
    'D': pp(pd.Timestamp('2018'), pd.Timestamp('2019'), n)
})

df['check'] = ~df.diff(axis=1).lt(pd.Timedelta(0)).any(1)
print(df)

Returns:

                    A                   B                   C  \
0 2018-07-30 04:54:04 2018-03-13 00:28:13 2018-08-24 11:01:29   
1 2018-12-26 21:22:20 2018-09-23 14:25:11 2018-08-19 07:21:59   
2 2018-04-29 17:15:57 2018-05-28 12:35:35 2018-10-16 00:19:11   
3 2018-12-11 06:56:35 2018-08-15 00:12:12 2018-08-05 23:47:08   
4 2018-03-04 11:00:03 2018-07-03 07:22:30 2018-09-09 01:45:09   
5 2018-08-22 03:24:30 2018-12-17 17:38:34 2018-01-29 13:02:29   
6 2018-04-21 01:10:14 2018-06-09 20:37:08 2018-04-30 12:30:00   
7 2018-06-27 18:40:46 2018-09-15 10:26:06 2018-05-13 03:51:36   
8 2018-03-18 06:31:24 2018-11-10 06:24:12 2018-02-25 02:58:15   
9 2018-11-08 17:52:19 2018-03-27 01:02:12 2018-03-06 00:10:02   

                    D  check  
0 2018-07-30 16:16:03  False  
1 2018-07-21 23:38:59  False  
2 2018-10-25 03:46:37   True  
3 2018-12-01 07:43:53  False  
4 2018-12-07 16:11:31   True  
5 2018-09-17 14:58:20  False  
6 2018-07-02 09:36:35  False  
7 2018-03-16 23:21:27  False  
8 2018-10-30 11:24:01  False  
9 2018-04-03 12:17:52  False

Benchmarking

%timeit ~df.diff(axis=1).lt(pd.Timedelta(0)).any(1)
%timeit df.eval('A <= B <= C <= D')    

10.000 rows:

#1000 loops, best of 3: 1.58 ms per loop
#100 loops, best of 3: 3.31 ms per loop

10.000.000 rows:

#1 loop, best of 3: 2.27 s per loop
#1 loop, best of 3: 243 ms per loop

Comments

0

There might be a smarter way, but this should work

df['Check'] = (df['A'] <= df['B']) & (df['B']<= df['C']) & (df['C'] <= df['D'])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.