How to calculate time difference between specific row values in dataframe using python?

Question

The df looks like below:


Time                    A 

2019-05-18 01:15:28     7
2019-05-18 01:28:11     7
2019-05-18 01:36:36     12
2019-05-18 01:39:47     12
2019-05-18 01:53:32     12
2019-05-18 02:05:37     7

I understand how to calculate the time difference between consecutive rows, but I want to calculate the time difference whenever the value in A goes from 7 to 12.

Expected output:


Time                    A   Time_difference

2019-05-18 01:15:28     7   0
2019-05-18 01:28:11     7   0
2019-05-18 01:36:36     12  00:21:08
2019-05-18 01:39:47     12  0
2019-05-18 01:53:32     12  0
2019-05-18 02:05:37     7   0

Is it a specific calculation that you need, or is there a rule for which rows to subtract? Generally speaking, there is the timedelta object for time calculations.
– Aryerez, Commented Sep 25, 2019 at 8:44
Yes, a specific rule: the time window between the occurrence of 7 and the occurrence of 12 in column A.
– hakuna_code, Commented Sep 25, 2019 at 8:45
Can there be multiple 7 or 12 values between a 7 and a 12?
– jezrael, Commented Sep 25, 2019 at 8:45
Yes, but the window runs from the first 7 to the first 12. Example: [7,4,7,7,12]; 7 to 12 is the focus.
– hakuna_code, Commented Sep 25, 2019 at 8:50
Can you add more 7 and 12 values for a minimal, complete, and verifiable example? Ideally also with several consecutive 7s, 12s, or both.
– jezrael, Commented Sep 25, 2019 at 8:59

Ollie in PGH · Accepted Answer · 2019-09-25 09:21:48Z

2

You can select the rows that match a condition with loc. Selecting a single column this way returns a Series; take .values[0] to get the first match by position.

import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [9, 7, 7, 5, 12, 12]

df = pd.DataFrame({'times':times, 'a':a})
# convert the strings to datetimes so they can be subtracted
df.times = pd.to_datetime(df['times'])
# first 12 minus first 7
pd.Timedelta(df.loc[df.a == 12, 'times'].values[0] - df.loc[df.a == 7, 'times'].values[0])

Timedelta('0 days 00:25:21')

Or we can break that code apart for readability's sake and do the calculations on new variables:

import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [9, 7, 7, 5, 12, 12]

df = pd.DataFrame({'times':times, 'a':a})
df.times = pd.to_datetime(df['times'])
end = df.loc[df.a == 12, 'times'].values[0]    # time of the first 12
start = df.loc[df.a == 7, 'times'].values[0]   # time of the first 7
pd.Timedelta(end - start)

Timedelta('0 days 00:25:21')
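
The lookup above raises an IndexError when either value is missing, and it does not check that the 12 comes after the 7. A minimal sketch of a guarded version follows; the function name first_window and its parameters are hypothetical, not part of the original answer.

```python
import pandas as pd

def first_window(df, start=7, end=12, value_col="a", time_col="times"):
    """Timedelta from the first `start` row to the first later `end` row, or None."""
    starts = df.loc[df[value_col] == start, time_col]
    if starts.empty:
        return None
    start_time = starts.iloc[0]
    # only consider `end` rows that occur after the chosen start time
    ends = df.loc[(df[value_col] == end) & (df[time_col] > start_time), time_col]
    if ends.empty:
        return None
    return ends.iloc[0] - start_time

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]
df = pd.DataFrame({"times": pd.to_datetime(times), "a": [9, 7, 7, 5, 12, 12]})
print(first_window(df))  # 0 days 00:25:21
```

Returning None instead of raising is a design choice for this sketch; raising a descriptive error would work just as well.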

edited Sep 25, 2019 at 9:21

answered Sep 25, 2019 at 9:15


2 Comments

hakuna_code Over a year ago

Thank you! But it does not iterate through the dataframe to find other 7-to-12 windows. As in the expected output, the difference should be calculated whenever there is a 7-to-12 window in the dataframe. I will tweak your logic a little more and check...

Ollie in PGH Over a year ago

I don't think that's what your expected output reflects. Wouldn't there be another time difference if you want it to keep going? In one comment you specifically said you want the first occurrences.

jezrael · Accepted Answer · 2019-09-25 11:08:42Z

Sample:

import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [7, 7, 12, 7, 12, 7]

df = pd.DataFrame({'times': pd.to_datetime(times), 'A':a})
print (df)
                times   A
0 2019-05-18 01:15:28   7
1 2019-05-18 01:28:11   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7

First, reset to a default index and keep only the rows where A is 7 or 12:

df = df.reset_index(drop=True)
df1 = df[df['A'].isin([7, 12])]

Then keep only the first row of each run of consecutive equal values by comparing with the shifted column:

df1 = df1[df1['A'].ne(df1['A'].shift())]
print (df1)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7
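
As a small illustration of the shift comparison used above, here is the same test on a bare Series with the values of column A; the variable names are only for this sketch.

```python
import pandas as pd

s = pd.Series([7, 7, 12, 7, 12, 7])

# True where a value differs from the one before it, i.e. at the start of each run
is_run_start = s.ne(s.shift())
print(is_run_start.tolist())
# [True, False, True, True, True, True]

# index labels that survive the filter
print(s[is_run_start].index.tolist())
# [0, 2, 3, 4, 5]
```

The first row always counts as a run start because the shifted value there is NaN.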

Then keep each 7 that is immediately followed by a 12, together with that 12:

m1 = df1['A'].eq(7) & df1['A'].shift(-1).eq(12)
m2 = df1['A'].eq(12) & df1['A'].shift().eq(7)

df2 = df1[m1 | m2]
print (df2)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12

Split the result into the even-positioned rows (the 7s) and the odd-positioned rows (the 12s):

out7 = df2.iloc[::2]
out12 = df2.iloc[1::2]

Finally, subtract:

df['Time_difference'] = out12['times'] - out7['times'].to_numpy()
df['Time_difference'] = df['Time_difference'].fillna(pd.Timedelta(0))
print (df)
                times   A Time_difference
0 2019-05-18 01:15:28   7        00:00:00
1 2019-05-18 01:28:11   7        00:00:00
2 2019-05-18 01:36:36  12        00:21:08
3 2019-05-18 01:39:47   7        00:00:00
4 2019-05-18 01:53:32  12        00:13:45
5 2019-05-18 02:05:37   7        00:00:00
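
The steps above can be collected into one function. This is a sketch under the same assumptions as the answer (rows sorted by time, datetime column); the name window_differences and its parameters are hypothetical.

```python
import pandas as pd

def window_differences(df, start=7, end=12, value_col="A", time_col="times"):
    """Add a Time_difference column: time from each start value to the next end value."""
    out = df.reset_index(drop=True)
    # keep only start/end rows, then only the first row of each run
    sub = out[out[value_col].isin([start, end])]
    sub = sub[sub[value_col].ne(sub[value_col].shift())]
    # a start directly followed by an end, and an end directly preceded by a start
    is_start = sub[value_col].eq(start) & sub[value_col].shift(-1).eq(end)
    is_end = sub[value_col].eq(end) & sub[value_col].shift().eq(start)
    starts = sub.loc[is_start, time_col]
    ends = sub.loc[is_end, time_col]
    # starts and ends pair up one to one; the result keeps the index of the end rows
    diff = ends - starts.to_numpy()
    out["Time_difference"] = diff.reindex(out.index).fillna(pd.Timedelta(0))
    return out

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]
df = pd.DataFrame({'times': pd.to_datetime(times), 'A': [7, 7, 12, 7, 12, 7]})
result = window_differences(df)
print(result)
```

Because the function works on a copy with a fresh index, the caller's dataframe is left unchanged.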

Dev Khadka · Accepted Answer · 2019-09-25 11:35:07Z

Explanation:

(df["A"] == 7).cumsum() assigns a group number that increases at every 7, so each group starts at a 7
For each group, if it contains a 12, take the time difference between the group's first row and its first row with 12
If not, pass the time of the group's first row on to the next group until a 12 is found
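
To make the grouping step concrete, here is what the cumulative sum produces on a short made-up Series (the values are illustrative, not from the question):

```python
import pandas as pd

a = pd.Series([5, 7, 12, 5, 7, 7, 12])

# the counter increases by one at every 7, so each 7 opens a new group
group_id = (a == 7).cumsum()
print(group_id.tolist())
# [0, 1, 1, 1, 2, 3, 3]
```

Group 0 holds any rows that come before the first 7, which is why the code below only computes a difference for groups numbered above 0.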


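import pandas as pd
import numpy as np

np.random.seed(10)
# ISO dates avoid day/month ambiguity; the uppercase "H" alias is deprecated in recent pandas
date_range = pd.date_range("2019-09-25", "2019-09-27", freq="3h")
df = pd.DataFrame({'Time':date_range, 'A':np.random.choice([5,7,12], len(date_range))})

# each 7 starts a new group; group 0 holds any rows before the first 7
df["Seven"] = (df["A"] == 7).cumsum()

# display(df)
# carries the start time forward when a group contains no 12
pass_to_next_group = {"val": None}
def diff(group):
    group = group.copy()
    group["Diff"] = pd.Timedelta(0)
    loc = group.index[group["A"]==12]

    time_a = pass_to_next_group["val"] if pass_to_next_group["val"] is not None else group["Time"].iloc[0]
    pass_to_next_group["val"] = None

    if group.name>0 and len(loc)>0:
        # time of the first 12 minus the start time, so the difference is positive
        group.loc[loc[0],"Diff"] = group.loc[loc[0],"Time"] - time_a
    elif group.name>0:
        # no 12 in this group: hand the start time to the next group
        pass_to_next_group["val"] = time_a

    return group


df.groupby("Seven").apply(diff)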
