I'm trying to create a column that gives the difference between the timestamps in two other columns.

def time_delta(df):
    if df['a_time'] > df['b_time']:
        df = (pd.to_datetime(df.a_time) - pd.to_datetime(df.b_time)) / np.timedelta64(1, 'm')
    else:
        df = (pd.to_datetime(df.b_time) - pd.to_datetime(df.a_time)) / np.timedelta64(1, 'm')
    return df

df['C'] = df.apply(time_delta, axis=1)

When I run the apply line, the notebook cell just keeps running (showing *). Am I missing something?

Thanks so much

2 Answers

1

Don't assign the result to "df"; use a different variable instead.

def time_delta(df):
    if df['a_time'] > df['b_time']:
        res = (pd.to_datetime(df.a_time) - pd.to_datetime(df.b_time)) / np.timedelta64(1, 'm')
    else:
        res = (pd.to_datetime(df.b_time) - pd.to_datetime(df.a_time)) / np.timedelta64(1, 'm')
    return res

0

Your logic is over-complicated. Row-wise loops, which is what pd.DataFrame.apply with axis=1 amounts to, should be avoided with Pandas wherever a vectorized alternative exists. Here, you can convert a timedelta series to seconds, take the absolute value, and divide by 60 to get minutes:

import pandas as pd

df = pd.DataFrame({'a_time': pd.to_datetime(['2018-01-01 05:32:00', '2018-05-10 20:13:41']),
                   'b_time': pd.to_datetime(['2018-01-01 15:10:05', '2018-05-10 16:09:16'])})

df['C'] = (df['b_time'] - df['a_time']).dt.total_seconds().abs() / 60

print(df)

               a_time              b_time           C
0 2018-01-01 05:32:00 2018-01-01 15:10:05  578.083333
1 2018-05-10 20:13:41 2018-05-10 16:09:16  244.416667
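
As a possible variant of the same idea, the sketch below takes the absolute value of the timedelta series first and then divides by a one-minute pd.Timedelta, which skips the detour through seconds. The sample data is the same as above; the name delta is just an illustrative intermediate variable.

```python
import pandas as pd

df = pd.DataFrame({'a_time': pd.to_datetime(['2018-01-01 05:32:00', '2018-05-10 20:13:41']),
                   'b_time': pd.to_datetime(['2018-01-01 15:10:05', '2018-05-10 16:09:16'])})

# Absolute difference as a timedelta series (sign no longer matters)
delta = (df['b_time'] - df['a_time']).abs()

# Dividing a timedelta series by a one-minute Timedelta gives minutes as floats
df['C'] = delta / pd.Timedelta(minutes=1)

print(df)
```

Swapping pd.Timedelta(minutes=1) for pd.Timedelta(hours=1) would give hours instead, so the unit is visible in one place.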

For academic purposes, this is how you would use apply:

import numpy as np

def time_delta(row):
    if row['a_time'] > row['b_time']:
        return (row['a_time'] - row['b_time']) / np.timedelta64(1, 'm')
    else:
        return (row['b_time'] - row['a_time']) / np.timedelta64(1, 'm')

df['C'] = df.apply(time_delta, axis=1)
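
If you want to convince yourself that the row-wise and vectorized approaches agree, a quick check along these lines should do it. This is only a sketch: time_delta here is a shortened variant that uses abs() in place of the if/else, and row_wise and vectorized are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a_time': pd.to_datetime(['2018-01-01 05:32:00', '2018-05-10 20:13:41']),
                   'b_time': pd.to_datetime(['2018-01-01 15:10:05', '2018-05-10 16:09:16'])})

def time_delta(row):
    # row is a Series holding one row's values; abs() replaces the if/else branch
    return abs(row['a_time'] - row['b_time']) / np.timedelta64(1, 'm')

# Row-wise version: one Python function call per row
row_wise = df.apply(time_delta, axis=1)

# Vectorized version: operates on whole columns at once
vectorized = (df['b_time'] - df['a_time']).dt.total_seconds().abs() / 60

# Compare with a tolerance, since both results are floats
print(np.allclose(row_wise, vectorized))
```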

Note that both versions assume you are starting with datetime series. If this isn't the case, make sure you convert to datetime as an initial step:

time_cols = ['a_time', 'b_time']
df[time_cols] = df[time_cols].apply(pd.to_datetime)
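
Putting the conversion and the calculation together, an end-to-end run might look like the sketch below. It assumes the timestamps arrive as strings in a format pd.to_datetime can parse; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with timestamps stored as plain strings
df = pd.DataFrame({'a_time': ['2018-01-01 05:32:00', '2018-05-10 20:13:41'],
                   'b_time': ['2018-01-01 15:10:05', '2018-05-10 16:09:16']})

# Convert each column once, up front, rather than once per row
time_cols = ['a_time', 'b_time']
df[time_cols] = df[time_cols].apply(pd.to_datetime)

# Absolute difference in minutes, computed on whole columns
df['C'] = (df['b_time'] - df['a_time']).dt.total_seconds().abs() / 60

print(df.dtypes)
print(df)
```

Converting once up front also avoids calling pd.to_datetime inside the row-wise function, which repeats the parsing work for every row.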
