How do I calculate the total of a string column in pandas?

myl=[('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
 ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
 ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
 ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]

import pandas as pd
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

The code above stores the downtime as strings in a single column. How do I total the durations in that column?

In [5]: df
Out[5]:
                  from                    to     downtime
0  2012-11-07 19:16:07   2012-11-07 19:21:07   0h 05m 00s
1  2012-11-13 06:16:07   2012-11-13 06:21:07   0h 05m 00s
2  2012-11-15 09:56:07   2012-11-15 11:41:07   1h 45m 00s
3  2012-11-15 22:26:07   2012-11-16 07:01:07   8h 35m 00s

For example, for the output above, the expected total of the downtime column would be 9h 90m 00s.


Update:

And how do I calculate day-wise downtime?

Expected result:

2012-11-07 0h 05m 00s
2012-11-13 0h 05m 00s
2012-11-15 10h 20m 00s

This works as expected:

df['downtime_t'] = pd.to_timedelta(df['downtime'])

df['year'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).year
df['month'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).month
df['day'] = pd.DatetimeIndex(pd.to_datetime(df['from'])).day

df.groupby(['year', 'month', 'day'])['downtime_t'].sum()

This also works for grouping by year:

df['from_d2'] = pd.to_datetime(df['from'])
df.groupby(df['from_d2'].map(lambda x:  x.year ))['downtime_t'].sum()

But this does not work:

df.groupby(df['from_d2'].map(lambda x:  x.year, x.month, x.day))['downtime_t'].sum()

Is there another way to get the grouped totals?

  • Do you want exactly that result, or is 10h 30m 00s also good (or better)? Commented Feb 3, 2015 at 9:15
  • 10h 30m 00s is better and correct ! Commented Feb 4, 2015 at 11:40
  • You should convert first the date column to datetimes, and the downtime columns to timedeltas, then you can just do df.groupby(df['from'].dt.date()).mean() Commented Feb 4, 2015 at 12:54
  • Sorry, it is df['from'].dt.date without the parentheses (a property, not a method) Commented Feb 4, 2015 at 14:31
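
The suggestion in the comments above can be sketched as follows for the day-wise totals. This is a minimal sketch using the question's data: it assumes a pandas version that has the `.dt` accessor and parses these duration strings (0.15 or later, going by the discussion on this page), it uses `sum()` rather than `mean()` because the question asks for totals, and the `.str.strip()` call is a precaution added here for the leading space in the data.

```python
import datetime

import pandas as pd

myl = [('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
       ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
       ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
       ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

# Convert the start time to datetimes and the downtime strings to timedeltas.
df['from'] = pd.to_datetime(df['from'])
df['downtime'] = pd.to_timedelta(df['downtime'].str.strip())

# .dt.date is a property (no parentheses); it yields one datetime.date per row,
# which serves as the grouping key.
daily = df.groupby(df['from'].dt.date)['downtime'].sum()
print(daily)
```

Grouping on a single date key avoids the separate year, month and day columns. The failing attempt in the question does not wrap the lambda's return values in a tuple, so `x.month` and `x.day` are read as extra arguments to `map` rather than as part of the key.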

1 Answer

You can use pandas' to_timedelta function.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_timedelta.html

pd.to_timedelta(df['downtime']).sum()
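
A minimal end-to-end sketch of this approach with the question's data. It assumes pandas 0.15 or later (the comments below report a ValueError on 0.14.1); the `.str.strip()` call and the `format_hms` helper are illustrative additions and not part of the original answer.

```python
import pandas as pd

myl = [('2012-11-07 19:16:07', ' 2012-11-07 19:21:07', ' 0h 05m 00s'),
       ('2012-11-13 06:16:07', ' 2012-11-13 06:21:07', ' 0h 05m 00s'),
       ('2012-11-15 09:56:07', ' 2012-11-15 11:41:07', ' 1h 45m 00s'),
       ('2012-11-15 22:26:07', ' 2012-11-16 07:01:07', ' 8h 35m 00s')]
df = pd.DataFrame(myl, columns=['from', 'to', 'downtime'])

# Strip the leading space, then parse strings such as '1h 45m 00s' into timedeltas.
df['downtime_t'] = pd.to_timedelta(df['downtime'].str.strip())
total = df['downtime_t'].sum()


def format_hms(td):
    """Hypothetical helper: render a Timedelta in the question's 'Xh MMm SSs' style."""
    seconds = int(td.total_seconds())
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


print(format_hms(total))
```

Because the values are summed as real durations, the minutes carry over into hours, so the result is the normalized 10h 30m 00s rather than 9h 90m 00s.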

3 Comments

Getting an error: ValueError: cannot create timedelta string converter for [0h 05m 00s]. My pandas version is '0.14.1'.
There were a lot of enhancements to the timedelta handling (with the introduction of the Timedelta scalar and TimedeltaIndex) in 0.15. You will possibly have to update.
Yes, it worked with version 0.15. Updated the question.
