Python Regex: Mixed format string duration to seconds

Question

I have a bunch of time durations in a list, as follows:

['23m3s', '23:34', '53min 3sec', '2h 3m', '22.10', '1:23:33', ...]

As you can guess, many permutations of time formatting are in use.

What is the most efficient or simplest way to extract the duration in seconds from each element in Python?

:-O But are they totally random? For example, what is 23:34? 23h and 34min? And 1:23:33? Is that 1 day 23 hours 33 min, or 1h 23min 33sec?
– maurelio79, Commented Jan 6, 2014 at 1:26
You will have to write the strptime format for each one and parse them in a loop.
– roippi, Commented Jan 6, 2014 at 1:34
@maurelio79 23:34 is 23m 34s and 1.23.33 is 1h 23m 33s. Let's assume this is always the case.
– blue_zinc, Commented Jan 6, 2014 at 8:37

Justin O Barber · Accepted Answer · 2014-01-06 13:10:02Z

This is perhaps still a bit crude, but it handles all the data you've posted so far, and the totals in seconds all come to what I would expect. A combination of re and timedelta is enough for this small sample.

>>> import re
>>> from datetime import timedelta

First, a dictionary of regexes (updated based on your comment):

d = {'hours': [re.compile(r'(\d+)(?=h)'),               # 2h
               re.compile(r'^(\d+)[:.]\d+[:.]\d+')],     # H:MM:SS
     'minutes': [re.compile(r'(\d+)(?=m)'),             # 23m, 53min
                 re.compile(r'^(\d+)[:.]\d+$'),         # MM:SS
                 re.compile(r'^\d+[.:](\d+)[.:]\d+')],   # H:MM:SS
     'seconds': [re.compile(r'(\d+)(?=s)'),             # 3s, 3sec
                 re.compile(r'^\d+[.:]\d+[.:](\d+)'),   # H:MM:SS
                 re.compile(r'^\d+[:.](\d+)$')]}        # MM:SS

Then a function to try out the regexes (perhaps still a bit crude):

>>> def convert_to_seconds(*time_str):
...     """Print the duration in whole seconds for each string in time_str."""
...     for t in time_str:
...         td = timedelta(0)
...         for unit, regexes in d.items():
...             for regex in regexes:
...                 match = regex.search(t)  # search once and reuse the match
...                 if match:
...                     # unit is 'hours', 'minutes' or 'seconds', all valid timedelta keywords
...                     td += timedelta(**{unit: int(match.group(1))})
...         # total_seconds() also counts whole days, which td.seconds would drop
...         print(int(td.total_seconds()))
...

Here are the results:

>>> t = ['23m3s', '23:34', '53min 3sec', '2h 3m', '22.10', '1:23:33']
>>> convert_to_seconds(*t)
1383
1414
3183
7380
1330
5013
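
As a sanity check, these totals can be reproduced with plain arithmetic (hours * 3600 + minutes * 60 + seconds). This is only a hand-written cross-check, assuming the OP's reading that 23:34 and 22.10 mean minutes and seconds; the `expected` name is made up for illustration.

```python
# Hand-computed totals for the six sample inputs, reading two-field
# values such as '23:34' and '22.10' as minutes and seconds.
expected = {
    '23m3s': 23 * 60 + 3,
    '23:34': 23 * 60 + 34,
    '53min 3sec': 53 * 60 + 3,
    '2h 3m': 2 * 3600 + 3 * 60,
    '22.10': 22 * 60 + 10,
    '1:23:33': 1 * 3600 + 23 * 60 + 33,
}
print(list(expected.values()))  # [1383, 1414, 3183, 7380, 1330, 5013]
```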

You could add more regexes as you encounter more data, but only to an extent.
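
One possible way to keep the list of regexes from growing is to treat the inputs as just two families: positional values such as 23:34 or 1:23:33, and unit-labelled values such as 2h 3m or 53min 3sec. The following is a minimal sketch of that idea, not a replacement for the code above. The names `duration_to_seconds`, `UNIT_SECONDS`, `LABELLED` and `POSITIONAL` are hypothetical, and it assumes every unit label starts with h, m or s and that the last positional field is always seconds.

```python
import re

# Seconds per unit, keyed by the first letter of the unit label (assumption:
# every label starts with h, m or s).
UNIT_SECONDS = {'h': 3600, 'm': 60, 's': 1}

# A number followed by a word starting with h, m or s: "2h", "53min", "3sec".
LABELLED = re.compile(r'(\d+)\s*([hms])[a-z]*', re.IGNORECASE)

# Two or three numbers separated by ':' or '.': "23:34", "22.10", "1:23:33".
POSITIONAL = re.compile(r'\d+(?:[:.]\d+){1,2}')


def duration_to_seconds(text):
    """Return the duration in seconds, or raise ValueError if unrecognised."""
    text = text.strip()
    if POSITIONAL.fullmatch(text):
        # Fields are right-aligned: the last one is seconds, the one before
        # it minutes, and an optional first one hours.
        total = 0
        for part in re.split(r'[:.]', text):
            total = total * 60 + int(part)
        return total
    labelled = LABELLED.findall(text)
    if labelled:
        return sum(int(n) * UNIT_SECONDS[u.lower()] for n, u in labelled)
    raise ValueError('unrecognised duration: %r' % text)


samples = ['23m3s', '23:34', '53min 3sec', '2h 3m', '22.10', '1:23:33']
print([duration_to_seconds(s) for s in samples])
# [1383, 1414, 3183, 7380, 1330, 5013]
```

Because only the first letter of a label is inspected, something like "5 ms" would be read as five minutes, so this only fits data where such labels cannot occur.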

edited Jan 6, 2014 at 13:10

answered Jan 6, 2014 at 1:50

Justin O Barber

3 Comments

blue_zinc Over a year ago

This is good stuff. And I did explore down this path, however I had to keep adding to the regex dictionary. I'll accept this unless I stumble upon a more elegant solution by the time I actually have to use it... Thanks

Toto Over a year ago

23:34 is 23 minutes and 34 seconds, not 23 hours and 34 minutes. Same for 22.10.

Justin O Barber Over a year ago

@m42 Thanks for pointing that out. I missed the OP's comment to this effect. I have updated the regexes and posted the new results.