2

I have free-form time duration strings containing hour and minute values, either of which may be omitted:

1 hour
12 hours 3 mins
47 mins
10 hours
1 min

I need to convert them to a number of minutes. I first looked for a Python library that converts times and durations, but the string format does not fit that approach.

Then I tried with regex to extract the number groups:

re.search(r"(\d+)?.*(\d+\w)", string).group(1)
re.search(r"(\d+)?.*(\d+\w)", string).group(2)

This worked for most cases where both the hour and minute values are present, or where only the minute value is present (since I made the first group optional). The regex fails when the hour is a single digit (1 hour). Also, because I am extracting only the digit groups without the descriptive text (hour(s) and/or min(s)), the calculation is wrong when there is only an hour value with two digits, like 10 hours: it is wrongly extracted as the second group, i.e. as minutes.

  • You can extract time data from a string using datetime.strptime() from Python's datetime library. Commented Oct 24, 2018 at 11:14

4 Answers

1

You can use re.findall with the following regex:

import re
s = '''1 hour
12 hours 3 mins
47 mins
10 hours
1 min'''
for h, m in re.findall(r'(?=\d+ *hours?| *\d+ *min(?:ute)?s?)(?:(\d+) *hours?)?(?: *(\d+) *min(?:ute)?s?\b)?', s, flags=re.IGNORECASE):
    print(int(h or 0) * 60 + int(m or 0))

This outputs:

60
723
47
600
1
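
The pattern above is dense, so here is a sketch of the same regex written with re.VERBOSE and one comment per part. It is intended to be equivalent, not a different technique; the names DURATION and minutes_per_duration are made up for this illustration, and literal spaces are written as [ ] because verbose mode ignores bare whitespace.

```python
import re

# Verbose restatement of the pattern above; DURATION and
# minutes_per_duration are illustrative names, not part of the answer.
DURATION = re.compile(r"""
    (?=\d+[ ]*hours?|[ ]*\d+[ ]*min(?:ute)?s?)  # lookahead: at least one of the two parts must follow
    (?:(\d+)[ ]*hours?)?                        # optional hours part, digits captured in group 1
    (?:[ ]*(\d+)[ ]*min(?:ute)?s?\b)?           # optional minutes part, digits captured in group 2
""", flags=re.IGNORECASE | re.VERBOSE)


def minutes_per_duration(text):
    # One minute total per duration found; an unmatched group comes back as ''.
    return [int(h or 0) * 60 + int(m or 0) for h, m in DURATION.findall(text)]


print(minutes_per_duration("12 hours 3 mins"))
```

The lookahead is what stops the two optional groups from matching an empty string at every position.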

1

I wrote this simple snippet that parses all your cases. Ask if you have any problems.

Output:

1 hour -> 1:00:00
12 hours 3 mins -> 12:03:00
47 mins -> 0:47:00
10 hours -> 10:00:00
1 min -> 0:01:00
random text -> 0:00:00

Code:

import re
from datetime import timedelta


number_word_regex = re.compile(r'(\d+) (\w+)')


def parse_fuzzy_duration(s):
    ret = timedelta(0)

    for number, word in number_word_regex.findall(s):
        number = int(number)

        if word in ['minute', 'min', 'minutes', 'mins']:
            ret += timedelta(minutes=number)
        elif word in ['hour', 'hours']:
            ret += timedelta(hours=number)

    return ret


for s in ['1 hour', '12 hours 3 mins', '47 mins', '10 hours', '1 min', 'random text']:
    print(s, '->', parse_fuzzy_duration(s))
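
Since the question asks for a number of minutes rather than a timedelta, the result can be converted in one more step. A minimal sketch, where timedelta_to_minutes is a hypothetical helper name:

```python
from datetime import timedelta


def timedelta_to_minutes(delta):
    # total_seconds() includes the days component, so durations of
    # 24 hours or more are converted correctly as well.
    return int(delta.total_seconds() // 60)


print(timedelta_to_minutes(timedelta(hours=12, minutes=3)))  # 723
```

Using total_seconds() instead of the seconds attribute matters here, because seconds alone drops whole days.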

1

You can try using dateutil and regex:

Demo:

import dateutil.parser as dparser
import re

s = """1 hour
12 hours 3 mins
47 mins
10 hours
1 min"""

for line in s.splitlines():
    # \b keeps the substitution from touching an already spelled-out "minutes"
    print(dparser.parse(re.sub(r"\bmins?\b", "minutes", line), fuzzy=True).strftime("%H:%M:%S"))

Output:

01:00:00
12:03:00
00:47:00
10:00:00
00:01:00

0

The other answers are fine; here is another way to do this with regex (if you really want to):

import re

match = re.match(
    r'((?P<hours>\d+) hours?)? ?((?P<mins>\d+) mins?)?',
    '12 hours 3 mins'
)

# Named groups as a dict: {'hours': '12', 'mins': '3'}
match.groupdict()

This way it may make more sense to you (making sense of the pattern is the main problem with a lot of regex engineering). I'd suggest trying out any regex you choose on a resource like https://regex101.com/ to test it and get a description of what it does.

1 Comment

blhsing's answer is more foolproof: even if I put an arbitrary number of spaces between the number and the hour/min pattern, it still selects the digits. However, yours was shorter and thus easier to understand, so I was able to change it to get the same behavior: ((?P<hours>\d+) *hours?)? ?((?P<mins>\d+) *mins?)? - thanks
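
For reference, the pattern from this comment can be turned into the minute count the question asks for. This is only a sketch: DURATION_RE and to_minutes are hypothetical names, and groupdict(default='0') is used so that a missing part counts as zero.

```python
import re

# Pattern copied from the comment above; the names are illustrative.
DURATION_RE = re.compile(r'((?P<hours>\d+) *hours?)? ?((?P<mins>\d+) *mins?)?')


def to_minutes(text):
    # Both parts are optional, so re.match always succeeds (possibly on
    # an empty string) and unmatched groups fall back to '0'.
    parts = DURATION_RE.match(text).groupdict(default='0')
    return int(parts['hours']) * 60 + int(parts['mins'])


for s in ['1 hour', '12 hours 3 mins', '47 mins', '10 hours', '1 min']:
    print(s, '->', to_minutes(s))
```

Note that unrecognized input silently yields 0 with this approach, which may or may not be what you want.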
