1

I'm trying to parse a formatted string. I need to know how many hours, minutes and seconds every project I retrieve has been worked on.

The data I receive is in this format, example:

PT5H12M3S, this means 5 hours 12 minutes 3 seconds.

However, if there is less than an hour of work, it will just not be displayed:

PT12M3S, this means 12 minutes 3 seconds.

Furthermore, if a project has not been worked on at all (or only for less than a minute), the data will be displayed like this:

PT0S

If a project only has full hours worked on it, it will be displayed as:

PT5H

I tried parsing the data with the following code:

estimated = track_data['project']['estimate']['estimate'].split('PT')[1]
estimated_hours = estimated.split('H')[0]
estimated_minutes = estimated_hours.split('M')[0]
estimated_seconds = estimated_minutes.split('S')[0]

but this solution breaks down as soon as a component is missing. If I, for example, get the data PT5H, then estimated hours will be 5, but estimated minutes and seconds will be 5 as well. Obviously this is not what we want.
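For illustration, here is a minimal reproduction of the behaviour described above. It relies only on the fact that str.split returns the whole string (as a one-element list) when the separator is absent. The variable names prefixed with full_ are added here for the example and are not part of the original code.

```python
# Reproduce the chained splits on 'PT5H'.
estimated = 'PT5H'.split('PT')[1]                    # '5H'
estimated_hours = estimated.split('H')[0]            # '5'
estimated_minutes = estimated_hours.split('M')[0]    # '5' ('M' not found, whole string returned)
estimated_seconds = estimated_minutes.split('S')[0]  # '5' ('S' not found either)
print(estimated_hours, estimated_minutes, estimated_seconds)

# Each step splits the result of the previous step, so the minutes are
# taken from the hours string rather than from the remaining input.
full = 'PT5H12M3S'.split('PT')[1]                    # '5H12M3S'
full_hours = full.split('H')[0]                      # '5'
full_minutes = full_hours.split('M')[0]              # '5', not '12'
print(full_hours, full_minutes)
```

Because every split operates on the previous result, the text after the 'H' is discarded before the minutes and seconds are ever looked at.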

Can anybody give me guidance on where to look? I tried some other things with split, but they do not seem to work: if split can't find the 'M' or 'S', it just keeps returning the same number.

Hope this makes sense and thanks in advance.

2
  • Check for empty values in the first place. Commented Feb 13, 2019 at 10:13
  • "If I, for example, get the data PT5H": your question has the answer. Check some conditions first and adjust the logic accordingly. Commented Feb 13, 2019 at 10:14

5 Answers

3

You can use regular expressions for that:

import re

PROJECT_TIME_REGEX = re.compile(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?')

def get_project_time(s):
    m = PROJECT_TIME_REGEX.match(s)
    if not m:
        raise ValueError('invalid string')
    # Missing groups are None; treat them as 0. 'minute' avoids shadowing the built-in min().
    hour, minute, sec = (int(g) if g is not None else 0 for g in m.groups())
    return hour, minute, sec

print(get_project_time('PT5H12M3S'))
# (5, 12, 3)
print(get_project_time('PT12M3S'))
# (0, 12, 3)
print(get_project_time('PT0S'))
# (0, 0, 0)
print(get_project_time('PT5H'))
# (5, 0, 0)

3 Comments

would fail for PT5H3S
@user5173426 Not really, it outputs (5, 0, 3).
My bad, overlooked. +1
2

How's this?

import re

def parsept(ptstring):
    regex = re.compile(
            r'PT'
            r'(?:(?P<h>\d+)H)?'
            r'(?:(?P<m>\d+)M)?'
            r'(?:(?P<s>\d+)S)?')
    m = regex.match(ptstring)
    if m:
        return (int(m.group('h')) if m.group('h') else 0,
                int(m.group('m')) if m.group('m') else 0,
                int(m.group('s')) if m.group('s') else 0)
    # else
    raise ValueError('{0} does not look like a valid PTxHyMzS string'.format(ptstring))

1

You can use regular expressions and groups in regular expression to capture hours, minutes and seconds - all of which can be optional.

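Something along the lines of: /PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?/

The inner parentheses are capture groups, so they will contain the hours, minutes and seconds. Each number is wrapped together with its unit letter in an optional non-capturing group, (?:...)?, so a missing component simply leaves its group empty instead of shifting a number into the wrong group.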
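A minimal Python sketch of this idea follows. The names NAIVE_RE, DURATION_RE and split_duration are made up for the example. It also shows why the digits and the unit letter should be made optional together: a pattern that makes only the letters optional, such as PT(\d*)H?(\d*)M?(\d*)S?, puts the minutes of PT12M3S into the first (hours) group.

```python
import re

# Only the unit letters are optional here, so the first \d* grabs
# whatever digits come first, regardless of the unit that follows.
NAIVE_RE = re.compile(r'PT(\d*)H?(\d*)M?(\d*)S?')

# Digits and unit letter are optional as a pair.
DURATION_RE = re.compile(r'PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?')

def split_duration(text):
    """Return (hours, minutes, seconds); absent components become 0."""
    m = DURATION_RE.fullmatch(text)  # fullmatch rejects trailing garbage
    if m is None:
        raise ValueError('not a PTxHyMzS string: {!r}'.format(text))
    return tuple(int(g) if g else 0 for g in m.groups())

print(NAIVE_RE.match('PT12M3S').groups())  # the 12 lands in the hours group
print(split_duration('PT12M3S'))
print(split_duration('PT5H'))
```

One design note: with fullmatch, a bare 'PT' is still accepted and yields all zeros; add an explicit check if that should count as invalid.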

But regular expressions are not that readable. I would strongly recommend trying a parser combinator library such as Parsec (a Haskell library; similar libraries exist for Python). Parser combinators are more readable and maintainable, and are a joy to write.
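To give a rough feel for the parser combinator style, here is a hand-rolled sketch in plain Python. It does not use Parsec or any other library's API; every name in it (literal, number, field, sequence, parse_duration) is invented for this example. A parser here is assumed to be a function taking (text, pos) and returning (value, new_pos), or None on failure.

```python
def literal(s):
    """Parser that matches the exact string s."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def number(text, pos):
    """Parser that matches one or more digits and returns them as an int."""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end

def field(unit):
    """Optional '<digits><unit>'; yields 0 without consuming input if absent."""
    unit_parser = literal(unit)
    def parse(text, pos):
        num = number(text, pos)
        if num is None:
            return 0, pos
        value, after_num = num
        rest = unit_parser(text, after_num)
        if rest is None:
            return 0, pos  # digits belong to a later unit: backtrack
        return value, rest[1]
    return parse

def sequence(*parsers):
    """Run parsers one after another and collect their values."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

duration = sequence(literal('PT'), field('H'), field('M'), field('S'))

def parse_duration(text):
    result = duration(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError('not a PTxHyMzS string: {!r}'.format(text))
    _, hours, minutes, seconds = result[0]
    return hours, minutes, seconds

print(parse_duration('PT12M3S'))
```

The grammar is spelled out in the single line that builds duration, which is the main readability argument for this style. A real library would supply the building blocks that are written by hand here.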

1

A solution without regex, based on conditionals

def parse_time(str_to_parse):
    str_to_parse = str_to_parse.split('PT')[1]
    # Units must always stay in left-to-right (largest-to-smallest) order.
    time_units = ['H', 'M', 'S']
    estimated_time = {k: 0 for k in time_units}
    for k in time_units:
        if k in str_to_parse:
            left, right = str_to_parse.split(k)
            estimated_time[k], str_to_parse = int(left), right
    return estimated_time

estimated = "PT12M3S"
final_time = parse_time(estimated)
print(final_time)
# {'H': 0, 'M': 12, 'S': 3}

1

I hope this code makes sense. This is a very simple approach in which you loop over the characters of the string, adding the digits to current and evaluating them once an alphabetic character is reached ('S', 'M', 'H').

estimated = 'PT5H'
clean = estimated.split('PT')[1]
seconds = 0
minutes = 0
hours = 0
current = ''

for char in clean:
    if char.isdigit():
        current += char
    else:
        if char == 'S':
            seconds = int(current)
        elif char == 'M':
            minutes = int(current)
        else:
            hours = int(current)

        current = ''

print(hours, minutes, seconds)
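The same loop could be wrapped in a reusable function along these lines. This is only a sketch: the function name parse_duration_loop and the error handling are additions for illustration, not part of the code above. Unlike the original loop, it raises on an unknown unit letter instead of treating it as hours.

```python
def parse_duration_loop(text):
    """Return (hours, minutes, seconds) from a string like 'PT5H12M3S'."""
    if not text.startswith('PT'):
        raise ValueError('missing PT prefix: {!r}'.format(text))
    totals = {'H': 0, 'M': 0, 'S': 0}
    current = ''
    for char in text[2:]:
        if char.isdigit():
            current += char               # keep collecting digits
        elif char in totals and current:
            totals[char] = int(current)   # unit letter closes the number
            current = ''
        else:
            raise ValueError('unexpected {!r} in {!r}'.format(char, text))
    if current:
        raise ValueError('digits without a unit in {!r}'.format(text))
    return totals['H'], totals['M'], totals['S']

print(parse_duration_loop('PT5H'))
print(parse_duration_loop('PT12M3S'))
```

Note that this version does not check that the units appear in H, M, S order; a string such as 'PT3S5H' would be accepted.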
