0

I'm running Python 3.6.8. I need to sum values that appear in a log file. A line may contain 1 to 14 {index, value} pairs; a typical line with 8 pairs is shown in the code below (the variable called 'log_line'). The line format with the '- -' separator is consistent. I have working code, but I'm not sure whether it is the most elegant or best way to parse this string; it feels a bit clunky. Any suggestions?

    import re
    
    # version 1
    log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
    log_line_values = log_line.split('- -')[1]
    values = re.findall(r'{\d+,\s\d+}',log_line_values)
    sum_of_values = 0
    for v in values:
        sum_of_values += int(v.replace('{','').replace('}','').replace(' ','').split(',')[1])
    print(f'1) sum_of_values:{sum_of_values}')

    # version 2, essentially the same, but more concise (some may say confusing)
    sum_of_values = sum([int(v.replace('{','').replace('}','').replace(' ','').split(',')[1]) for v in re.findall(r'{\d+,\s\d+}',log_line.split('- -')[1])])
    print(f'2) sum_of_values:{sum_of_values}')
Comments

  • Without access to the log file in question, it's hard to judge whether the regex is too tight, too relaxed, or just right. Commented Mar 31, 2022 at 11:14
  • Welcome back to Stack Overflow. As a refresher, please read How to Ask and stackoverflow.com/help/dont-ask. Improving working code is open-ended and subjective, and thus off-topic here. It also doesn't fit the site purpose as a searchable repository of questions - since what are the odds someone else will have had the same task and independently written the code the same way? You may be able to get help on codereview.stackexchange.com, after reading their own question guidelines. Commented Mar 31, 2022 at 11:15

3 Answers

2

First, there is no need to strip the prefix: the regex will not match it, as long as the prefix itself contains no {index, value} pairs. Second, we can use a capturing group to capture only the value we care about, in this case the second number in each comma-separated pair. We can use map(int, iterable) to convert each captured string to an int, and then call sum on the resulting numbers.

Putting it all together:

    import re

    log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
    # The capturing group makes findall return only the second number of each pair
    values = re.findall(r'{\d+,\s(\d+)}', log_line)
    sum_of_values = sum(map(int, values))
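
To illustrate what the capturing group changes, here is a small sketch comparing re.findall with and without a group on the sample line. The names whole_matches and captured are only illustrative and do not come from the original code.

```python
import re

log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'

# Without a group, findall returns each whole match, braces included.
whole_matches = re.findall(r'{\d+,\s\d+}', log_line)

# With exactly one capturing group, findall returns only the captured text.
captured = re.findall(r'{\d+,\s(\d+)}', log_line)

print(whole_matches[:2])        # ['{0, 8}', '{1, 24}']
print(captured[:2])             # ['8', '24']
print(sum(map(int, captured)))  # 95
```

This is why the chain of replace() and split() calls from the question is no longer needed: the group already isolates the digits.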

1

Assuming you've already identified that the line is one that matches the pattern, you can simplify your logic a lot by using a generator expression within sum().

    import re

    # Compile your regular expression for reuse
    # Just pull out the last value from each pair
    re_extract_val = re.compile(r'{\d+, (\d+)}')

    log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'

    # Use a generator expression within sum() to add all values
    sum_of_values = sum(int(val) for val in re_extract_val.findall(log_line))

You could also use map(), though I find the generator expression clearer:

    sum_of_values = sum(map(int, re_extract_val.findall(log_line)))
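
If the check that a line matches the format has not been done beforehand, one way to fold it in is sketched below. The helper name sum_line_values and the decision to return None for lines without the '- -' separator are assumptions made for illustration; they are not part of the original code.

```python
import re
from typing import Optional

# Same pattern as above: capture only the second number of each pair.
RE_EXTRACT_VAL = re.compile(r'{\d+, (\d+)}')


def sum_line_values(line: str) -> Optional[int]:
    """Sum the pair values after '- -'; return None if the separator is missing."""
    _, separator, tail = line.partition('- -')
    if not separator:
        # Hypothetical policy: treat lines without the separator as "not a data line".
        return None
    return sum(int(val) for val in RE_EXTRACT_VAL.findall(tail))


log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
print(sum_line_values(log_line))                  # 95
print(sum_line_values('no pairs on this line'))   # None
```

Returning None rather than 0 keeps "no data on this line" distinguishable from "pairs that sum to zero"; whether that distinction matters depends on the log file.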

0

This is an ideal use case for regular expression capture groups:

    import re

    log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
    pattern = r'{(\d+), (\d+)}'

    s = sum([int(e[1]) for e in re.findall(pattern, log_line.split('- -')[1])])

    print(s)  # 95

Here I use re.findall to match the number pairs in the input string, then a list comprehension to convert the second number of each pair to an int, and finally sum to add them up.

The advantage of the {(\d+), (\d+)} pattern is that it also captures the first number, should you need it.
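
As a rough sketch of what using the first number could look like, the two groups can be turned into an index-to-value mapping. The name values_by_index is illustrative only, and the sketch assumes each index appears at most once per line (a repeated index would overwrite the earlier value).

```python
import re

log_line = 'Some explanatory text was here:      - -{0, 8} {1, 24} {2, 24} {3, 5} {4, 5} {5, 12} {6, 12} {7, 5}'
pattern = r'{(\d+), (\d+)}'

# With two groups, findall returns (index, value) tuples of strings.
values_by_index = {int(i): int(v) for i, v in re.findall(pattern, log_line)}

print(values_by_index[1])             # 24
print(sum(values_by_index.values()))  # 95
```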
