
I've got a question about regular expressions in Python. I'm working on a project for which I have to parse a bunch of huge text files and extract certain parts of them into a spreadsheet. One of those parts is a set of lot sizes, expressed in the form "NUMBERxNUMBERxNUMBER...". They're stored in the middle of a much longer line, like this:

Spring st. , No. 208, 18.9x42.2x49x10x8x100. 'John S. Giles, exr. to Herman Goodstein, W. D. . 11,100

I'm trying to design a regular expression that would yield:

18.9x42.2x49x10x8x100

But I'm not quite sure where to start. What would be the best way to design an expression of this type, where there can be any number of numbers (some with decimal points) separated by x? Whitespace should end the match. Thank you in advance for the help; I really appreciate it!

>>> import re
>>> s = '''Spring st. , No. 208, 18.9x42.2x49x10x8x100. 'John S. Giles, exr. to Herman Goodstein, W. D. . 11,100'''
>>> re.search(r'(?:\d+(?:\.\d+)?x)+\d+(?:\.\d+)?', s)  # raw string so \d and \. reach the regex engine untouched
<re.Match object; span=(22, 43), match='18.9x42.2x49x10x8x100'>
>>> _.group(0)
'18.9x42.2x49x10x8x100'

The regular expression uses \d+(?:\.\d+)? twice. That piece matches one or more digits, optionally followed by a dot and more digits. Because the dot is only consumed when digits follow it, the period after "100" is not included in the match. The expression matches this "number part" followed by an x as many times as possible (at least once), and then requires a final "number part". Any other character, including whitespace, ends the match.
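For the spreadsheet side of the question, here is a minimal sketch, assuming the files can be processed line by line and that CSV (which spreadsheet programs open directly) is acceptable as output. The names `LOT_SIZE`, `extract_lot_sizes` and `write_csv` are hypothetical. It compiles the same pattern in `re.VERBOSE` mode so each part can carry a comment, and uses `finditer` so that a line containing several lot sizes yields all of them.

```python
import csv
import io
import re
from typing import Iterable, Iterator, TextIO, Tuple

# Same pattern as in the answer, written in verbose mode so each piece is commented.
LOT_SIZE = re.compile(r"""
    (?:\d+(?:\.\d+)?x)+   # one or more parts: a number (optional decimals) followed by x
    \d+(?:\.\d+)?         # the final number; a dot is only taken if digits follow it
""", re.VERBOSE)


def extract_lot_sizes(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, lot_size) for every lot size found in the given lines."""
    for lineno, line in enumerate(lines, start=1):
        for match in LOT_SIZE.finditer(line):
            yield lineno, match.group(0)


def write_csv(rows: Iterable[Tuple[int, str]], out: TextIO) -> None:
    """Write (line_number, lot_size) rows to a text file-like object as CSV."""
    writer = csv.writer(out)
    writer.writerow(["line", "lot_size"])
    writer.writerows(rows)


if __name__ == "__main__":
    sample = [
        "Spring st. , No. 208, 18.9x42.2x49x10x8x100. 'John S. Giles, exr. to Herman Goodstein, W. D. . 11,100",
        "A line with no lot size, just 11,100",
    ]
    buf = io.StringIO()
    write_csv(extract_lot_sizes(sample), buf)
    print(buf.getvalue())
```

For real input you would pass an open file instead of the sample list, e.g. `with open("input.txt", encoding="utf-8") as src, open("lots.csv", "w", newline="", encoding="utf-8") as dst: write_csv(extract_lot_sizes(src), dst)` (file names hypothetical). Iterating over a file object reads one line at a time, so even huge files are never loaded into memory all at once; `newline=""` is what the `csv` module documentation recommends for output files.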



@stribizhev Thanks! I forgot to escape the dot; fixed!
