Python Regular Expression--extracting coordinates

Question

I've got a question about regular expressions in Python. I'm working on a project for which I have to parse a bunch of huge text files and extract certain parts of them into a spreadsheet. One part of it is a bunch of lot sizes, expressed in the form "NUMBERxNUMBERxNUMBER...". They're stored in the middle of a much bigger line, like this:

Spring st. , No. 208, 18.9x42.2x49x10x8x100. 'John S. Giles, exr. to Herman Goodstein, W. D. . 11,100

I'm trying to design a regular expression that would yield:

18.9x42.2x49x10x8x100

But I'm not quite sure where to start. What would be the best way to design an expression of this type, where there can be any number of numbers (with decimal points), separated by x? Whitespace would stop the analysis. Thank you in advance for the help, I really appreciate it!

poke · Accepted Answer · 2015-05-21 19:34:04Z


>>> import re
>>> s = '''Spring st. , No. 208, 18.9x42.2x49x10x8x100. 'John S. Giles, exr. to Herman Goodstein, W. D. . 11,100'''
>>> re.search(r'(?:\d+(?:\.\d+)?x)+\d+(?:\.\d+)?', s)
<_sre.SRE_Match object; span=(22, 43), match='18.9x42.2x49x10x8x100'>
>>> _.group(0)
'18.9x42.2x49x10x8x100'

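The regular expression uses \d+(?:\.\d+)? twice. This “number part” matches one or more digits, optionally followed by a dot and more digits. Because the dot and the digits after it form a single optional group, the match cannot end in a trailing dot, such as the period after 100 in the example. The expression matches this number part followed by an x one or more times, and then requires a final number part, so a lone number such as 208 does not match.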
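As a rough sketch of how the same pattern could be used outside the interactive session, the example below builds it from a single “number part” and splits the result into individual dimensions. The names NUMBER, LOT_SIZE, extract_lot_size and split_dimensions are made up for illustration and are not part of the original answer.

```python
import re

# One "number part": digits, optionally followed by a dot and more digits.
NUMBER = r'\d+(?:\.\d+)?'

# One or more "number part + x" groups, then a final number part.
# (LOT_SIZE is a hypothetical name chosen for this sketch.)
LOT_SIZE = re.compile(r'(?:{0}x)+{0}'.format(NUMBER))


def extract_lot_size(line):
    """Return the first lot-size string found in line, or None."""
    match = LOT_SIZE.search(line)
    return match.group(0) if match else None


def split_dimensions(lot_size):
    """Split a lot-size string such as '18.9x42.2' into a list of floats."""
    return [float(part) for part in lot_size.split('x')]


line = ("Spring st. , No. 208, 18.9x42.2x49x10x8x100. 'John S. Giles, "
        "exr. to Herman Goodstein, W. D. . 11,100")
lot = extract_lot_size(line)
print(lot)                    # 18.9x42.2x49x10x8x100
print(split_dimensions(lot))  # [18.9, 42.2, 49.0, 10.0, 8.0, 100.0]
```

Compiling the pattern once is convenient when looping over many lines of a large file, and returning None for lines without a lot size lets the caller decide how to handle them.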

edited May 21, 2015 at 19:34

answered May 21, 2015 at 19:30

poke


1 Comment

poke Over a year ago

@stribizhev Thanks! Forgot to escape the dot—fixed!
