I'm attempting to parse phone numbers that can come through in different ways. For example:

(321) 123-4567
(321) 1234567
321-123-4567
321123-4567

I then want to grab each of the three parts separately. My thought is to use named groups combined with an alternation, like so:

(^\s*(?P<area>[0-9]{3})\-?(?P<fst>[0-9]{3})\-(?P<lst>[0-9]{4}))|(^\s*\(\area\)\s*(\fst)\-?(\lst))

The problem, I believe, is that I am not referencing the named groups properly. I'm trying to use https://regex101.com/ to help but am still stuck. Because the parentheses around the area code should either both be present or both be absent, I don't want to make each one independently optional with the "?" character, like:

\(?(?P<area>[0-9]{3})\)?

Can anyone help me with this? Thank you so much.

I'm using Python 3.6 and the re module.

  • Regular expressions have certain limitations in exactly the situation you described, i.e. matching a balanced pair of brackets. In your case you could use an alternation: (?:\(...\)|...). Commented Feb 10, 2018 at 18:00
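
The alternation suggested in this comment could be sketched roughly as follows. Python's re module does not allow the same group name to be defined twice in one pattern, so this sketch assumes two hypothetical names, area_paren and area_plain, and merges them afterwards. The helper name parse_phone_alt is also just an illustration, not something from the question.

```python
import re

# One branch requires BOTH parentheses, the other allows NEITHER.
# area_paren / area_plain are hypothetical names chosen for this sketch.
ALT_RE = re.compile(
    r'^\s*'
    r'(?:\((?P<area_paren>[0-9]{3})\)\s*'   # "(321) " or "(321)"
    r'|(?P<area_plain>[0-9]{3})-?)'         # "321-" or "321"
    r'(?P<fst>[0-9]{3})-?(?P<lst>[0-9]{4})$'
)


def parse_phone_alt(text):
    """Return (area, fst, lst) as strings, or None if text does not match."""
    m = ALT_RE.match(text)
    if m is None:
        return None
    # Exactly one of the two area groups participates in a successful match.
    area = m.group('area_paren') or m.group('area_plain')
    return area, m.group('fst'), m.group('lst')


for number in ['(321) 123-4567', '(321) 1234567', '321-123-4567', '321123-4567']:
    print(number, '->', parse_phone_alt(number))
```

With this pattern, an input with only one of the two parentheses, such as '(321 123-4567', should not match at all.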

2 Answers

There were a few issues with your regex: it doesn't make the brackets optional, and it doesn't allow optional spaces between the area code and the first part. Without seeing your Python code it's hard to know how you were applying it, so I compiled the regex and then ran it against the list of numbers.

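from __future__ import print_function  # only needed on Python 2; harmless on Python 3
import re

phone_numbers = [
    '(321) 123-4567',
    '(321) 1234567',
    '321-123-4567',
    '321123-4567',
]

# Optional "(", the area code, any run of ")", space or "-", then the 3- and 4-digit parts.
regex = re.compile(r'^\s*\(?(?P<area>[0-9]{3})[) -]*(?P<fst>[0-9]{3})-?(?P<sec>[0-9]{4})')

for p in phone_numbers:
    # Rewrite each matched number into the form "(area) fst-sec".
    print(regex.sub(r'(\g<area>) \g<fst>-\g<sec>', p))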

This isn't perfect, as it will also accept input that isn't valid syntax according to your list, but that shouldn't be a problem in practice. For example, '(321))- - )) 123-4567' would still be matched and split into the same three parts.
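
Since the question asks for the three parts separately rather than a reformatted string, here is a minimal sketch that reads the named groups from the match object instead of calling sub. It reuses the same pattern as above; the helper name split_phone is an assumption made for illustration.

```python
import re

# Same permissive pattern as in this answer.
PHONE_RE = re.compile(
    r'^\s*\(?(?P<area>[0-9]{3})[) -]*(?P<fst>[0-9]{3})-?(?P<sec>[0-9]{4})'
)


def split_phone(text):
    """Return (area, fst, sec) as strings, or None if text does not match."""
    match = PHONE_RE.match(text)
    if match is None:
        return None
    return match.group('area'), match.group('fst'), match.group('sec')


for number in ['(321) 123-4567', '(321) 1234567', '321-123-4567', '321123-4567']:
    print(number, '->', split_phone(number))
```

match.groupdict() would give the same values as a dictionary keyed by group name, if that shape is more convenient.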

I'd use a conditional that tests whether a group matched: ^(\()?(?P<area>\d{3})(?(1)\))[ -]?(?P<fst>\d{3})-?(?P<lst>\d{4})$.

In there:

  • (\()? captures an opening parenthesis in group 1 if one is present.
  • (?(1)\)) tests whether group 1 captured anything and, if so, requires a closing parenthesis.

The rest is pretty straightforward.
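
As a quick check, the conditional pattern can be tried with Python's re module, which supports the (?(1)...) syntax. This is only a sketch; the helper name parse_phone_cond is hypothetical. Note that, unlike the pattern in the question, this one does not skip leading whitespace.

```python
import re

# Group 1 optionally captures "("; (?(1)\)) then demands ")" only if it did.
COND_RE = re.compile(
    r'^(\()?(?P<area>\d{3})(?(1)\))[ -]?(?P<fst>\d{3})-?(?P<lst>\d{4})$'
)


def parse_phone_cond(text):
    """Return (area, fst, lst) as strings, or None if text does not match."""
    m = COND_RE.match(text)
    if m is None:
        return None
    # Passing several names to group() returns a tuple of their values.
    return m.group('area', 'fst', 'lst')


for number in ['(321) 123-4567', '(321) 1234567', '321-123-4567',
               '321123-4567', '(321 123-4567', '321) 123-4567']:
    print(number, '->', parse_phone_cond(number))
```

The last two inputs have unbalanced parentheses and should come back as None.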
