This regular expression should match phone numbers with or without separators:

phonePattern = re.compile(r'^(\d{3})\D*(\d{3})\D+(\d{4})\D*(\d*)$')

It matches a phone number like 800-555-1212-1234, but it still doesn't match 80055512121234, even though I'm using * to indicate zero or more non-digit characters.

  • I'll give you a hint: ? means optional. Commented Apr 10, 2016 at 23:47
  • Unless the exact delimiter positions actually matter, re.fullmatch(r'\d{5,20}', n.replace('-', '')) will do the trick. Just modify the min/max length. Or, more simply, n.replace('-', '').isdigit(). Commented Apr 11, 2016 at 0:19
  • @JoshSmeaton Always liked that approach, afterwards can just reformat to however you want it and show that. Commented Apr 11, 2016 at 0:36
  • @nsalem Did you solve your problem? If yes and any of the answers here helped you, consider accepting it as an answer, please. It'll help others to find the right answer more easily. Commented Jun 14, 2016 at 18:04

2 Answers

Your regexp has \D+ (one or more non-digits) between the second and third digit groups, so at least one separator is required there. Also, you probably don't want zero or more delimiters. You want at most one delimiter at each position, so:

^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$

That said, I would use a literal - instead of the non-digit class (\D) if you don't want to match something like 123a456b7890c:

^(\d{3})-?(\d{3})-?(\d{4})-?(\d*)$

The regular expression in words:

  • ^: beginning of the string
  • (\d{3}): a group of 3 digits
  • -?: an optional single dash
  • (\d{4}): a group of 4 digits
  • (\d*): a group of zero or more digits
  • $: end of the string
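
To see the difference between the two patterns above in practice, here is a minimal sketch. The names PHONE_DASHES, PHONE_NONDIGIT and parse_phone are made up for illustration and are not part of the original question.

```python
import re

# Dash-only pattern: at most one '-' between the digit groups.
PHONE_DASHES = re.compile(r'^(\d{3})-?(\d{3})-?(\d{4})-?(\d*)$')
# \D? variant for comparison: any single non-digit is accepted as a separator.
PHONE_NONDIGIT = re.compile(r'^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$')


def parse_phone(text, pattern=PHONE_DASHES):
    """Return the four captured groups as a tuple, or None if text does not match."""
    match = pattern.match(text)
    return match.groups() if match else None


print(parse_phone('800-555-1212-1234'))  # ('800', '555', '1212', '1234')
print(parse_phone('80055512121234'))     # ('800', '555', '1212', '1234')
print(parse_phone('800-555-1212'))       # ('800', '555', '1212', '')
print(parse_phone('123a456b7890c'))      # None
print(parse_phone('123a456b7890c', PHONE_NONDIGIT))  # ('123', '456', '7890', '')
```

The last two calls show why the dash version is stricter: the \D? variant happily treats letters as separators.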

Also, I can recommend the Case study: Parsing Phone Numbers chapter from the Dive Into Python book for some further reading.

Update: it's a good point made by Josh Smeaton in his comment. Depending on your use case it may be easier to sanitize the string first (i.e. remove the dashes) and then validation is just about checking if all characters in the string are digits and if the length is right. If you're storing those phone numbers somewhere it's better to have them in a common format, not once with and once without dashes.
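
A rough sketch of that sanitize-first idea might look like the following. The function name normalize_phone and the min_len/max_len defaults are assumptions chosen for illustration, not a real numbering rule, and str.isascii() needs Python 3.7 or later.

```python
def normalize_phone(raw, min_len=10, max_len=14):
    """Strip dashes, then return the digits if they look valid, else None.

    min_len and max_len are illustrative bounds; adjust them to your data.
    """
    digits = raw.replace('-', '')
    # str.isdigit() alone also accepts non-ASCII digits, so check isascii() too.
    if digits.isascii() and digits.isdigit() and min_len <= len(digits) <= max_len:
        return digits
    return None


print(normalize_phone('800-555-1212-1234'))  # 80055512121234
print(normalize_phone('80055512121234'))     # 80055512121234
print(normalize_phone('800-555-12a2'))       # None
```

Both accepted inputs come back in the same dash-free form, which is the format you would then store.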

1 Comment

I would also recommend - as opposed to \D. Overall quality answer. +1

Your second \D is followed by +, which requires one or more non-digits at that position. Replacing it with * lets the pattern match your second string, so your regexp would look like:

'^(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$'

However, as erip and Dawid Ferenczy suggested, it's probably a good idea to use '?' instead, which matches at most one non-digit character:

'^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$'
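
To compare the original pattern with the two fixes side by side, something like this small sketch can help. The variable names, the which_match helper and the third sample string are invented for the comparison.

```python
import re

# The original pattern (with \D+ in the middle) and the two suggested fixes.
original = re.compile(r'^(\d{3})\D*(\d{3})\D+(\d{4})\D*(\d*)$')
star = re.compile(r'^(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
optional = re.compile(r'^(\d{3})\D?(\d{3})\D?(\d{4})\D?(\d*)$')


def which_match(sample):
    """Return [original, star, optional] match results as booleans."""
    return [bool(p.match(sample)) for p in (original, star, optional)]


for sample in ['800-555-1212-1234', '80055512121234', '800--555--1212']:
    print(sample, which_match(sample))

# 800-555-1212-1234 [True, True, True]
# 80055512121234 [False, True, True]
# 800--555--1212 [True, True, False]
```

The last sample shows the practical difference between * and ?: only the ? version rejects doubled separators.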
