2

Trying to extract valid phone numbers from a string with no visible delimiters between different data types. In effect, the data around the potential phone number is random and irrelevant.


What should be matched. I am trying to match either one of the following:

  • [random garbage][optional '1'][optional '(']###[optional')'[[random or no white space]###-####[random garbage] or
  • [random garbage]###[optional '-']###,[optional '-']####[random garbage]

Only the 1st phone number is to be used, so I used Global = False in the code below. I could make it even more robust, but I've examined the data and this should be enough.


Working pattern.Here's a code snippet from a function (it returns the matching phone number) that contains the pattern that worked.

With regex
    .Global = False
    .Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}"
'this works but does detect an extension as explained above
End With


What should not be matched. I realized that I also need to search for an extension next to the phone number (i.e. [phone number][white space]x#) and if that exists, to treat the phone number as invalid (.test should evaluate to false).


Failed attemps. They all failed (even valid phone numbers had .test evaluate to false):

.Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}^(\s?x\d)"
'detect not([optional single white space]x#), added "^(\s?x\d)"
'or
.Pattern = "1?(\(?\d{3}\)?\(s+)?|\d{3}-?)\d{3}-?\d{4}^((\s+?)[x]\d)"
'detect not([optional multiple white space]x#), added "^((\s+?)[x]\d)"

Not sure how to combine positive match tests and negative (not) match tests in the same pattern.


Work-arounds I've tried. When I couldn't get it to work, I tried the following Like patterns (using VBA 'Like', prior to calling the function that utilized Regexp) and that also failed (all evaluated to false even with test strings that contained examples such as "...1x2" or "5 x33" or "...7 x444"; with patterns like "*#x#*", "*#{ x}#*", ""*#{ x}#*".

Here is the code snippet to test the Like function:

If Not (OrigNum Like "*#x#" Or OrigNum Like "*#[ x}#" Or OrigNum Like "*#[  x]#*") Then
    Debug.Print "No #x# in string"
    OrigNum = ExtractPhoneNumber(OrigNum)
Else
    Debug.Print "#x# in string"
End If

Every string evaluated ended up causing "No x# in string" to be displayed (evaluated to false), even when the string contained the above examples, which should have evaluated to true and "#x# in string" being displayed.

Dazed and confused for so long it's...OK, enough of the Led Zepp reference :-)

1 Answer 1

1

Phone number:

  • [optional '1'][optional '(']###[optional')'[[random or no white space]###-####
  • ###[optional '-']###[optional '-']####

*I removed a comma I assumed as a typo, and also assuming the leading 1 is optional for both cases from what I read from your patterns.

Don't match:

  • [phone number][white space]x#


What you're looking for is negative lookaheads.

(?! subexpression ) asserts for that subexpression from the current position and, if the subexpression matches, the match attempt fails (i.e. not followed by).
E.g. (?!\s*x\d) fails when the current position is followed by optional whitespace, an "x" and a digit.


Regex:

1?(?:\(\d{3}\)|\d{3}-?)\s*\d{3}-?\d{4}(?!\s*x\d)

Code:

Public Function getPhoneNumber(strInput As String) As Variant
    Dim regex As New RegExp
    Dim matches As Object

    regex.Pattern = "1?(?:\(\d{3}\)\s*|\d{3}-?)\d{3}-?\d{4}(?!\s*x\d)"
    regex.Global = False

    Set matches = regex.Execute(strInput)
    If matches.Count = 0 Then
        getPhoneNumber = CVErr(xlErrNA)
    Else
        getPhoneNumber = matches(0).Value
    End If
End Function

Results (🎵As it was, then again it will be; though the course may change sometimes🎵):

Screenshot from Excel showing the input un column A and the result of the function in column B

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks. That worked and I did have that typo in the description. One thing I still don't understand is the ":" in the match pattern. It seems that it would act as a literal match character ":". I'm unable to locate info that describes what special function the : has in a regular expression match pattern.
It's a non capturing group. (?:a|b) is the same as (a|b) except it doesn't create a backreference.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.