I've written a VBA script that uses regular expressions to parse the company name, phone and fax from a webpage. When I run it, I get that information flawlessly. However, I've used three different expressions, and to make them work I created three different regex objects: rxp, rxp1 and rxp2.

My question: how can I create a single regex object that handles all three patterns, instead of the three objects I've used below?

This is the script (working one):

Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim rxp As New RegExp, rxp1 As New RegExp, rxp2 As New RegExp

    With New XMLHTTP60
        .Open "GET", Url, False
        .send

        rxp.Pattern = "Company Name:(\s[\w\s]+)"
        rxp1.Pattern = "Phone:(\s\+[\d\s]+)"
        rxp2.Pattern = "Fax:(\s\+[\d\s]+)"

        If rxp.Execute(.responseText).Count > 0 Then
            [A1] = rxp.Execute(.responseText).Item(0).SubMatches(0)
        End If

        If rxp1.Execute(.responseText).Count > 0 Then
            [B1] = rxp1.Execute(.responseText).Item(0).SubMatches(0)
        End If

        If rxp2.Execute(.responseText).Count > 0 Then
            [C1] = rxp2.Execute(.responseText).Item(0).SubMatches(0)
        End If
    End With
End Sub

References to add in order to run the script above:

Microsoft XML, v6.0
Microsoft VBScript Regular Expressions 5.5
  • Providing some sample data will get you better answers. Without seeing the layout we're just guessing how to combine the patterns. Commented Jul 16, 2018 at 21:23
  • There is already a link provided within the script @emsimpson92. Commented Jul 16, 2018 at 21:24
  • Have you tried the regex OR syntax to combine them into one pattern string? Company Name:(\s[\w\s]+)|Phone:(\s\+[\d\s]+)|Fax:(\s\+[\d\s]+) as your pattern? Commented Jul 16, 2018 at 21:26
  • Thanks for your comment @QHarr. I know how to combine them in a single pattern. What would the use case be? Once again, the pattern is not the concern here. My question was how I can use them to get three different results within a single regex object. Thanks. Commented Jul 16, 2018 at 21:39
  • They would be in a single regex object. Commented Jul 16, 2018 at 21:46

4 Answers

4

You may build a regex with alternatives, enable global matching with rxp.Global = True, and capture the known label into Group 1 and the unknown value that follows it into Group 2. You can then assign each value to the right variable by checking the value of Group 1:

Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
Dim rxp As New RegExp
Dim ms As MatchCollection
Dim m As Match
Dim cname As String, phone As String, fax As String

With New XMLHTTP60
    .Open "GET", Url, False
    .send

    rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
    rxp.Global = True

    Set ms = rxp.Execute(.responseText)
    For Each m In ms
        If m.SubMatches(0) = "Company Name" Then cname = m.SubMatches(1)
        If m.SubMatches(0) = "Phone" Then phone = m.SubMatches(1)
        If m.SubMatches(0) = "Fax" Then fax = m.SubMatches(1)
    Next

    Debug.Print cname, phone, fax
End With

Output:

Vaucraft Braford Stud       +61 7 4942 4859              +61 7 4942 0618

See the regex demo.

Pattern details:

  • (Phone|Company Name|Fax) - Capturing group 1: any of the three alternatives
  • :\s* - a colon and then 0+ whitespaces
  • (\+?[\w\s]*\w) - Capturing group 2:
    • \+? - an optional +
    • [\w\s]* - 0 or more letters, digits, _ or whitespaces
    • \w - a single letter, digit or _.
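
The group numbering described above can be checked on a literal string. This is only a minimal sketch: DescribeMatches, DemoDescribeMatches and the sample markup are hypothetical and not taken from the real page, and late binding via CreateObject is used so that no library reference is needed. Because [\w\s] also matches line breaks, the sample keeps tag characters between the fields, as HTML source would; on plain text where fields are separated only by whitespace, Group 2 could run on into the next label.

```vba
' Sketch: list each match as "label -> value".
' SubMatches(0) is Group 1 (the label), SubMatches(1) is Group 2 (the value).
Function DescribeMatches(ByVal inputText As String) As String
    Dim rxp As Object, m As Object, result As String

    Set rxp = CreateObject("VBScript.RegExp")   ' late binding
    rxp.Pattern = "(Phone|Company Name|Fax):\s*(\+?[\w\s]*\w)"
    rxp.Global = True                           ' return every match, not just the first

    For Each m In rxp.Execute(inputText)
        result = result & m.SubMatches(0) & " -> " & m.SubMatches(1) & vbLf
    Next

    DescribeMatches = result
End Function

Sub DemoDescribeMatches()
    ' Hypothetical sample markup (assumption), not the real page source.
    Dim sample As String
    sample = "<td>Company Name: Acme Pty</td>" & _
             "<td>Phone: +61 7 0000 0000</td>" & _
             "<td>Fax: +61 7 0000 0001</td>"
    Debug.Print DescribeMatches(sample)
End Sub
```

Running DemoDescribeMatches should print one line per label found, which makes it easy to see why the value is read from SubMatches(1) rather than SubMatches(0).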

2 Comments

When it comes to regex-related issues, you are second to none @Wiktor Stribiżew. Thanks a trillion. One little question: why does the SubMatches index become 1 instead of 0? Forgive my ignorance.
@Topto The first capturing group - .SubMatches(0) - holds the known value by which we identify the type of the string we matched. The value we want to know is in Group 2, .SubMatches(1).
0

Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n? will capture the values into three capture groups in a single match. You can see how it works here.

Group 1 is your company name, group 2 is your phone number, and group 3 is your fax.
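
A rough sketch of how that single pattern might be used from VBA follows. ParseAllAtOnce, DemoParseAllAtOnce and the sample text are hypothetical, and the sample assumes plain text with one field per line separated by line feeds, which may not be how the real page source is laid out.

```vba
' Sketch: one match, three capture groups (company, phone, fax).
' Returns an array of three strings, or Empty when the pattern does not match.
Function ParseAllAtOnce(ByVal inputText As String) As Variant
    Dim rxp As Object, ms As Object

    Set rxp = CreateObject("VBScript.RegExp")   ' late binding
    rxp.Pattern = "Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n?"

    Set ms = rxp.Execute(inputText)
    If ms.Count = 0 Then
        ParseAllAtOnce = Empty
    Else
        With ms.Item(0)
            ParseAllAtOnce = Array(.SubMatches(0), .SubMatches(1), .SubMatches(2))
        End With
    End If
End Function

Sub DemoParseAllAtOnce()
    ' Hypothetical sample text (assumption): one field per line.
    Dim sample As String, fields As Variant
    sample = "Company Name: Acme Pty" & vbLf & _
             "Phone: +61 7 0000 0000" & vbLf & _
             "Fax: +61 7 0000 0001"

    fields = ParseAllAtOnce(sample)
    If Not IsEmpty(fields) Then Debug.Print fields(0), fields(1), fields(2)
End Sub
```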


0

You can do it, but I'm not sure it's a good idea. Merging the regexes makes the result more prone to problems and errors.

If you match all three fields at once, all of them must be present or the regex will fail. Or even worse, it will fetch the wrong data. What happens if the fax is an optional field? See here for examples.

Also, if the page template changes, things will break more easily. Say the template changes and the fax is rendered before the phone: the whole regex will fail, because searching for the three fields at once implies a fixed order.

Unless the fields you are searching for are related or depend on each other, I wouldn't go down that route.
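
The missing-field problem can be illustrated with a small sketch. The function names and the sample text (which deliberately has no fax line) are hypothetical; the strict pattern is the all-in-one expression from the earlier answer, and the label-only alternation is just an assumed stand-in for matching each field independently.

```vba
' Sketch: True only if all three fields appear, in this exact order.
Function MatchesAllThree(ByVal inputText As String) As Boolean
    Dim rxp As Object
    Set rxp = CreateObject("VBScript.RegExp")   ' late binding
    rxp.Pattern = "Company Name:\s*(.*)\n?Phone:\s*(.*)\n?Fax:\s*(.*)\n?"
    MatchesAllThree = rxp.Test(inputText)
End Function

' Sketch: count the labels that are present, each matched independently.
Function CountLabels(ByVal inputText As String) As Long
    Dim rxp As Object
    Set rxp = CreateObject("VBScript.RegExp")
    rxp.Pattern = "(Company Name|Phone|Fax):"
    rxp.Global = True
    CountLabels = rxp.Execute(inputText).Count
End Function

Sub DemoMissingFax()
    ' Hypothetical sample text (assumption) with no fax line.
    Dim sample As String
    sample = "Company Name: Acme Pty" & vbLf & "Phone: +61 7 0000 0000"

    Debug.Print MatchesAllThree(sample)   ' False: the pattern requires "Fax:"
    Debug.Print CountLabels(sample)       ' 2: the two labels that are present
End Sub
```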


0

I think the following does the same while declaring rxp only once:

Sub GetInfo()
    Const Url$ = "https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG0120000508&folderid=1736"
    Dim Http As New XMLHTTP60, rxp As New RegExp

    With Http
        .Open "GET", Url, False
        .send
    End With

    With rxp
        .Pattern = "Company Name:(\s[\w\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [A1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If

        .Pattern = "Phone:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [B1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If

        .Pattern = "Fax:(\s\+[\d\s]+)"
        If .Execute(Http.responseText).Count > 0 Then
            [C1] = .Execute(Http.responseText)(0).SubMatches(0)
        End If
    End With
End Sub
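
The repeated Pattern/Execute steps above could also be folded into a small helper that reuses the same regex object and calls Execute only once per pattern. This is just one possible sketch: FirstSubMatch, DemoFirstSubMatch and the sample markup are hypothetical names and data, not part of the original script.

```vba
' Sketch: apply one pattern with a shared regex object and return the first
' capture group of the first match, or an empty string if nothing matches.
Function FirstSubMatch(ByVal rxp As Object, ByVal inputText As String, _
                       ByVal pat As String) As String
    Dim ms As Object

    rxp.Pattern = pat
    Set ms = rxp.Execute(inputText)             ' run the search once
    If ms.Count > 0 Then FirstSubMatch = ms.Item(0).SubMatches(0)
End Function

Sub DemoFirstSubMatch()
    ' Hypothetical sample markup (assumption), not the real page source.
    Dim rxp As Object, sample As String
    Set rxp = CreateObject("VBScript.RegExp")   ' late binding
    sample = "<td>Company Name: Acme Pty</td><td>Phone: +61 7 0000 0000</td>"

    ' Trim removes the leading space that the original patterns capture.
    Debug.Print Trim$(FirstSubMatch(rxp, sample, "Company Name:(\s[\w\s]+)"))
    Debug.Print Trim$(FirstSubMatch(rxp, sample, "Phone:(\s\+[\d\s]+)"))
    Debug.Print Trim$(FirstSubMatch(rxp, sample, "Fax:(\s\+[\d\s]+)"))
End Sub
```

In the original Sub, the same helper could be called with Http.responseText as the input, assigning each result to [A1], [B1] and [C1].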
