I was wondering if anyone could show me how to extract 'http://www.nbc.com/xyz' and 'I love this show' from the following string in Excel VBA.

Thanks

<a href="http://www.nbc.com/xyz" >I love this show</a><IMG border=0 width=1 height=1 src="http://ad.linksynergy.com/fs-bin/show?id=Loe5O5QVFig&bids=261463.100016851&type=3&subid=0" >

2 Answers

Sub Tester()
    '### add a reference to "Microsoft HTML Object Library" ###
    Dim odoc As New MSHTML.HTMLDocument
    Dim el As Object
    Dim txt As String

    txt = "<a href=""http://www.nbc.com/xyz"" >I love this show</a>" & _
         "<IMG border=0 width=1 height=1 " & _
         "src=""http://ad.linksynergy.com/fs-bin/show?" & _
         "id=Loe5O5QVFig&bids=261463.100016851&type=3&subid=0"" >"

    odoc.body.innerHTML = txt

    Set el = odoc.getElementsByTagName("a")(0)
    Debug.Print el.innerText
    Debug.Print el.href

End Sub
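
If adding the library reference is not an option, a late-bound variant may work as well. This is only a sketch: it assumes the "htmlfile" ProgID is available on the machine, and the name TesterLateBound and the single-quoted sample string are made up for illustration.

```vba
Sub TesterLateBound()
    ' Late-bound sketch: no reference to "Microsoft HTML Object Library" is set.
    ' Assumes CreateObject("htmlfile") is available on this machine.
    Dim odoc As Object
    Dim el As Object
    Dim txt As String

    ' Same link as in the question, but with a single-quoted href attribute
    txt = "<a href='http://www.nbc.com/xyz' >I love this show</a>"

    Set odoc = CreateObject("htmlfile")
    odoc.body.innerHTML = txt

    ' Take the first anchor element and print its text and its target
    Set el = odoc.getElementsByTagName("a").Item(0)
    Debug.Print el.innerText
    Debug.Print el.href
End Sub
```

The trade-off is the usual one for late binding: no IntelliSense and no compile-time checking, in exchange for not depending on a project reference.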

5 Comments

Not bad, but I could process at least 1000 of these in the time it takes to load a single innerHTML :)
@ooo - well, I didn't make any claims about performance! I would tend to prefer this method though, since it's a bit more forgiving than the regexp approach. For example, it will handle attributes regardless of whether they're single- or double-quoted.
+1 clever method. @ooo While I'm a regexp fan I am wary of doing so with html.
@TimWilliams - I've just re-read my earlier comment and it comes across as really snooty. It wasn't intended to be!
@ooo - no problem. No offense taken!

One way is to use regular expressions. Another is to use Split to break the string on various delimiters, e.g.

Option Explicit

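Sub splitMethod()
    Dim Str As String

    'Assumes Sheet1.Range("A1").Value holds example string
    Str = Sheet1.Range("A1").Value
    'Text between the first pair of double quotes is the href value
    Debug.Print Split(Str, """")(1)
    'Text between the first ">" and the following "</a" is the link text
    Debug.Print Split(Split(Str, ">")(1), "</a")(0)

End Sub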

Sub RegexMethod()
Dim Str As String
Dim oRegex As Object
Dim regexArr As Object
Dim rItem As Object

    'Assumes Sheet1.Range("A1").Value holds example string
    Str = Sheet1.Range("A1").Value

    Set oRegex = CreateObject("vbscript.regexp")
    With oRegex
        .Global = True
        .Pattern = "(href=""|>)(.+?)(""|</a>)"
        Set regexArr = .Execute(Str)

        'No lookbehind so replace unwanted chars
        .Pattern = "(href=""|>|""|</a>)"
        For Each rItem In regexArr
            Debug.Print .Replace(rItem, vbNullString)
        Next rItem
    End With
End Sub

'Output:
'http://www.nbc.com/xyz
'I love this show

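The first pattern matches href=" or > as the opening delimiter and " or </a> as the closing delimiter, with one or more characters (anything except a \n newline) matched lazily in between by (.+?). The delimiters mark the start and end of each match, not of the whole string.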
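
Since the wanted text is already captured by the middle group (.+?), the second Replace pass can probably be skipped by reading that group from each match's SubMatches collection. A minimal sketch, assuming the same pattern and the same cell as above (the names RegexSubMatchMethod, htmlText, matches and m are made up for illustration):

```vba
Sub RegexSubMatchMethod()
    Dim htmlText As String
    Dim oRegex As Object
    Dim matches As Object
    Dim m As Object

    'Assumes Sheet1.Range("A1").Value holds example string
    htmlText = Sheet1.Range("A1").Value

    Set oRegex = CreateObject("vbscript.regexp")
    With oRegex
        .Global = True
        'Same pattern as RegexMethod: group 1 = opening delimiter,
        'group 2 = wanted text, group 3 = closing delimiter
        .Pattern = "(href=""|>)(.+?)(""|</a>)"
        Set matches = .Execute(htmlText)
    End With

    'SubMatches is zero-based, so index 1 is the second group
    For Each m In matches
        Debug.Print m.SubMatches(1)
    Next m
End Sub
```

For the example string this should print the same two lines as RegexMethod. Like the original, it relies on the href being double-quoted and on the link being the first tag in the string.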
