The page has a lot of content on it, including 50 blocks like the one posted below.

HTML Block

<li>
    <dl>
        <dd>

        <a href="/wow/en/item/113987" class="color-q4" data-item="pl=100&amp;cc=5&amp;bl=566">




    <span  class="icon-frame frame-18 " style='background-image: url("http://media.blizzard.com/wow/icons/18/inv_misc_trinket6oih_lanternb1.jpg");'>
    </span>
</a>Obtained <a href="/wow/en/item/113987" class="color-q4" data-item="pl=100&amp;cc=5&amp;bl=566">Battering Talisman</a>.


</dd>
        <dt>22 hours ago</dt>
    </dl>
    </li>

The code I'm using now only searches for this line:

Obtained <a href="/wow/en/item/113987" class="color-q4" data-item="pl=100&amp;cc=5&amp;bl=566">Battering Talisman</a>.

How can I get my MatchCollection to return the full HTML block as 1 match?

Dim explorer As New WowExplorer(WowDotNetAPI.Region.EU, Locale.en_GB, "apikey")

' Download the character's activity feed page as one string.
Dim Request As System.Net.HttpWebRequest = System.Net.HttpWebRequest.Create("http://eu.battle.net/wow/en/character/" & Me.Realm & "/" & Me.Name & "/feed")
Dim Response As System.Net.HttpWebResponse = Request.GetResponse
Dim sr As System.IO.StreamReader = New System.IO.StreamReader(Response.GetResponseStream())
Dim SourceCode As String = sr.ReadToEnd

' Matches only the "Obtained <a ...>" line: "." does not match a line feed by default,
' so each match stops at the end of that line and never reaches the <dt> line.
Dim Item_ As New System.Text.RegularExpressions.Regex( _
    "Obtained <a href=""/wow/en/item/.*"" class=""color-q4"".*")

Dim matche_name As MatchCollection = Item_.Matches(SourceCode)
For Each Match As Match In matche_name
    ' The item ID is the fifth "/"-separated piece, up to the closing quote.
    Dim ItemID As String
    Dim ID_Match As String = Match.Value.Split("/").GetValue(4)
    ItemID = ID_Match.Split("""").GetValue(0)
    Me.Items.Add(explorer.GetItem(ItemID, ItemSource))
Next
  • No, regex is the wrong tool for parsing HTML. Commented Apr 16, 2015 at 20:07
  • Can I ask how you would do it, then? It doesn't have to be regex; googling just brought me to regex. I am most definitely not going to argue the point, but I was convinced regex would do it :( Commented Apr 16, 2015 at 20:59
  • HtmlAgilityPack or another dedicated HTML parser. Commented Apr 16, 2015 at 21:02
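
For illustration, here is a minimal sketch of the parser-based route suggested in the last comment. It assumes the HtmlAgilityPack NuGet package is referenced in the project. GetObtainedItems is a hypothetical helper name, and the sample markup is a shortened version of the block in the question, so treat this as a starting point rather than a tested solution for the real feed page.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module FeedBlockHap
    ' Hypothetical helper: returns "text (time)" for every feed entry whose
    ' <dd> text starts with "Obtained".
    Function GetObtainedItems(html As String) As List(Of String)
        Dim results As New List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' Select only <li> elements that contain both a <dd> and a <dt>.
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//li[.//dd and .//dt]")
        If nodes Is Nothing Then Return results

        For Each li As HtmlNode In nodes
            ' li.OuterHtml would give the whole block as a single string if needed.
            Dim ddText As String = HtmlEntity.DeEntitize(li.SelectSingleNode(".//dd").InnerText).Trim()
            Dim dtText As String = HtmlEntity.DeEntitize(li.SelectSingleNode(".//dt").InnerText).Trim()
            If ddText.StartsWith("Obtained") Then
                results.Add(ddText & " (" & dtText & ")")
            End If
        Next
        Return results
    End Function

    Sub Main()
        ' Shortened sample markup, an assumption based on the block in the question.
        Dim html As String = "<ul><li><dl><dd>Obtained <a href=""/wow/en/item/113987"">Battering Talisman</a>.</dd><dt>22 hours ago</dt></dl></li></ul>"
        For Each entry As String In GetObtainedItems(html)
            Console.WriteLine(entry)
        Next
    End Sub
End Module
```

Unlike XDocument, a dedicated HTML parser does not require the page to be well-formed XML, which is usually the deciding factor for scraped pages.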

1 Answer


Here is sample code showing how to get those strings, first using XDocument with XPath and then using regex (I added a second <li> to emulate the HTML you might have):

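' Requires: Imports System.Collections.Generic, System.Linq,
'           System.Text.RegularExpressions, System.Xml.Linq, System.Xml.XPath
Dim dds As List(Of String), dts As List(Of String)
dds = New List(Of String)
dts = New List(Of String)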
Dim str As String = "<li> <dl>         <dd>            <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">                <span class=""icon-frame frame-18 "" style='background-image: url(""http://media.blizzard.com/wow/icons/18/inv_misc_trinket6oih_lanternb1.jpg"");'>                </span>            </a>Obtained <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">Battering Talisman</a>.</dd>       <dt>22 hours ago</dt>    </dl>    </li>"
str += "<li> <dl>         <dd>            <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">                <span class=""icon-frame frame-18 "" style='background-image: url(""http://media.blizzard.com/wow/icons/18/inv_misc_trinket6oih_lanternb1.jpg"");'>                </span>            </a>Obtained <a href=""/wow/en/item/113987"" class=""color-q4"" data-item=""pl=100&amp;cc=5&amp;bl=566"">New Talisman</a>.</dd>       <dt>10 hours ago</dt>    </dl>    </li>"
' XPATH WAY
Dim xDoc As XDocument = XDocument.Parse("<?xml version= '1.0'?><root>" + str + "</root>")
dds = xDoc.XPathSelectElements("//dd").Select(Function(m) m.Value).ToList()
dts = xDoc.XPathSelectElements("//dt").Select(Function(m) m.Value).ToList()

' REGEX WAY
dds = New List(Of String)
dts = New List(Of String)
Dim rx As Regex = New Regex("(?s)</a>([^<]*?)<a\s[^>]*?>([^<]*?)</a>([^<\r\n]*)")
Dim matches As IEnumerable(Of Match) = rx.Matches(str).Cast(Of Match)()
dds = (From match In matches
       Select match.Groups(1).Value + match.Groups(2).Value + match.Groups(3).Value).ToList()
Dim rxDt As Regex = New Regex("(?s)<dt>\s*([^<]*?)\s*</dt>")
Dim matches_dts As IEnumerable(Of Match) = rxDt.Matches(str).Cast(Of Match)()
dts = (From match In matches_dts
       Select match.Groups(1).Value).ToList()

Results: [screenshot]
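
If the goal is still to have each whole <li> block come back as one match, as the question asks, a lazy pattern with the Singleline option can be sketched as follows. GetLiBlocks and GetTimestamp are hypothetical helper names, and the sketch assumes the <li> elements are not nested; with nested lists the lazy match would stop at the first inner </li>.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module FeedBlockRegex
    ' Hypothetical helper: returns every <li>...</li> block as one string.
    Function GetLiBlocks(html As String) As List(Of String)
        ' Singleline lets "." match line breaks; ".*?" is lazy, so each match
        ' ends at the first closing </li> rather than the last one on the page.
        Dim rx As New Regex("<li\b[^>]*>.*?</li>", RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        Return rx.Matches(html).Cast(Of Match)().Select(Function(m) m.Value).ToList()
    End Function

    ' Hypothetical helper: pulls the <dt> text out of one block
    ' (same pattern as the answer's rxDt), or "" when there is none.
    Function GetTimestamp(block As String) As String
        Dim m As Match = Regex.Match(block, "<dt>\s*([^<]*?)\s*</dt>", RegexOptions.Singleline)
        Return If(m.Success, m.Groups(1).Value, "")
    End Function

    Sub Main()
        ' Shortened sample markup, an assumption based on the block in the question.
        Dim html As String = "<li>" & Environment.NewLine &
            "<dl><dd>Obtained <a href=""/wow/en/item/113987"">Battering Talisman</a>.</dd>" & Environment.NewLine &
            "<dt>22 hours ago</dt></dl>" & Environment.NewLine &
            "</li>"
        For Each block As String In GetLiBlocks(html)
            Console.WriteLine(GetTimestamp(block))
            Console.WriteLine(block)
        Next
    End Sub
End Module
```

Each returned block can then be handled with the existing Split and Contains logic from the question.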


6 Comments

Hi, thanks for commenting. How did you manage to get that all on one line and have it accepted? I thought that for regex it had to be in the exact format. The VB code I have already gets all the info, apart from the <dt>1 day ago</dt>, which I need so I can get a date for when the item was received. I don't mind using the longer way with splits to get the info I need, and at least I understand what I'm doing. Is there a way to get that whole block as one match, so I can then use Split, Contains, etc. to get the info I need?
The regex to get the <dt> value is (?s)<dt>\s*([^<]*?)\s*</dt>.
Well, I have kind of just understood your XDocument and XPath code, and I think I can edit it to do exactly what I need :D I need to go to sleep now, but I will let you know if I can get it to work. I already tried to upvote your answer, but I need more rep :(
Ahh wait, no, sorry, I don't know if I can do what I thought. Again, there are 50 blocks of this code and not all are needed, but I'll still have a go.
Hi mate, when using XPath, how can I get it to capture each block? I'm assuming Dim q = xdoc.XPathSelectElements("//li"), but how can I loop through them? Is there a way to get the HTML I posted above as a string value?
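
As a rough sketch of the loop asked about in the last comment: XPathSelectElements returns a sequence of XElement, so a plain For Each works, and ToString on each element gives that block back as markup. GetObtainedBlocks is a hypothetical helper name, and the "Obtained" filter is an assumption about which of the 50 blocks are wanted. As in the answer, this only works if the fragment is well-formed XML.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml.Linq
Imports System.Xml.XPath

Module FeedBlockXPath
    ' Hypothetical helper: returns the <li> elements whose <dd> text
    ' starts with "Obtained", skipping every other block.
    Function GetObtainedBlocks(fragment As String) As List(Of XElement)
        ' Wrap the fragment in a single root element so it parses as XML.
        Dim xDoc As XDocument = XDocument.Parse("<root>" & fragment & "</root>")
        Dim result As New List(Of XElement)
        For Each li As XElement In xDoc.XPathSelectElements("//li")
            Dim dd As XElement = li.XPathSelectElement(".//dd")
            If dd IsNot Nothing AndAlso dd.Value.Trim().StartsWith("Obtained") Then
                result.Add(li)
            End If
        Next
        Return result
    End Function

    Sub Main()
        ' Shortened sample markup, an assumption based on the block in the question.
        Dim str As String =
            "<li><dl><dd>Obtained <a href=""/wow/en/item/113987"">Battering Talisman</a>.</dd><dt>22 hours ago</dt></dl></li>" &
            "<li><dl><dd>Some other entry</dd><dt>10 hours ago</dt></dl></li>"

        For Each li As XElement In GetObtainedBlocks(str)
            ' The whole block as one string, without re-indenting it.
            Dim blockHtml As String = li.ToString(SaveOptions.DisableFormatting)
            Dim dt As XElement = li.XPathSelectElement(".//dt")
            Dim whenText As String = If(dt Is Nothing, "", dt.Value.Trim())
            Console.WriteLine(whenText & ": " & blockHtml)
        Next
    End Sub
End Module
```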