0

I want to download the HTML source of a page, search it for the username and some other information, and then display that in my program. I'm fairly new to programming and a complete beginner when it comes to regular expressions (Regex), so I hope you can explain it to me.

I have used Regex before to extract a K/D ratio from an HTML source. For that I used this code:

string pattern = @"<span class=""kdratio"">\d+\.\d+";

But I have no idea how to start on this one...

This is the line of the source that contains the information:

<section class="profile-header" profile="true" motto="user's motto" user="User" figure="hr-3322-45.hd-190-1.ch-3342-64-66.lg-285-64.sh-3068-82-66.ea-1404-64">

I only need the values of user="User" and figure="x". I haven't tried anything yet because I really don't know where to start: this HTML line looks very different from what I have experience with.

1
  • The regex user="([^"]*?)" figure="([^"]*?)" would work (i.sstatic.net/i2Nkt.png). But it would be better to use an HTML parser to extract the values of the user and figure attributes of this section element; class="profile-header" looks like a good unique identifier for it. Take a look at stackoverflow.com/questions/846994/how-to-use-html-agility-pack to learn how to use Html Agility Pack to parse the HTML, find the <section> node, and read its attributes. Commented Jan 24, 2016 at 1:04
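For reference, the regex from the comment above could be used from C# roughly like this. This is only a sketch: the class name ProfileRegex and the method TryExtract are made up for illustration, and the pattern assumes that user comes directly before figure inside the profile-header tag, as in the line from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ProfileRegex
{
    // Group 1 captures the value of user="...", group 2 the value of figure="...".
    // Assumes both attributes sit inside <section class="profile-header" ...>
    // and that figure follows user, separated only by whitespace.
    private static readonly Regex Pattern = new Regex(
        @"<section\s+class=""profile-header""[^>]*?\buser=""([^""]*)""\s+figure=""([^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns true and fills user/figure when the pattern matches;
    // otherwise returns false and sets both to empty strings.
    public static bool TryExtract(string html, out string user, out string figure)
    {
        Match m = Pattern.Match(html);
        user = m.Success ? m.Groups[1].Value : string.Empty;
        figure = m.Success ? m.Groups[2].Value : string.Empty;
        return m.Success;
    }
}
```

Calling ProfileRegex.TryExtract on the section line from the question should give user = "User" and figure = "hr-3322-45.hd-190-1.ch-3342-64-66.lg-285-64.sh-3068-82-66.ea-1404-64". If the site ever reorders the attributes, this pattern stops matching, which is one reason a parser is the safer choice.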

2 Answers

3

Regular expressions are not a good idea for matching HTML unless it's very simple, single-tag matching. See here: RegEx match open tags except XHTML self-contained tags

I recommend using an HTML DOM-parsing library together with XPath or CSS selectors to get the information you want. For .NET, HtmlAgilityPack is recommended. For CSS selectors you'll want Fizzler (an add-on for HtmlAgilityPack).

In JavaScript (easily rewritten to C# and HtmlAgilityPack) it would be this:

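// Select the profile header, then read the two attributes from it.
var section = document.querySelector("section.profile-header[profile=true]");
var user = section.getAttribute("user");
var figure = section.getAttribute("figure");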
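
A rough C# equivalent using HtmlAgilityPack might look like the sketch below. It uses XPath rather than CSS selectors, so Fizzler is not needed. The class name ProfileParser and the method Extract are hypothetical names chosen for this example, and the code assumes the HtmlAgilityPack NuGet package is installed and that the HTML has already been downloaded into a string.

```csharp
using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class ProfileParser
{
    // Returns the user and figure attribute values of the first
    // <section class="profile-header"> element, or null if none is found.
    public static (string User, string Figure)? Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: any <section> whose class attribute is exactly "profile-header".
        HtmlNode section = doc.DocumentNode.SelectSingleNode("//section[@class='profile-header']");
        if (section == null)
            return null;

        // The second argument is the default returned when the attribute is missing.
        return (section.GetAttributeValue("user", ""),
                section.GetAttributeValue("figure", ""));
    }
}
```

With the section line from the question as input, Extract should return User = "User" together with the full figure string. Unlike a regex, this does not depend on the order of the attributes within the tag.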

1 Comment

Yes, that's what I was afraid of... Many people suggest HtmlAgilityPack, but it has always been a mystery to me what it is and how to use it. Time to find out, I guess.
0

Generally, Regex is not a good choice for parsing HTML. HTML can get complicated (nested tags, attributes in any order, different quoting styles), so it is very hard to write a single Regex that matches every case. Use a parser like Html Agility Pack instead.
