1

I have the following HTML sample document:

.....
<div class="TableElement">
    <table>
    <tr>
        <th class="boxToolTip" title="La quotazione di A2A è in rialzo o in ribasso?">&nbsp;</th>
        ..
        <th class="boxToolTip" class="ColumnLast" title="Trades più recenti su A2A">Ora <img title='' alt='' class='quotePageRTupgradeLink' href='#quotePageRTupgradeContainer' id='cautionImageEnt' src='/common/images/icons/caution_sign.gif'/></th>
    </tr>
    <tr class="odd">
        ..
        <td align="center"><span id="quoteElementPiece6" class="PriceTextUp">1,619</span></td>
        <td align="center"><span id="quoteElementPiece7" class="">1,6235</span></td>
        <td align="center"><span id="quoteElementPiece8" class="">1,591</span></td>
        <td align="center"><span id="quoteElementPiece9" class="">1,5995</span></td>
        ..
    </tr>
    </table>
</div>
......

I need to get the values of the elements with ids quoteElementPiece6, 7, 8, 9 and 17 (the last one currently appears further down in the document).

I am simply searching one by one in the code at the moment:

int index6 = doc.IndexOf("quoteElementPiece6");
..
int index17 = doc.IndexOf("quoteElementPiece17");

I want to improve this by scanning in one go and having all the indexes for the substrings I need. Example:

var searchstrings = new string[]
{
    "quoteElementPiece6",
    "quoteElementPiece7",
    "quoteElementPiece8",
    "quoteElementPiece9",
    "quoteElementPiece17"
};

int[] indexes = getIndexes(document, searchstrings); // indexes should be in the same order as searchstrings

Is there anything native in .NET that does this (LINQ, for instance)?

I know there are HTML parser libraries, but I would prefer to avoid them; I would like to learn how to do this for any kind of document.

  • Please show at least what you have googled. A simple search gives this: "C# Is there a LINQ to HTML, or some other good .Net HTML manipulation API?" Commented Dec 16, 2018 at 19:12
  • I want to avoid using third-party libraries or parsing the whole HTML; the document being HTML is just an example. Commented Dec 16, 2018 at 19:14
  • You can use LINQ to XML, but that requires well-formed HTML, which you can get using SgmlReader. You might also use HtmlAgilityPack. Commented Dec 16, 2018 at 19:15

3 Answers

2
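// Requires: using System; using System.Linq;
var words = new[]
{
    "quoteElementPiece6",
    "quoteElementPiece7"
};
// I take for granted your `document` is a string and not an `HtmlDocument` or whatnot.
// Ordinal comparison avoids culture-sensitive matching of the plain ASCII ids.
var result = words.Select(word => document.IndexOf(word, StringComparison.Ordinal));
Console.WriteLine(string.Join(",", result));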

2 Comments

This works and it's more elegant. But it doesn't perform the scan in one go.
Then I believe you should look into regex, but then you have two problems.
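
For what it's worth, the regex route can be sketched roughly like this. It is only a sketch under assumptions: `GetIndexes` is a hypothetical helper shaped like the `getIndexes` call in the question, the document is a plain string, and only the first occurrence of each search string is wanted. It builds one alternation from the escaped search strings and walks the matches once.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class MultiIndex
{
    // Hypothetical helper: returns the first index of each search string,
    // in the same order as searchStrings, or -1 when a string is not found.
    public static int[] GetIndexes(string document, string[] searchStrings)
    {
        // Put longer strings first so that, e.g., "quoteElementPiece17"
        // is tried before any shorter string that is a prefix of it.
        var pattern = string.Join("|", searchStrings
            .OrderByDescending(s => s.Length)
            .Select(s => Regex.Escape(s)));

        var indexes = Enumerable.Repeat(-1, searchStrings.Length).ToArray();

        // Regex.Matches scans the document left to right in a single pass.
        foreach (Match m in Regex.Matches(document, pattern))
        {
            int slot = Array.IndexOf(searchStrings, m.Value);
            if (slot >= 0 && indexes[slot] < 0)
                indexes[slot] = m.Index; // keep the first occurrence only
        }
        return indexes;
    }
}
```

One caveat: matches do not overlap, so if one search string is a prefix of another, the shorter one is not reported at positions where the longer one matched. That differs from calling `IndexOf` once per string.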
0

You can do this with LINQ. Check my solution:

var doc = "this is my document";
List<string> searchstrings = new List<string>
{
    "quoteElementPiece6",
    "quoteElementPiece7",
    "quoteElementPiece8",
    "quoteElementPiece9",
    "quoteElementPiece17"
};
var lastIndexOfList = new List<int>(searchstrings.Count);

searchstrings.ForEach(x => lastIndexOfList.Add(doc.LastIndexOf(x)));


0
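// Requires: using System.Linq; using System.Text.RegularExpressions; using System.Xml.Linq;
// Grab the first <tr class="odd"> row; (?s) lets '.' match newlines and '.+?' is lazy.
var pattern = @"(?s)<tr class=""odd"">.+?</tr>";
// &nbsp; is not a predefined XML entity, so strip it before parsing.
var tr = Regex.Match(html, pattern).Value.Replace("&nbsp;", "");
var xml = XElement.Parse(tr);
// Collect the numeric suffix of every id that starts with "quoteElementPiece".
var nums = xml
    .Descendants()
    .Where(n => (string)n.Attribute("id") != null)
    .Where(n => n.Attribute("id").Value.StartsWith("quoteElementPiece"))
    .Select(n => Regex.Match(n.Attribute("id").Value, "[0-9]+").Value);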
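
The snippet above yields the numeric suffixes of the ids. Since the question asks for the values, here is a hedged variation that maps each id to the text of its element. `QuoteValues.Extract` is a hypothetical name, and the sketch assumes the matched row is well-formed XML once `&nbsp;` is removed. It only covers ids inside the first `odd` row, so an element further down the document (such as quoteElementPiece17) would need its own match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class QuoteValues
{
    // Hypothetical helper: maps each quoteElementPiece id in the first
    // <tr class="odd"> row to the text content of its element.
    public static Dictionary<string, string> Extract(string html)
    {
        var match = Regex.Match(html, @"(?s)<tr class=""odd"">.+?</tr>");
        if (!match.Success)
            return new Dictionary<string, string>();

        // &nbsp; is not a predefined XML entity, so strip it before parsing.
        var row = XElement.Parse(match.Value.Replace("&nbsp;", ""));

        return row.Descendants()
            .Where(n => ((string)n.Attribute("id") ?? "")
                .StartsWith("quoteElementPiece", StringComparison.Ordinal))
            .ToDictionary(n => (string)n.Attribute("id"), n => n.Value);
    }
}
```

Usage would look like `var values = QuoteValues.Extract(html); var last = values["quoteElementPiece6"];`, with the key lookup guarded by `TryGetValue` if the id might be absent.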

1 Comment

It seems like you are looking for matching tags, but because the + is greedy, it does not have to be the case.
