Find indexes in String using multiple search items and one single iteration

Question

I have the following HTML sample document:

.....
<div class="TableElement">
    <table>
    <tr>
        <th class="boxToolTip" title="La quotazione di A2A è in rialzo o in ribasso?">&nbsp;</th>
        ..
        <th class="boxToolTip" class="ColumnLast" title="Trades più recenti su A2A">Ora <img title='' alt='' class='quotePageRTupgradeLink' href='#quotePageRTupgradeContainer' id='cautionImageEnt' src='/common/images/icons/caution_sign.gif'/></th>
    </tr>
    <tr class="odd">
        ..
        <td align="center"><span id="quoteElementPiece6" class="PriceTextUp">1,619</span></td>
        <td align="center"><span id="quoteElementPiece7" class="">1,6235</span></td>
        <td align="center"><span id="quoteElementPiece8" class="">1,591</span></td>
        <td align="center"><span id="quoteElementPiece9" class="">1,5995</span></td>
        ..
    </tr>
    </table>
</div>
......

I need to get the values corresponding to quoteElementPiece 6, 7, 8, 9 and 17 (the last one is further down in the document).

At the moment I simply search for each one in turn:

int index6 = doc.IndexOf("quoteElementPiece6");
..
int index17 = doc.IndexOf("quoteElementPiece17");

I want to improve this by scanning the document once and getting the indexes of all the substrings I need. For example:

var searchstrings = new string[]
{
    "quoteElementPiece6",
    "quoteElementPiece7",
    "quoteElementPiece8",
    "quoteElementPiece9",
    "quoteElementPiece17"
};

int[] indexes = getIndexes(document, searchstrings); // indexes should be in the same order as searchstrings
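A minimal sketch of what the hypothetical getIndexes could look like if it really scans the document only once (named GetIndexes here; it assumes C# 9 or later for top-level statements, and the sample string is made up). At each position it compares every search string that has not been found yet, using string.CompareOrdinal, and returns the first index of each one in the order of searchstrings, or -1 when a string never occurs.

```csharp
using System;

// Made-up sample document; in the question this is the downloaded HTML.
string document =
    "<span id=\"quoteElementPiece6\" class=\"PriceTextUp\">1,619</span>" +
    "<span id=\"quoteElementPiece7\" class=\"\">1,6235</span>" +
    "<span id=\"quoteElementPiece17\" class=\"\">42</span>";

var searchstrings = new[]
{
    "quoteElementPiece6",
    "quoteElementPiece7",
    "quoteElementPiece8",
    "quoteElementPiece17"
};

int[] indexes = GetIndexes(document, searchstrings);
Console.WriteLine(string.Join(",", indexes));

// Hypothetical helper: walks `text` once, left to right. At each position it
// compares every still-missing search string against the text starting there
// (ordinal, culture-insensitive), so result[j] is the first index of
// searches[j], or -1 if it never occurs.
static int[] GetIndexes(string text, string[] searches)
{
    var result = new int[searches.Length];
    Array.Fill(result, -1);
    int remaining = searches.Length;

    for (int i = 0; i < text.Length && remaining > 0; i++)
    {
        for (int j = 0; j < searches.Length; j++)
        {
            string s = searches[j];
            if (result[j] != -1 || i + s.Length > text.Length)
                continue; // already found, or not enough text left to match here

            if (string.CompareOrdinal(text, i, s, 0, s.Length) == 0)
            {
                result[j] = i;
                remaining--;
            }
        }
    }
    return result;
}
```

Note that one pass does not mean less work: this performs roughly the same character comparisons as calling IndexOf once per string, so it is not automatically faster. The textbook single-pass algorithm for many patterns is Aho-Corasick, which is not shown here.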

Is there anything built into .NET that does this (LINQ, for instance)?

I know there are HTML parser libraries, but I prefer to avoid them: I would like to learn how to do this for any kind of document.

Please show at least what you have googled. A simple search gives this: C# Is there a LINQ to HTML, or some other good .Net HTML manipulation API?
– Gilad Green, Commented Dec 16, 2018 at 19:12
I want to avoid using third-party libraries or parsing the whole HTML; the document being HTML is just an example.
– farbiondriven, Commented Dec 16, 2018 at 19:14
You can use LINQ to XML, but that requires well-formed HTML, which you can get using SgmlReader. You might also use HtmlAgilityPack.
– Cetin Basoz, Commented Dec 16, 2018 at 19:15

LosManos · Accepted Answer · 2018-12-16 19:28:39Z

2

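using System;
using System.Linq;

var words = new []{
    "quoteElementPiece6",
    "quoteElementPiece7"};
// I take for granted your `document` is a string and not an `HtmlDocument` or whatnot.
// StringComparison.Ordinal: plain character matching, no culture rules needed for ids.
var result = words.Select(word => document.IndexOf(word, StringComparison.Ordinal)).ToArray();
Console.WriteLine(string.Join(",", result));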

answered Dec 16, 2018 at 19:28

LosManos


2 Comments

farbiondriven Over a year ago

This works and it's more elegant. But it doesn't perform the scan in one go.

LosManos Over a year ago

Then I believe you should look into RegEx but then you have two problems.
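Following the regex suggestion, a minimal sketch (C# 9+ top-level statements, made-up sample string): a single alternation built from all the escaped search strings, so Regex.Matches makes one left-to-right pass over the document and reports each match together with its Index. The (?!\d) lookahead is an addition of this sketch, not part of the thread, so that a shorter id such as quoteElementPiece1 cannot match the start of quoteElementPiece17.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up sample; note quoteElementPiece17 appears before quoteElementPiece7.
string document =
    "<span id=\"quoteElementPiece6\">1,619</span>" +
    "<span id=\"quoteElementPiece17\">42</span>" +
    "<span id=\"quoteElementPiece7\">1,6235</span>";

var searchstrings = new[]
{
    "quoteElementPiece6",
    "quoteElementPiece7",
    "quoteElementPiece8",
    "quoteElementPiece17"
};

// One alternation of all escaped search strings. The (?!\d) lookahead rejects
// a match that is immediately followed by another digit, so a shorter id
// cannot match the start of a longer one (e.g. ...Piece1 inside ...Piece17).
var pattern = "(?:" + string.Join("|", searchstrings.Select(s => Regex.Escape(s))) + @")(?!\d)";

// Regex.Matches scans the document left to right once; keep the first index per string.
var firstIndex = searchstrings.ToDictionary(s => s, _ => -1);
foreach (Match m in Regex.Matches(document, pattern))
{
    if (firstIndex[m.Value] == -1)
        firstIndex[m.Value] = m.Index;
}

// Back to an array ordered like searchstrings, as the question asks (-1 = not found).
int[] indexes = searchstrings.Select(s => firstIndex[s]).ToArray();
Console.WriteLine(string.Join(",", indexes));
```

The dictionary is only there to map each match back to its position in searchstrings; it assumes the search strings are distinct, since ToDictionary throws on duplicate keys.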

LosManos · Accepted Answer · 2018-12-16 19:29:35Z

0

You can do this in one line with List<T>.ForEach (strictly speaking a List<T> method rather than LINQ). Check my solution:

using System;
using System.Collections.Generic;

var doc = "this is my document";
List<string> searchstrings = new List<string>
{
    "quoteElementPiece6",
    "quoteElementPiece7",
    "quoteElementPiece8",
    "quoteElementPiece9",
    "quoteElementPiece17"
};
var indexList = new List<int>(searchstrings.Count);

// IndexOf returns the first occurrence, as in the question; LastIndexOf would return the last one.
searchstrings.ForEach(x => indexList.Add(doc.IndexOf(x, StringComparison.Ordinal)));

edited Dec 16, 2018 at 19:29

LosManos

answered Dec 16, 2018 at 19:28

Derviş Kayımbaşıoğlu

JohnyL · Accepted Answer · 2018-12-17 05:18:11Z

0

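using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// html holds the page source. Cut out the <tr class="odd"> row and drop &nbsp;,
// which is not a predefined XML entity and would make XElement.Parse fail.
var pattern = @"(?s)<tr class=""odd"">.+?</tr>";
var tr = Regex.Match(html, pattern).Value.Replace("&nbsp;", "");
var xml = XElement.Parse(tr);
// Keep elements whose id starts with quoteElementPiece and return the number in that id.
var nums = xml
            .Descendants()
            .Where(n => (string)n.Attribute("id") != null)
            .Where(n => n.Attribute("id").Value.StartsWith("quoteElementPiece"))
            .Select(n => Regex.Match(n.Attribute("id").Value, "[0-9]+").Value);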
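A sketch of the same approach adapted to return the values (such as 1,619) rather than the piece numbers, assuming C# 9+ top-level statements and a made-up sample row: it maps each quoteElementPiece number to the text inside its span. Like the answer, it only works if the extracted row is well-formed XML.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Made-up sample shaped like the question's table.
var html = @"<div class=""TableElement""><table><tr><th>&nbsp;</th></tr>
<tr class=""odd"">
  <td align=""center""><span id=""quoteElementPiece6"" class=""PriceTextUp"">1,619</span></td>
  <td align=""center""><span id=""quoteElementPiece7"" class="""">1,6235</span></td>
</tr></table></div>";

// Same first steps as the answer: cut out the row and drop &nbsp;,
// which is not a predefined XML entity and would make XElement.Parse fail.
var tr = Regex.Match(html, @"(?s)<tr class=""odd"">.+?</tr>").Value.Replace("&nbsp;", "");
var row = XElement.Parse(tr);

// Map each piece number (taken from the id) to the text inside its <span>.
var values = row
    .Descendants("span")
    .Select(n => new { Id = (string)n.Attribute("id"), Text = n.Value })
    .Where(x => x.Id != null && x.Id.StartsWith("quoteElementPiece", StringComparison.Ordinal))
    .ToDictionary(x => int.Parse(x.Id.Substring("quoteElementPiece".Length)), x => x.Text);

foreach (var kv in values)
    Console.WriteLine($"{kv.Key} = {kv.Value}");
```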

edited Dec 17, 2018 at 5:18

answered Dec 16, 2018 at 19:43

JohnyL

1 Comment

Antonín Lejsek Over a year ago

It seems like you are looking for matching tags, but because the + is greedy, that does not have to be the case.
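To illustrate the comment, a small sketch (C# 9+ top-level statements) on a made-up input with two rows, comparing a greedy .+ with the lazy .+? used in the pattern as shown: the greedy version runs to the last </tr> and swallows both rows, which is no longer a single XML element and would make XElement.Parse throw.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up input with two consecutive "odd" rows.
var html = "<tr class=\"odd\"><td>1</td></tr><tr class=\"odd\"><td>2</td></tr>";

// Greedy .+ grabs as much as possible, then backtracks to the LAST </tr>.
var greedy = Regex.Match(html, "(?s)<tr class=\"odd\">.+</tr>").Value;
// Lazy .+? grabs as little as possible, so it stops at the FIRST </tr>.
var lazy = Regex.Match(html, "(?s)<tr class=\"odd\">.+?</tr>").Value;

Console.WriteLine(greedy);
Console.WriteLine(lazy);
```

The lazy version is still not guaranteed to stop at the matching tag, for example when a row contains a nested table.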
