I'm trying to extract a specific div from a text file that contains several divs. I'm using StreamReader to read the file, but I don't know how to get the complete div. Once I have the div, I want to turn each line into a string and add it to a list. The text file looks like this:

    <div id="#SMINLANGUAGE1 ">
    English
    Hello.
    This is a Test
    Test 23
    </div>
    <div id="#SMINLANGUAGE2 ">
    Dutch
    Hallo.
    Dit is een Test
    Test 29
    </div>
    <div id="#SMINLANGUAGE3 ">
    Spanish
    Hola.
    Esto es una Prueba.
    Prueba 86
    </div>

The list for English would be:

    Index 0: English
    Index 1: Hello.
    Index 2: This is a Test
    Index 3: Test 23

1 Answer

First, install the HtmlAgilityPack NuGet package to parse the HTML:

    Install-Package HtmlAgilityPack

Then select the //div XPath to extract every div from the HTML content. Each div's text is split into lines, so the result is one array of strings per div:

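    using System;
    using System.Collections.Generic;
    using System.Text;
    using HtmlAgilityPack;

    // htmlContent is assumed to hold the text of the file, e.g. from File.ReadAllText.
    var doc = new HtmlDocument
    {
        OptionOutputAsXml = true,
        OptionCheckSyntax = true,
        OptionFixNestedTags = true,
        OptionAutoCloseOnEnd = true,
        OptionDefaultStreamEncoding = Encoding.UTF8
    };
    doc.LoadHtml(htmlContent);

    // One string[] per div; each array holds that div's non-empty lines.
    var results = new List<string[]>();

    // SelectNodes returns null (not an empty collection) when nothing matches.
    var divs = doc.DocumentNode.SelectNodes("//div");
    if (divs != null)
    {
        foreach (var node in divs)
        {
            var divContent = node.InnerText;
            if (string.IsNullOrWhiteSpace(divContent))
                continue;

            // Split on both '\r' and '\n' so Windows line endings leave no stray '\r'.
            var lines = divContent.Trim().Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            results.Add(lines);
        }
    }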
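
Since the question mentions StreamReader and asks for one specific div, here is a minimal sketch of a dependency-free alternative. It assumes every opening and closing div tag sits on its own line, as in the sample file, so it is not a general HTML parser. The names DivReader, ReadDivLines and IsOpeningTagFor are made up for illustration and are not part of HtmlAgilityPack; trimming the id before comparing is also an assumption, made because the sample ids end with a space.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DivReader
{
    // Hypothetical helper: returns the non-empty lines inside the div whose
    // id (after trimming) equals divId. Assumes each <div ...> and </div>
    // tag is on its own line, as in the sample file.
    public static List<string> ReadDivLines(TextReader reader, string divId)
    {
        var lines = new List<string>();
        var inside = false;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            var trimmed = line.Trim();

            if (!inside)
            {
                // Keep scanning until the opening tag of the wanted div.
                inside = IsOpeningTagFor(trimmed, divId);
                continue;
            }

            // The closing tag ends the wanted div, so stop reading.
            if (trimmed.Equals("</div>", StringComparison.OrdinalIgnoreCase))
                break;

            if (trimmed.Length > 0)
                lines.Add(trimmed);
        }

        return lines;
    }

    // Pulls the value out of id="..." in an opening div tag and compares it,
    // ignoring surrounding spaces such as the trailing one in "#SMINLANGUAGE1 ".
    private static bool IsOpeningTagFor(string tag, string divId)
    {
        if (!tag.StartsWith("<div", StringComparison.OrdinalIgnoreCase))
            return false;

        const string marker = "id=\"";
        var start = tag.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
            return false;

        start += marker.Length;
        var end = tag.IndexOf('"', start);
        if (end < 0)
            return false;

        return tag.Substring(start, end - start).Trim() == divId;
    }
}
```

With a StreamReader opened on your file, calling DivReader.ReadDivLines(reader, "#SMINLANGUAGE1") on the sample should give the four English lines as a List<string>. If the id is not found, the list comes back empty. For files where tags and text share a line, or where divs are nested, the HtmlAgilityPack approach above is the safer choice.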