6

I have some very simple text containing HTML (only the <b> tag), e.g.:

Lorem Ipsum is <b>simply dummy</b> text of the printing and <b>typesetting industry</b>

I would like to split the text into an array like this:

[0] - Lorem Ipsum is 
[1] - <b>simply dummy</b>
[2] - text of the printing and
[3] - <b>typesetting industry</b>

The text inside each HTML tag must be separated from the rest of the text. Is there a simple solution for this?

Thank you

5
  • Have you tried using the Split() function or regular expressions? Commented May 26, 2015 at 10:08
  • ^ For your information: HTML cannot be parsed correctly with regex. stackoverflow.com/questions/1732348/… Commented May 26, 2015 at 10:15
  • But in this case you can instantiate an array and add [0] - Lorem Ipsum is until you find <b>. When you find <b>, you search for the next </b> and place the text in between in the array as [1] - <b>simply dummy</b>, and so on, like a minimal parsing algorithm. This will work as long as you don't have nested <b> tags. Commented May 26, 2015 at 10:18
  • There's a library called Html Agility Pack that does this for you, if you're allowed to use third-party libraries. Commented May 26, 2015 at 10:24
  • @kubakista It will: stackoverflow.com/questions/29139320/… Commented May 26, 2015 at 11:45
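
The minimal parsing approach described in the comments could look roughly like the sketch below. The class and method names (`BoldSplitter`, `SplitOnBold`) are hypothetical and not from the thread. It assumes that <b> tags are never nested, and it treats an unclosed <b> as plain text, which is a choice made here rather than something the comment specifies.

```csharp
using System;
using System.Collections.Generic;

public static class BoldSplitter
{
    // Hypothetical helper: splits text into plain runs and <b>...</b> runs
    // by scanning for the tags. Assumes <b> tags are never nested.
    public static string[] SplitOnBold(string text)
    {
        const string open = "<b>";
        const string close = "</b>";
        var parts = new List<string>();
        int pos = 0;

        while (pos < text.Length)
        {
            int start = text.IndexOf(open, pos, StringComparison.Ordinal);
            // No further opening tag: the rest is plain text.
            if (start < 0)
            {
                parts.Add(text.Substring(pos));
                break;
            }

            int end = text.IndexOf(close, start + open.Length, StringComparison.Ordinal);
            // Unclosed tag: keep the remainder as plain text (an assumption).
            if (end < 0)
            {
                parts.Add(text.Substring(pos));
                break;
            }

            // Plain text before the tag, if there is any.
            if (start > pos)
            {
                parts.Add(text.Substring(pos, start - pos));
            }

            // The tagged segment, including both tags.
            int segmentEnd = end + close.Length;
            parts.Add(text.Substring(start, segmentEnd - start));
            pos = segmentEnd;
        }

        return parts.ToArray();
    }
}
```

Called on the sample sentence from the question, this should yield four elements, with the spaces around the plain-text runs preserved.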

2 Answers

5

You can achieve this with the following code:

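    // Requires: using System.Linq; using System.Text.RegularExpressions;
    string value = @"Lorem Ipsum is <b>simply dummy</b> text of the printing and <b>typesetting industry</b>";
    // The capturing group makes Regex.Split keep the matched <b>...</b> segments in the result.
    var parts = Regex.Split(value, @"(<b>[\s\S]+?</b>)").Where(l => l != string.Empty).ToArray();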
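
To check the result against the indexed layout shown in the question, the split can be wrapped in a small helper, as in the sketch below. The names `RegexBoldSplitter`, `Split`, and `PrintParts` are hypothetical and only used for illustration. Note that the plain-text parts keep their surrounding spaces; call `Trim()` on them if that is not wanted.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexBoldSplitter
{
    // Hypothetical wrapper around the Regex.Split call from the answer.
    public static string[] Split(string value)
    {
        return Regex.Split(value, @"(<b>[\s\S]+?</b>)")
                    .Where(part => part.Length > 0) // drop empty leading/trailing entries
                    .ToArray();
    }

    // Prints each part in the "[index] - text" layout used in the question.
    public static void PrintParts(string value)
    {
        var parts = Split(value);
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine("[{0}] - {1}", i, parts[i]);
        }
    }
}
```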

1

I just wrote and tested this, and it works. It's a bit ugly, but it does the job.

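    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    public string[] getHtmlSplitted(String text)
    {
        var list = new List<string>();
        // The capturing group keeps the <b> and </b> delimiters in the split result.
        var pattern = "(<b>|</b>)";
        var isInTag = false;
        var inTagValue = String.Empty;

        foreach (var subStr in Regex.Split(text, pattern))
        {
            if (subStr.Equals("<b>"))
            {
                isInTag = true;
                inTagValue = String.Empty;
                continue;
            }
            else if (subStr.Equals("</b>"))
            {
                // Closing tag: emit the buffered content wrapped in its tags again.
                isInTag = false;
                list.Add(String.Format("<b>{0}</b>", inTagValue));
                continue;
            }

            if (isInTag)
            {
                // Text between <b> and </b>; held until the closing tag is seen.
                inTagValue = subStr;
                continue;
            }

            // Regex.Split yields empty strings when the text starts or ends with a tag; skip them.
            if (subStr.Length > 0)
            {
                list.Add(subStr);
            }
        }
        return list.ToArray();
    }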
