
I have some ASCII documents in the following format:

[section heading]
paragraphs......

[section heading]
paragraphs......
...

Note: the heading text is always enclosed in a specific pattern (e.g. [ ] in the example above).

I want to split the file into separate sections (each with a heading and the content).

What would be the most efficient way to parse the above document?

Using Regex.Match() I can extract the headings, but not the subsequent text content.

Using Regex.Split() I can grab the content, but not the related headings.

Is it possible to combine these two Regex methods to parse the document? Is there a better way to achieve this?
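On combining Split and Match: in .NET, if the Regex.Split pattern contains capturing parentheses, the captured text is included in the returned array. A heading pattern wrapped in a group therefore returns headings and content interleaved. The sketch below assumes that headings always start a line (hence ^ with RegexOptions.Multiline); the sample text and the names doc, parts and sections are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample document in the format shown above.
string doc = "[Intro]\nFirst paragraph.\n\n[Usage]\nSecond paragraph.\n";

// The capturing group makes Regex.Split keep the heading text in the result:
// "", "Intro", "\nFirst paragraph.\n\n", "Usage", "\nSecond paragraph.\n"
// (the leading "" is the empty text before the first heading).
// Assumption: headings start at the beginning of a line.
string[] parts = Regex.Split(doc, @"^\[([^\]]*)\]", RegexOptions.Multiline);

// Pair each heading (odd index) with the content that follows it.
var sections = new List<(string Heading, string Content)>();
for (int i = 1; i + 1 < parts.Length; i += 2)
{
    sections.Add((parts[i], parts[i + 1].Trim()));
}

foreach (var (heading, content) in sections)
    Console.WriteLine($"{heading}: {content}");
```

Anchoring on ^ means a [ that appears mid-line in the paragraph text does not start a new section.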

2 Answers

(\[[^\]]*\])\n([\s\S]*?)(?=\n\[|$)

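You can try this. Group 1 captures the heading (including the brackets), and group 2 captures the text that follows it, up to the next line starting with [ or the end of the input. See the demo: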

https://regex101.com/r/gU4aG0/1
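A minimal C# sketch of using this pattern with Regex.Matches and Match.Groups. It assumes '\n' line endings. Input with "\r\n" would not match as written, because the pattern expects \n directly after the closing bracket, so something like \r?\n would be needed. The sample text and the names doc, headings and bodies are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Pattern from the answer above: group 1 = bracketed heading (brackets included),
// group 2 = body, ending just before a "\n[" or at the end of the input.
const string Pattern = @"(\[[^\]]*\])\n([\s\S]*?)(?=\n\[|$)";

// Hypothetical sample input using '\n' line endings.
string doc = "[Intro]\nFirst paragraph.\n\n[Usage]\nSecond paragraph.\n";

var headings = new List<string>();
var bodies = new List<string>();
foreach (Match m in Regex.Matches(doc, Pattern))
{
    string heading = m.Groups[1].Value;     // e.g. "[Intro]"
    string body = m.Groups[2].Value.Trim(); // strip surrounding newlines
    headings.Add(heading);
    bodies.Add(body);
    Console.WriteLine($"{heading} -> {body}");
}
```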


Try this:

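using System.Text.RegularExpressions;

// Verbatim string (@) so the regex backslashes are not treated as C# escapes
string search = @"\[([\w ]+)\]([^\[]*)";
foreach (Match match in Regex.Matches(yourtext, search))
{
    string heading = match.Groups[1].Value; // text inside the brackets
    string text = match.Groups[2].Value;    // everything up to the next '['
}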

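This regular expression captures both the heading and the paragraph. Thanks to the capturing groups (the parts in parentheses), you can extract both by iterating over the matches. Note that [^\[]* stops at the first '[' character, so this assumes the paragraph text never contains a '['. Also, [\w ]+ only accepts headings made of letters, digits, underscores and spaces.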
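A runnable sketch of the answer's pattern on hypothetical sample input. The helper name ParseSections is illustrative. The second call shows a limitation of this pattern: because [^\[]* stops at any '[', a bracketed token inside the paragraph text is treated as a new heading.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same pattern as the answer above: group 1 = heading text, group 2 = body.
const string Search = @"\[([\w ]+)\]([^\[]*)";

// Hypothetical helper: collects (heading, trimmed body) pairs from a document.
List<(string Heading, string Body)> ParseSections(string input)
{
    var result = new List<(string Heading, string Body)>();
    foreach (Match match in Regex.Matches(input, Search))
    {
        result.Add((match.Groups[1].Value, match.Groups[2].Value.Trim()));
    }
    return result;
}

var ok = ParseSections("[Intro]\nFirst paragraph.\n\n[Usage]\nSecond paragraph.\n");

// Body text containing '[' is cut short: "[1]" is picked up as a new heading.
var broken = ParseSections("[Notes]\nSee item [1] below.\n");

foreach (var (heading, body) in ok)
    Console.WriteLine($"{heading}: {body}");
```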
