I have the following string:
Bacon ipsum dolor amet **kevin kielbasa** pork chop picanha chuck,
t-bone **brisket corned beef fatback hamburger cow** sirloin shank prosciutto
shankle. T-bone pancetta ribeye **tongue** fatback drumstick frankfurter short
ribs burgdoggen. **Tail cupim.**
I want to obtain:
new List<string> {
"Bacon ipsum dolor amet ",
"**kevin kielbasa**",
" pork chop picanha chuck, t-bone ",
"**brisket corned beef fatback hamburger cow**",
" sirloin shank prosciutto shankle. T-bone pancetta ribeye ",
"**tongue**",
" fatback drumstick frankfurter short ribs burgdoggen. ",
"**Tail cupim.**"
}
Approaches:
- Entirely in Regex:
First Pass
Regex.Split(str, @"\*\*.*?\*\*");
"Bacon ipsum dolor amet ",
" pork chop picanha chuck, t-bone ",
" sirloin shank prosciutto shankle. T-bone pancetta ribeye ",
" fatback drumstick frankfurter short ribs burgdoggen. "
Split removes all of the matching items. It treats each one as a delimiter between the items it thinks we want. D'oh!
Second Pass
Regex.Matches(str, @"\*\*.*?\*\*").Cast<Match>().Select(m => m.Value).ToList();
"**kevin kielbasa**",
"**brisket corned beef fatback hamburger cow**",
"**tongue**",
"**Tail cupim.**"
Well, that makes sense. Regex.Matches() returns all of the items that match the regular expression, so we've lost all of the content between.
- With a dash of LINQ:
Okay, let's see if we can get all of our text in a list together:
Regex.Split(str, @"\*\*");
"Bacon ipsum dolor amet ",
"kevin kielbasa",
" pork chop picanha chuck, t-bone ",
"brisket corned beef fatback hamburger cow",
" sirloin shank prosciutto shankle. T-bone pancetta ribeye ",
"tongue",
" fatback drumstick frankfurter short ribs burgdoggen. ",
"Tail cupim."
Oddly, this simple regex gets us the closest, but we no longer know which items in the list were surrounded by **. Because delimited and plain items alternate, all we need to know is whether the first (or second) item in the list was surrounded by **.
bool firstIsMatch = str.StartsWith("**", StringComparison.Ordinal);
And then we can use that bool to determine if we're adding "**" to the beginning and end of every even or odd item in the list.
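The re-wrapping step described above might look roughly like the sketch below. One detail worth noting: Regex.Split emits an empty string when the input begins or ends with a delimiter, so if the ** markers are balanced, the delimited pieces always land at odd indices and the firstIsMatch check may not be needed. The class and method names (MarkerSplitter, SplitAndRewrap) are hypothetical, and the code assumes balanced markers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MarkerSplitter
{
    // Hypothetical helper: split on the bare "**" marker, then put the
    // markers back around every odd-indexed piece.
    // Assumption: every opening "**" has a matching closing "**".
    public static List<string> SplitAndRewrap(string str)
    {
        return Regex.Split(str, @"\*\*")
            // Odd indices are the pieces that sat between a pair of markers.
            .Select((piece, index) => index % 2 == 1 ? "**" + piece + "**" : piece)
            // Drop the empty pieces Split produces when the string
            // starts or ends with a marker.
            .Where(piece => piece.Length > 0)
            .ToList();
    }
}
```

Calling MarkerSplitter.SplitAndRewrap(str) on the sample string should give the alternating list from the top of the question, with the markers restored.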
Questions:
- Is there a way to do this entirely with a regex? If so, how?
- Despite being "more code", is the second option preferred for performance and/or readability?
Wrap the pattern in a capturing group, (), for the matched text to be included in the results:
Regex.Split(str, @"(\*\*.*?\*\*)");
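As a minimal sketch of that capturing-group split in context: when the pattern contains a capturing group, Regex.Split includes the captured text in the returned array, so the delimited items come back with their markers intact. The class and method names (CaptureSplitDemo, SplitKeepingMarkers) are hypothetical. The sketch also assumes no delimited span crosses a line break, since . does not match a newline unless RegexOptions.Singleline is passed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureSplitDemo
{
    // Hypothetical helper: split on "**...**" but keep each match,
    // because the whole pattern sits inside a capturing group.
    public static List<string> SplitKeepingMarkers(string str)
    {
        return Regex.Split(str, @"(\*\*.*?\*\*)")
            // Split yields an empty string when a match sits at the very
            // start or end of the input; filter those out.
            .Where(piece => piece.Length > 0)
            .ToList();
    }
}
```

The Where filter matters for the sample string, which ends with a delimited item and would otherwise produce a trailing empty entry.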