I'm trying to parse a Markdown-style list into HTML. I am using several regular expressions for this, all in the JavaScript flavor. I know there are tools that already do this, but I thought it would be a good way to practice my regexes. However, I ran into an issue.
After retrieving a list "block" containing both ordered and unordered lists, I need to split the block into separate list items. Items can contain indented sub-lists and are therefore spread across multiple lines, like so:
1. text
2. text
1. text
2. text
* text
* text
- text
+ text
1. text
* text
1. text
* text
1. text
* text
I have created this regex to separate out the first-level list elements, with each match including that element's sub-list Markdown:
/^(?:\d.|[*+-]) [^]*?(?=^(?:\d.|[*+-]))/gm
It should produce these matches:
What I am trying to achieve
1. text
2. text
1. text
2. text
* text
* text
- text
+ text
1. text
* text
1. text
* text
1. text
* text
However, this matches every list element except the last one, because the positive look-ahead only lets a match end where another list element begins. This is the result:
What actually happens when using this RegEx
1. text
2. text
1. text
2. text
* text
* text
- text
+ text
1. text
* text
1. text
As you can see, the last list element is missing.
My thought was to match only list elements that are followed by another list element OR match list elements that are followed by an end of string, like this.
/^(?:\d.|[*+-]) [^]*?(?=^(?:\d.|[*+-])|$)/gm
This doesn't work because I am using the multiline flag, which makes $ match at the end of every line rather than only at the end of the string. I can't use \Z either, since I'm working in JavaScript.
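To make the failure concrete, here is a minimal sketch of what the second pattern does. The sample string, its indentation, and the variable names are assumptions for illustration only, not the exact block from above.

```javascript
// Hypothetical sample: two top-level items, the second with an indented sub-list.
const sample = ["1. text", "2. text", "   1. text", "   2. text", "* text"].join("\n");

// The pattern from the question, unchanged. With the m flag, $ matches
// before every newline, so the lazy [^]*? stops at the end of the first line.
const naiveRe = /^(?:\d.|[*+-]) [^]*?(?=^(?:\d.|[*+-])|$)/gm;

const naiveItems = sample.match(naiveRe);
console.log(naiveItems);
// [ '1. text', '2. text', '* text' ]  (the indented sub-list lines are dropped)
```

Every match now ends at its own line break, so the last element is found but the sub-lists are lost.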
Does anyone know of another way to tackle this problem? The example is also available on Regex101.
Use $(?![^]) instead of $: it only matches at the very end of the string, even with the multiline flag. Also escape the dot, since you want to match a literal dot after a digit.
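A minimal sketch of that suggestion applied to the pattern from the question. The sample block, its indentation, and the variable names are hypothetical, so treat this as an illustration rather than a tested parser.

```javascript
// Hypothetical sample block; nested items are assumed to be indented.
const block = [
  "1. text",
  "2. text",
  "   1. text",
  "   2. text",
  "* text",
  "  - text",
  "+ text",
].join("\n");

// \d\. matches a digit followed by a literal dot.
// $(?![^]) is an end-of-line position with no character after it,
// i.e. the true end of the input, even under the m flag.
const itemRe = /^(?:\d\.|[*+-]) [^]*?(?=^(?:\d\.|[*+-])|$(?![^]))/gm;

const items = block.match(itemRe);
console.log(items.length); // 4
console.log(items[1]);     // "2. text" plus its two indented sub-items
console.log(items[3]);     // "+ text" (the last element is now matched)
```

Each match except the last keeps its trailing newline, so a call such as trimEnd() on each item may be useful before further processing.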