I'm currently trying to implement my own take on Discord's flavor of markdown into my web application. The way I've done so is simply by chaining replace methods, each checking and replacing the syntax with proper HTML tags (I do sanitize, don't worry).

description = description.replace(/\`{3}([\S\s]*?)\`{3}/g, '<code>$1</code>')
      .replace(/\`(.*)\`/g, '<code class="inline">$1</code>')
      .replace(/~~([\S\s]*?)~~/g, '<s>$1</s>');

The problem I'm facing is that the regex also matches inside of entire code blocks and also inside inline code. This behavior is not wanted.

**bold and 
*italic and 
__underline and 
~~strikethrough~~__***

`~~Not strikethrough~~`
~~`Strikethrough`~~

Normal text

```
~~Not strikethrough~~
```

~~```
Strikethrough
```~~

**bold and 
*italic and 
__underline and 
~~strikethrough~~__***

`~~Not strikethrough~~`
~~`Strikethrough`~~

Normal text

I've tried something like this: /(?<!`[\S\s])\*([\S\s]*?)\*(?!`)/g but I can't get it to work as expected.

I'm still learning regex and continue to find it hard to wrap my head around, so any and all help is much appreciated.

Edit (Jan. 4, 2021): Sorry I didn't clarify earlier, but the styles should be "nestable", in other words combinable: ***strong and italic*** should render as text that is both strong and italic.

I've updated the input text (see above) to better encapsulate all probable use cases.

  • I know this is not what you want to hear but doing this only with regex is not really feasible. You could look into a combinator parser or other kinds of parser, like a state machine with lexer. Potentially you want to use an abstract syntax tree. This is not a trivial task. Commented Jan 3, 2021 at 20:48
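To make the lexer idea from this comment a bit more concrete, here is a minimal sketch, not a full parser. It assumes only two kinds of code spans (triple and single backticks) and handles only strikethrough as a style. The names `tokenize` and `renderTokens` are hypothetical and do not come from the question or from any library.

```javascript
// Hypothetical sketch: split the input into code tokens and text tokens,
// then apply styling to the text tokens only.
function tokenize(input) {
  const tokens = [];
  // Triple backticks are tried before single backticks at each position.
  const codeRe = /`{3}([\s\S]*?)`{3}|`([^`]*)`/g;
  let last = 0;
  for (const m of input.matchAll(codeRe)) {
    // Anything between the previous code span and this one is plain text.
    if (m.index > last) {
      tokens.push({ type: "text", value: input.slice(last, m.index) });
    }
    if (m[1] !== undefined) {
      tokens.push({ type: "block", value: m[1] });
    } else {
      tokens.push({ type: "inline", value: m[2] });
    }
    last = m.index + m[0].length;
  }
  if (last < input.length) {
    tokens.push({ type: "text", value: input.slice(last) });
  }
  return tokens;
}

function renderTokens(input) {
  return tokenize(input)
    .map((t) => {
      if (t.type === "block") return `<code>${t.value}</code>`;
      if (t.type === "inline") return `<code class="inline">${t.value}</code>`;
      // Only text tokens are styled, so ~~ inside code is left alone.
      return t.value.replace(/~~([\s\S]*?)~~/g, "<s>$1</s>");
    })
    .join("");
}
```

Because code spans become their own tokens, `~~` inside them is never seen by the strikethrough regex. The sketch has a known gap: a style that wraps a code span, such as ~~`Strikethrough`~~, is split across two text tokens and is not converted. Handling that case needs the nesting-aware parser the comment describes.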

1 Answer

You can use

let text = "**bold and \n*italic and \n__underline and \n~~strikethrough~~__***\n\n`~~Not strikethrough~~`\n~~`Strikethrough`~~\n\nNormal text\n\n```\n~~Not strikethrough~~\n```\n\n~~```\nStrikethrough\n```~~\n\n**bold and \n*italic and \n__underline and \n~~strikethrough~~__***\n\n`~~Not strikethrough~~`\n~~`Strikethrough`~~\n\nNormal text";
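// One alternation, tried left to right at each position. Already converted
// <code>...</code> elements come first (no capture group) so they are skipped.
const re = /<code(?:\s[^>]*)?>[\s\S]*?<\/code>|`{3}([\S\s]*?)`{3}|`([^`]*)`|~~([\S\s]*?)~~|\*{2}([\s\S]*?)\*{2}(?!\*)|\*([^*]*)\*|__([\s\S]*?)__/g;
let tmp = "";
// Re-run the replacement until the text stops changing, so styles nested
// inside an earlier match are converted by a later pass.
do {
  tmp = text;
  // a: code block, b: inline code, c: strikethrough, d: bold, e: italic, f: underline
  text = text.replace(re, (match, a, b, c, d, e, f) => f ? `<u>${f}</u>` : e ? `<i>${e}</i>` : d ? `<b>${d}</b>` : c ? `<s>${c}</s>` : b ? `<code class="inline">${b}</code>` : a ? `<code>${a}</code>` : match);
}
while (text !== tmp);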
console.log(text);

See the regex demo.

The point is to devise a single regex for a single pass and capture the string parts into separate groups to apply different replacement logic to.

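The pattern tries its alternatives from left to right at each position:

  • <code(?:\s[^>]*)?>[\s\S]*?<\/code> - an already converted code element; it has no capturing group, so the callback returns it unchanged
  • `{3}([\S\s]*?)`{3} - any substring between triple backticks, captured into Group 1 (a)
  • `([^`]*)` - any substring between single backticks, captured into Group 2 (b)
  • ~~([\S\s]*?)~~ - any substring between ~~, captured into Group 3 (c)
  • \*{2}([\s\S]*?)\*{2}(?!\*) - any substring between double asterisks, where the closing ** is not followed by another *, captured into Group 4 (d)
  • \*([^*]*)\* - any substring without asterisks between single asterisks, captured into Group 5 (e)
  • __([\s\S]*?)__ - any substring between double underscores, captured into Group 6 (f)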
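To see the grouping idea in isolation, here is a reduced sketch with only the code-block, inline-code, and strikethrough alternatives. The function name `renderBasic` and the parameter names are made up for illustration. Unlike the code above, it checks groups against `undefined` so that empty captures are still converted, and it makes a single pass only.

```javascript
// Hypothetical reduced version: one regex, one pass, three alternatives.
function renderBasic(input) {
  // Code alternatives come first, so a ~~ inside backticks is consumed
  // as part of the code match and never reaches the ~~ alternative.
  const re = /`{3}([\s\S]*?)`{3}|`([^`]*)`|~~([\s\S]*?)~~/g;
  return input.replace(re, (match, block, inline, strike) => {
    // Groups that did not take part in the match are undefined.
    if (block !== undefined) return `<code>${block}</code>`;
    if (inline !== undefined) return `<code class="inline">${inline}</code>`;
    if (strike !== undefined) return `<s>${strike}</s>`;
    return match;
  });
}

console.log(renderBasic("`~~Not strikethrough~~` and ~~strikethrough~~"));
```

With a single pass, an input like ~~`Strikethrough`~~ gets the outer `<s>` but keeps its inner backticks, because the whole span was consumed by the `~~` alternative. That appears to be why the answer's code repeats the replacement until the text stops changing.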


5 Comments

I like this approach, but it doesn't work if I have to combine different styles. Like having underlined italics or bold strikethrough italics. Sorry I didn't clarify this earlier, I've edited the question to be a bit more precise. Thanks though.
@David Check it now.
Amazing @Wiktor, though anything after the code block doesn't get styled/matched. I'll try to make it work, but it's probably too complex for regex.
Use double backticks in comments to format code
Thanks, really, I'm still learning to use StackOverflow. I updated the input text in the question instead.
