JavaScript Markdown Parsing

Question

I'm working on a Markdown-to-HTML parser. I understand this is a big project and that there are third-party libraries, but nonetheless I want to roll a simple solution of my own that doesn't have to handle every single aspect of Markdown.

So far the process is to take an input (in my case the value of a textarea) and parse it line by line.

var html = [];
var lines = txt.split('\n'); //Convert string to array
//Remove empty lines
for(var index = lines.length-1; index >= 0; index--) {
    if(lines[index] == '') lines.splice(index, 1);
}
//Parse line by line
for(var index = 0; index <= lines.length-1; index++) {
    var str = lines[index];
    if(str.match(/^#[^#]/)) {
        //Header
        str = str.replace(/#(.*?)$/g, '<h1>$1</h1>');
    } else if(str.match(/^##[^#]/)) {
        //Header 2
        str = str.replace(/##(.*?)$/g, '<h2>$1</h2>');
    } else if(str.match(/^###[^#]/)) {
        //Header 3
        str = str.replace(/###(.*?)$/g, '<h3>$1</h3>');
    } else if(str.trim().startsWith('+')) {
        //Unordered List
        var orig = str;
        str = str.replace(/\+(.*?)$/, '<li>$1</li>');

        var previous, next;
        if(index > 0) previous = lines[index-1];
        if(!previous || previous && previous.indexOf('+') < orig.indexOf('+')) {
            str = '<ul>' + str;
        }
        if(index < lines.length-1) next = lines[index+1];
        if(!next || next && next.indexOf('+') < orig.indexOf('+')) {
            var count = Math.max(0, orig.indexOf('+') / 4);
            if(next) count = count - Math.max(0, next.indexOf('+') / 4);
            for(var i=1; i<=count; i++) {
                str = str + '</ul>';
            }
        }
        if(next && next.trim().indexOf('+') == -1) str = str + '</ul>';
    } else if(str.match(/^[0-9a-zA-Z]/)) {
        //Paragraph
        str = str.replace(/^([0-9a-zA-Z].*?)$/g, '<p>$1</p>');
    }
    //Inline formatting
    str = str.replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>'); //Bold
    str = str.replace(/\_\_(.*?)\_\_/g, '<strong>$1</strong>'); //Another bold
    str = str.replace(/\*(.*?)\*/g, '<em>$1</em>'); //Italics
    str = str.replace(/\_(.*?)\_/g, '<em>$1</em>'); //Another italics
    //Append formatted to return string
    html.push(str);
}

Where I run into problems is with nested blocks such as ul. Currently the code looks at a line that starts with a + and wraps it in an li. Great, but these list items never get placed within a ul. I could run through the output again after the line by line and just wrap every group of li's, but that screws me up when I have nested li's that require their own ul.

Any thoughts on how to apply these additional wrapper tags? I've considered using my own special characters around list type elements so I know where to add the wrapper tags, but that breaks traditional markdown. I wouldn't be able to pass the raw markdown to someone other than myself and know they'd understand what was going on.

Edit: I updated my code sample to a working version, which also supports nested lists.

Igal S. · Accepted Answer · 2015-07-19 15:51:53Z

You need a very simple state machine.

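When you encounter the first line that starts with +, emit <ul> and raise a flag.

When you reach a line that does not start with + while the flag is raised, emit </ul> and lower the flag. Do the same check once more at the end of the input, so a list that runs to the last line still gets closed.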
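The flag idea can be sketched roughly as follows. This is only an illustration under a few assumptions: `wrapLists` is a hypothetical helper name (it is not in the question's code), it takes the same `lines` array the question builds, and it handles a single, non-nested list level only.

```javascript
// Hypothetical helper: wraps each run of "+" lines in <ul>...</ul>.
// Flat lists only; indentation (nesting) is ignored in this sketch.
function wrapLists(lines) {
    var out = [];
    var inList = false; // the "flag": true while we are inside a <ul>

    for (var i = 0; i < lines.length; i++) {
        var trimmed = lines[i].trim();
        var isItem = trimmed.charAt(0) === '+';

        if (isItem && !inList) {
            // First "+" line of a run: open the list and raise the flag
            out.push('<ul>');
            inList = true;
        } else if (!isItem && inList) {
            // First non-"+" line after a run: close the list and lower the flag
            out.push('</ul>');
            inList = false;
        }

        if (isItem) {
            // Drop the leading "+" and surrounding whitespace
            out.push('<li>' + trimmed.slice(1).trim() + '</li>');
        } else {
            out.push(lines[i]);
        }
    }

    // A list that runs to the end of the input still needs closing
    if (inList) out.push('</ul>');
    return out;
}
```

Calling `wrapLists(['+ a', '+ b', 'text'])` would yield `<ul>`, two `<li>` entries, `</ul>`, and then the untouched `text` line. For nested lists, one possible extension is to replace the boolean with a depth counter (or a stack of indent levels) and open or close one `<ul>` per change in depth.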
