I'm working on a Markdown-to-HTML parser. I understand this is a big project and that third-party libraries exist, but nonetheless I want to roll a simple solution of my own that doesn't have to handle every single aspect of Markdown.

So far the process is to take an input (in my case the value of a textarea) and parse it line by line.

var html = [];
var lines = txt.split('\n'); //Convert string to array
//Remove empty lines
for(var index = lines.length-1; index >= 0; index--) {
    if(lines[index] == '') lines.splice(index, 1);
}
//Parse line by line
for(var index = 0; index <= lines.length-1; index++) {
    var str = lines[index];
    if(str.match(/^#[^#]/)) {
        //Header
        str = str.replace(/#(.*?)$/g, '<h1>$1</h1>');
    } else if(str.match(/^##[^#]/)) {
        //Header 2
        str = str.replace(/##(.*?)$/g, '<h2>$1</h2>');
    } else if(str.match(/^###[^#]/)) {
        //Header 3
        str = str.replace(/###(.*?)$/g, '<h3>$1</h3>');
    } else if(str.trim().startsWith('+')) {
        //Unordered List
        var orig = str;
        str = str.replace(/\+(.*?)$/, '<li>$1</li>');

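        //Neighbouring lines; assigned on every pass because var is function-scoped
        //and would otherwise keep the value from the previous iteration
        var previous = index > 0 ? lines[index-1] : undefined;
        var next = index < lines.length-1 ? lines[index+1] : undefined;
        var depth = orig.indexOf('+');
        //Open a list when this is the first item or it is indented deeper than the previous line
        if(!previous || previous.indexOf('+') < depth) {
            str = '<ul>' + str;
        }
        //Close one nested list per 4 spaces of indentation that the next line drops
        if(!next || next.indexOf('+') < depth) {
            var count = Math.max(0, depth / 4);
            if(next) count = count - Math.max(0, next.indexOf('+') / 4);
            for(var i=1; i<=count; i++) {
                str = str + '</ul>';
            }
        }
        //Close the outermost list when the input ends or the next line is not a list item
        if(!next || !next.trim().startsWith('+')) str = str + '</ul>';
    } else if(str.match(/^[0-9a-zA-Z]/)) {
        //Paragraph
        str = str.replace(/^([0-9a-zA-Z].*?)$/g, '<p>$1</p>');
    }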
    //Inline formatting
    str = str.replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>'); //Bold
    str = str.replace(/\_\_(.*?)\_\_/g, '<strong>$1</strong>'); //Another bold
    str = str.replace(/\*(.*?)\*/g, '<em>$1</em>'); //Italics
    str = str.replace(/\_(.*?)\_/g, '<em>$1</em>'); //Another italics
    //Append formatted to return string
    html.push(str);
}

Where I run into problems is with nested blocks such as ul. Currently the code looks at a line that starts with a + and wraps it in an li. Great, but these list items never get placed within a ul. I could run through the output again after the line-by-line pass and wrap every group of li's, but that breaks down when I have nested li's that require their own ul.

Any thoughts on how to apply these additional wrapper tags? I've considered using my own special characters around list type elements so I know where to add the wrapper tags, but that breaks traditional markdown. I wouldn't be able to pass the raw markdown to someone other than myself and know they'd understand what was going on.

Edit: I updated my code sample to a working version, which also supports nested lists.

1 Answer

You need a very simple state machine.

When you encounter the first + you add <ul> and raise a flag.

If you reach a line that doesn't start with + while the flag is raised, emit </ul> and lower the flag. Do the same when the input ends while the flag is still raised.
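As a rough illustration of this idea, here is a minimal sketch that generalizes the flag to a depth counter so that nested lists get their own ul. The function name renderLists and the rule of 4 spaces per nesting level are assumptions made for the example (the 4 is borrowed from the division by 4 in the question's code). It only handles list wrapping, not headers or inline formatting.

```javascript
// Sketch only: renderLists is a hypothetical helper, and 4 spaces per
// nesting level is an assumption, not a Markdown rule.
function renderLists(lines) {
    var html = [];
    var depth = 0; // number of <ul> currently open; 0 means the "flag" is lowered

    lines.forEach(function (line) {
        // Capture the leading spaces and the text after the + marker
        var match = /^( *)\+\s*(.*)$/.exec(line);
        if (match) {
            var wanted = Math.floor(match[1].length / 4) + 1;
            // Open lists until we are as deep as this item wants to be
            while (depth < wanted) { html.push('<ul>'); depth++; }
            // Close lists if this item is shallower than the previous one
            while (depth > wanted) { html.push('</ul>'); depth--; }
            html.push('<li>' + match[2] + '</li>');
        } else {
            // Not a list item: close every list that is still open
            while (depth > 0) { html.push('</ul>'); depth--; }
            html.push(line);
        }
    });

    // The input may end while a list is still open
    while (depth > 0) { html.push('</ul>'); depth--; }
    return html;
}
```

Calling renderLists(txt.split('\n')).join('\n') would give the wrapped output. Note that this emits a nested ul as a sibling of the li elements, as the question's code does; strictly valid HTML would place it inside the parent li.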