How to convert Markdown-style links using regex?

Question

I'm trying to write a regular expression that replaces Markdown-style links, but it doesn't seem to be working. This is what I have so far:

# ruby code:
text = "[link me up](http://www.example.com)"
text.gsub!(%r{\[(\+)\]\((\+)\)}x, %{<a target="_blank" href="\\1">\\2</a>})

What am I doing wrong?

Why not use a full Ruby Markdown library, like the wonderful kramdown?
– Phrogz, Commented Feb 13, 2012 at 22:41
Because I only need a limited subset of Markdown features and haven't found a library that allows me to specify which features I want to support (so I'm having to write my own).
– Andrew, Commented Feb 14, 2012 at 0:00

Phrogz · Accepted Answer · 2012-02-13 22:39:05Z

irb(main):001:0> text = "[link me up](http://www.example.com)"
irb(main):002:0> text.gsub /\[([^\]]+)\]\(([^)]+)\)/, '<a href="\2">\1</a>'
#=> "<a href=\"http://www.example.com\">link me up</a>"

We can use the extended option for Ruby's regex to make it not look like a cat jumped on the keyboard:

def linkup( str )
  str.gsub %r{
    \[         # Literal opening bracket
      (        # Capture what we find in here
        [^\]]+ # One or more characters other than close bracket
      )        # Stop capturing
    \]         # Literal closing bracket
    \(         # Literal opening parenthesis
      (        # Capture what we find in here
        [^)]+  # One or more characters other than close parenthesis
      )        # Stop capturing
    \)         # Literal closing parenthesis
  }x, '<a href="\2">\1</a>'
end

text = "[link me up](http://www.example.com)"
puts linkup(text)
#=> <a href="http://www.example.com">link me up</a>

Note that the above will fail for URLs that have a right parenthesis in them, e.g.

linkup "[O](http://msdn.microsoft.com/en-us/library/ms533050(v=vs.85).aspx)"
# <a href="http://msdn.microsoft.com/en-us/library/ms533050(v=vs.85">O</a>.aspx)

If this is important to you, you can replace the [^)]+ with \S+(?=\)), which means "find as many non-whitespace characters as you can, but ensure that there is a ) afterwards".
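A minimal sketch of that lookahead variant, assuming a hypothetical method name `linkup_parens` (it is not part of the original answer). One caveat: because `\S+` is greedy, two links with no whitespace between them, such as `[a](b),[c](d)`, would be swallowed into a single URL, so this probably suits text where links are separated by spaces.

```ruby
# Hypothetical variant of linkup: the URL is any run of non-whitespace
# characters that is immediately followed by a closing parenthesis.
def linkup_parens(str)
  str.gsub(/\[([^\]]+)\]\((\S+(?=\)))\)/, '<a href="\2">\1</a>')
end

puts linkup_parens("[O](http://msdn.microsoft.com/en-us/library/ms533050(v=vs.85).aspx)")
#=> <a href="http://msdn.microsoft.com/en-us/library/ms533050(v=vs.85).aspx">O</a>
```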

To answer your question "what am I doing wrong", here's what your regex said:

%r{
  \[      # Literal opening bracket   (good)
    (     # Start capturing           (good)
      \+  # A literal plus character  (OOPS)
    )     # Stop capturing            (good)
  \]      # Literal closing bracket   (good)
  \(      # Literal opening paren     (good)
    (     # Start capturing           (good)
      \+  # A literal plus character  (OOPS)
    )     # Stop capturing            (good)
  \)      # Literal closing paren     (good)
}x
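To make that concrete, here is a small sketch (the names `ORIGINAL` and `linkup_blank` are made up for illustration). It shows that the question's pattern only matches when the text and the URL are each a single literal `+`. It also illustrates a second point about the original replacement string: group 1 is the link text and group 2 is the URL, so `href` should take `\2`, not `\1`.

```ruby
# The pattern from the question, unchanged.
ORIGINAL = %r{\[(\+)\]\((\+)\)}x

p(ORIGINAL =~ "[link me up](http://www.example.com)")  #=> nil (no match)
p(ORIGINAL =~ "[+](+)")                                #=> 0   (match at index 0)

# Hypothetical fix that keeps the target="_blank" attribute from the question.
# Negated character classes capture real content; href uses group 2 (the URL).
def linkup_blank(str)
  str.gsub(/\[([^\]]+)\]\(([^)]+)\)/, '<a target="_blank" href="\2">\1</a>')
end

puts linkup_blank("[link me up](http://www.example.com)")
#=> <a target="_blank" href="http://www.example.com">link me up</a>
```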

edited Feb 13, 2012 at 22:39

answered Feb 13, 2012 at 22:12


Comments

Andrew Over a year ago

Also, I've never seen a URL with parentheses. I didn't even think this was valid. Thanks for pointing that out.

Juho Vepsäläinen Over a year ago

Awesome explanation. Would've given more than +1 if possible. I ended up doing a matcher for [[foobar]] kind of syntax using your advice. Here's my go at it (simplified from yours): /\[\[([^\]]+)\]\]/ .

nfvs Over a year ago

To support tooltips, like [link](http://example.com "tooltip"), use this regex: /\[([^\]]+)\]\(([^)"]+)(?: \"([^\"]+)\")?\)/
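A rough sketch of how that tooltip pattern could be wired into a replacement, assuming a hypothetical helper named `linkup_with_title` and a constant `TOOLTIP_LINK` (neither comes from the thread). The third group is optional, so the block form of `gsub` is used to emit a `title` attribute only when a tooltip was captured. No HTML escaping is done here.

```ruby
# Pattern from the comment above; group 3 (the tooltip) is optional.
TOOLTIP_LINK = /\[([^\]]+)\]\(([^)"]+)(?: \"([^\"]+)\")?\)/

# Hypothetical helper: adds title="..." only when a tooltip is present.
def linkup_with_title(str)
  str.gsub(TOOLTIP_LINK) do
    text, url, title = $1, $2, $3
    if title
      %{<a href="#{url}" title="#{title}">#{text}</a>}
    else
      %{<a href="#{url}">#{text}</a>}
    end
  end
end

puts linkup_with_title('[link](http://example.com "tooltip")')
#=> <a href="http://example.com" title="tooltip">link</a>
puts linkup_with_title('[link](http://example.com)')
#=> <a href="http://example.com">link</a>
```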

thatidiotguy Over a year ago

Just wanted to note that, based on the CommonMark spec, opening and closing brackets in a URL are allowed. So a URL like http://example.com?query[]=something should be allowed, but the regex provided does not account for that.

Phrogz Over a year ago

@thatidiotguy Incorrect. The regex prevents a closed bracket within the link text, but not within the URL. See rubular.com/r/kG7s9bHlOl for example.
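For anyone who wants to check this locally rather than on Rubular, a quick sketch (the constant `LINK` and the helper `parse_link` are names invented here). The URL group is `[^)]+`, so square brackets pass through. Only a closing bracket inside the link text stops the match.

```ruby
# Same pattern as the accepted answer.
LINK = /\[([^\]]+)\]\(([^)]+)\)/

# Hypothetical helper: returns [text, url], or nil when nothing matches.
def parse_link(str)
  m = LINK.match(str)
  m && [m[1], m[2]]
end

# Brackets inside the URL are captured as part of group 2.
p parse_link("[search](http://example.com?query[]=something)")
#=> ["search", "http://example.com?query[]=something"]

# A closing bracket inside the link text prevents a match.
p parse_link("[a]b](http://example.com)")
#=> nil
```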
