RegExp for stripping URLs from a string

Question

I want to take a Twitter text like this:

s = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc"

and turn it into this:

s = "Today 09/07 sunday LINK - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - LINK"

This snippet is failing for some reason. Could someone help?

s.replace(/(http\:.*)\s/g , 'LINK')

I would expect that it replaces everything from "sunday " to "migre.me/59qwc" with LINK :)
– Maarten Bodewes, Commented Jul 10, 2011 at 0:53
It's not matching http://migre.me/59qwc because there is no space after it.
– Mike Samuel, Commented Jul 10, 2011 at 0:55
@mike: Yeah, the stackoverflow interpreter ate my http bit...
– Maarten Bodewes, Commented Jul 14, 2011 at 1:27

Mike Samuel · Accepted Answer · 2011-07-10 00:56:48Z

3

Try using

/\bhttps?\:\S*/ig

This uses \S* to match a run of non-whitespace characters, so it still matches at the end of the input, where no space follows the URL.
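
A minimal sketch of that pattern applied to the string from the question. The variable names `s` and `stripped` are only for illustration:

```javascript
// The tweet text from the question.
const s = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc";

// \S* stops at the first whitespace character or at the end of input,
// so the trailing URL is matched even though no space follows it.
const stripped = s.replace(/\bhttps?\:\S*/ig, "LINK");

console.log(stripped);
// Today 09/07 sunday LINK - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - LINK
```

The i flag also lets the pattern match URLs written as HTTP:// or Https://.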

The Mask · Accepted Answer · 2011-07-10 00:57:11Z

1

Try:

input.replace(/http:\/{2}[^\s]+/,"link")
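
Note that the call as posted has no g flag, matches only http:// (not https://), and inserts lowercase "link". A hedged sketch comparing it with an adapted variant follows; the variant is an adaptation for illustration, not part of the original answer, and the names `firstOnly` and `all` are made up here:

```javascript
// The tweet text from the question.
const input = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc";

// As posted: without the g flag only the first URL is replaced.
const firstOnly = input.replace(/http:\/{2}[^\s]+/, "link");

// Adapted: s? also accepts https, and g replaces every match.
const all = input.replace(/https?:\/{2}[^\s]+/g, "LINK");

console.log(firstOnly);
// Today 09/07 sunday link - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc
console.log(all);
// Today 09/07 sunday LINK - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - LINK
```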

Maarten Bodewes · Accepted Answer · 2011-07-10 00:51:00Z

0

.* is greedy and matches any character, including whitespace, so it first consumes everything up to the end of the line and then backtracks only as far as the last whitespace character, which the trailing \s needs. Match only non-whitespace characters for the URL and you will be done.
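
To make the backtracking concrete, here is a small sketch that runs the pattern from the question against the question's own string. The name `result` is only illustrative:

```javascript
// The tweet text from the question.
const s = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc";

// .* is greedy: it runs to the end of the string, then backtracks to the
// last whitespace character so that the trailing \s can match.
// The final URL has no whitespace after it, so it is never matched.
const result = s.replace(/(http\:.*)\s/g, "LINK");

console.log(result);
// Today 09/07 sunday LINKhttp://migre.me/59qwc
```

Everything from the first URL up to and including the last space is swallowed by a single match, and the second URL is left in place.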

3 Comments

Maarten Bodewes Over a year ago

Don't forget that the URL does not have to end with whitespace, though. In Java I would have used reluctant quantifiers to achieve this, including an end-of-input anchor ($), but the JavaScript language seems to be less capable.

Mike Samuel Over a year ago

@owistead, $ works just fine in JavaScript. In general though, you're right. The JavaScript regular expression language is missing a few things that java.util.regex has, including lookbehind and Unicode character classes.

Maarten Bodewes Over a year ago

@mike: I wasn't saying that $ was missing from JavaScript (that would have been weird), but that I would have included $ in the part with the reluctant quantifiers, and those are missing. Of course, that does not stop smart guys like you from giving an answer without them :)
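
For readers following this thread: JavaScript regular expressions do accept reluctant (lazy) quantifiers such as .*?, and one of the other answers here uses one. A minimal sketch of the approach described in these comments, combining a reluctant quantifier with $ in a lookahead, might look like this. The pattern and the name `result` are illustrative, not taken from any answer:

```javascript
// The tweet text from the question.
const s = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc";

// .*? takes as few characters as possible and stops as soon as the
// lookahead sees either a whitespace character or the end of input.
const result = s.replace(/https?:.*?(?=\s|$)/g, "LINK");

console.log(result);
// Today 09/07 sunday LINK - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - LINK
```

The \S* form from the other answer is simpler and gives the same result on this input.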

Whoopska · Accepted Answer · 2011-07-10 00:54:32Z

0

As stated, .* will match whitespace and thus replace everything. Depending on the system you are using, you may be able to get away with something like \S*, which matches only non-whitespace characters, or else a more explicit [^ ]* instead.
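
The two alternatives are not quite equivalent: [^ ] excludes only the space character, so tabs and newlines still count as part of the URL. A small sketch with a hypothetical input in which a tab follows the URL (the string and variable names are made up for illustration):

```javascript
// Hypothetical input: a tab, not a space, follows the URL.
const text = "see http://example.com/a\tnext word";

// \S* stops at any whitespace character, including the tab.
const withNonWhitespace = text.replace(/https?:\S*/g, "LINK");

// [^ ]* stops only at a literal space, so it runs through the tab.
const withNotSpace = text.replace(/https?:[^ ]*/g, "LINK");

console.log(JSON.stringify(withNonWhitespace)); // "see LINK\tnext word"
console.log(JSON.stringify(withNotSpace));      // "see LINK word"
```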

akshayp · Accepted Answer · 2011-07-10 00:54:56Z

0

This should strip HTML tags from your text:

s.replace(/<.*?>/g, '');

1 Comment

Ray Toal Over a year ago

That's a nice regex for removing tags, but the original question was asking about removing urls that begin with "http://" or "https://".
