1

I want to take a Twitter text like this:

s = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc"

and turn it into this:

s = "Today 09/07 sunday LINK - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - LINK"

This snippet is failing for some reason. Can anyone help?

s.replace(/(http\:.*)\s/g , 'LINK')
  • I would assume it's not replacing anything... Commented Jul 10, 2011 at 0:49
  • I would expect that it replaces everything from "sunday " to "migre.me/59qwc" with LINK :) Commented Jul 10, 2011 at 0:53
  • It's not matching http://migre.me/59qwc because there is no space after it. Commented Jul 10, 2011 at 0:55
  • @mike: Yeah, the stackoverflow interpreter ate my http bit... Commented Jul 14, 2011 at 1:27

5 Answers

3

Try using

/\bhttps?\:\S*/ig

which uses \S* to match runs of non-space characters so won't have problems matching at the end of input where there is no following space.
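As a minimal sketch of that pattern applied to the tweet from the question (the variable names `tweetText` and `replaced` are just illustrative), note that `replace` returns a new string rather than changing the original, so the result has to be assigned:

```javascript
// Sample tweet from the question.
const tweetText = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc";

// \b anchors at a word boundary, https? accepts both schemes, and
// \S* consumes the rest of the URL up to the next whitespace or end of input.
// Strings are immutable, so the result must be captured in a variable.
const replaced = tweetText.replace(/\bhttps?:\S*/ig, "LINK");

console.log(replaced);
// Today 09/07 sunday LINK - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - LINK
```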

1

Try this, with the g flag so that every URL is replaced and not just the first one:

input.replace(/https?:\/{2}[^\s]+/g, "LINK")

0

.* is greedy and matches any character, including whitespace, so it consumes everything up to the end of the string and then backtracks until it reaches the last whitespace character. Match only non-whitespace characters for the URL and you are done.
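To make the backtracking visible, here is a small sketch that runs the pattern from the question next to a non-whitespace version on the same tweet (the names `greedyInput`, `greedy`, and `fixed` are illustrative only):

```javascript
const greedyInput = "Today 09/07 sunday http://t.co/123 - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - http://migre.me/59qwc";

// Pattern from the question: .* runs to the end of the string, then
// backtracks to the last whitespace, which sits just before the second URL.
const greedy = greedyInput.replace(/(http\:.*)\s/g, "LINK");

// Non-whitespace only: each match stops at the first whitespace
// or at the end of input, so both URLs are replaced separately.
const fixed = greedyInput.replace(/http:\S+/g, "LINK");

console.log(greedy); // Today 09/07 sunday LINKhttp://migre.me/59qwc
console.log(fixed);  // Today 09/07 sunday LINK - AC/DC COVER Opening and DVD - woman R$10 / man R$15. - LINK
```

The second URL survives in the greedy result because nothing follows it, so the trailing \s in the original pattern can never match there.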

3 Comments

Don't forget that the URL does not have to end with whitespace, though. In Java I would have used reluctant quantifiers to achieve this, including an end of input ($), but the JavaScript language seems to be less capable.
@owistead, $ works just fine in JavaScript. In general though, you're right. The JavaScript regular expression language is missing a few things that java.util.regex has, including lookbehind and Unicode character classes.
@mike: I wasn't saying that $ was missing from JavaScript (that would have been weird), but that I would have included $ in the part with the reluctant quantifiers - and those are missing. Of course, that does not stop smart guys like you from giving an answer without them :)
0

As stated, .* will match whitespace and thus replace everything. Depending on the system you are using, you may be able to get away with something like \S*, which matches only non-whitespace characters, or else a more explicit [^ ]* instead.
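The two alternatives are not equivalent: \S excludes every whitespace character, while [^ ] excludes only the literal space. A rough sketch of the difference, using a made-up input that contains a newline and tabs (the string and variable names are assumptions for illustration, not from the question):

```javascript
// Hypothetical input with a newline and tabs instead of spaces after the URLs.
const multiline = "first http://t.co/123\nsecond\thttp://migre.me/59qwc\tend";

// \S* stops at any whitespace: space, tab, or newline.
const withNonSpace = multiline.replace(/https?:\S*/g, "LINK");

// [^ ]* stops only at a literal space, so it runs across the newline and tabs.
const withNegatedSpace = multiline.replace(/https?:[^ ]*/g, "LINK");

console.log(JSON.stringify(withNonSpace));     // "first LINK\nsecond\tLINK\tend"
console.log(JSON.stringify(withNegatedSpace)); // "first LINK"
```

For single-line tweets separated by plain spaces both behave the same; \S* is the safer default when the input may contain other whitespace.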

0

This should strip HTML from your text

s.replace(/<.*?>/g, '');

1 Comment

That's a nice regex for removing tags, but the original question was asking about replacing URLs that begin with "http://" or "https://".
