
I am trying to take a string of text and create an array from it so that the string:

var someText='I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz';

insert magical regex here and

the array would look like this:

theArray[0]='I am some text and check this out!  '
theArray[1]='http://blah.tld/foo/bar'
theArray[2]='  Oh yeah! look at this too: '
theArray[3]='http://foobar.baz'

I'm at a loss; any help would be greatly appreciated.

--Eric

  • Do you mean that the string should be split each time a URL is found? Commented Jul 15, 2010 at 3:35
  • I know the string; I need to break it up, munge it and put it back together, over and over. Commented Jul 15, 2010 at 3:48

2 Answers


Split by URL regex (thanks to @Pullet for pointing out a flaw here):

var urlPattern = /(https?\:\/\/\S+[^\.\s+])/;
someText.split(urlPattern);
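The parentheses around the whole pattern are what keep the URLs in the output: when the separator passed to `String.prototype.split` contains a capturing group, the captured text is spliced into the result array. A quick comparison on a shortened, made-up sample string:

```javascript
const text = "see http://a.tld/x and http://b.tld";

// With a capturing group, split() keeps each matched URL in the array.
const withGroup = text.split(/(https?\:\/\/\S+[^\.\s+])/);

// Without it, the URLs act as plain separators and are discarded.
const withoutGroup = text.split(/https?\:\/\/\S+[^\.\s+]/);

console.log(JSON.stringify(withGroup));    // ["see ","http://a.tld/x"," and ","http://b.tld",""]
console.log(JSON.stringify(withoutGroup)); // ["see "," and ",""]
```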

Let's break down the regex :)

(https?    -> has "http", and an optional "s"
\:\/\/     -> followed by ://
\S+        -> followed by "contiguous" non-whitespace characters (\S+)
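[^\.\s+]   -> ending in one character that is *not* ".", whitespace, or a literal "+"
              (inside [...] the "+" is not a quantifier), so a trailing "." is left out
)          -> end of the capturing group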
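Because `\S+` is greedy and then has to give back characters until the final class can match, the last character of a URL match can't be a period, whitespace, or `+`. That keeps a sentence-ending period out of the URL, though other punctuation such as `!` is still included. A small check using made-up sample strings:

```javascript
const urlPattern = /(https?\:\/\/\S+[^\.\s+])/;

// The trailing "." ends the sentence, so it is left out of the match.
const m1 = "Go to http://foobar.baz.".match(urlPattern);
console.log(m1[1]); // http://foobar.baz

// Other trailing punctuation is NOT excluded: the "!" stays in the match.
const m2 = "Wow http://foobar.baz!".match(urlPattern);
console.log(m2[1]); // http://foobar.baz!
```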

Running it on your sample text gives:

["I am some text and check this out!  ",
"http://blah.tld/foo/bar",
"  Oh yeah! look at this too: ",
"http://foobar.baz",
""]
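The comments on the question mention breaking the string up, munging it, and putting it back together. With a single capturing group, the split result alternates plain text and URLs, so every URL sits at an odd index and `join("")` restores the original. A minimal sketch; the angle-bracket wrapping is only a hypothetical stand-in for whatever munging you actually need:

```javascript
const urlPattern = /(https?\:\/\/\S+[^\.\s+])/;
const someText =
  "I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz";

const parts = someText.split(urlPattern);

// Joining the untouched pieces gives back the original string.
console.log(parts.join("") === someText); // true

// Hypothetical munge step: wrap each URL (odd indices) in angle brackets.
const munged = parts
  .map((part, i) => (i % 2 === 1 ? `<${part}>` : part))
  .join("");

console.log(munged);
// I am some text and check this out!  <http://blah.tld/foo/bar>  Oh yeah! look at this too: <http://foobar.baz>
```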

3 Comments

Anurag, thank you so much - that did the trick! Although I'm still struggling to read it!
/(https*\:\/\/\S+[^\.\s+])/ would also match httpssss://test.com, which is not a valid URL. I think what is wanted is /(https?\:\/\/\S+[^\.\s+])/; the ? means that the preceding character is optional. If you wanted to support other protocols, something like the following would also work (depending on the number of protocols you need to support): /((?:https?|s?ftp|gopher)\:\/\/\S+[^\.\s+])/
thanks @Pullets, good catch, made the change. I'll let gopher pass, not sure how many people still use it :0)
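One caveat when extending the pattern to several protocols: `split` returns every capturing group, so if the protocol alternation is written with plain parentheses, each protocol also shows up as its own array entry. A non-capturing group `(?:...)` avoids that. A sketch with a made-up sample string:

```javascript
const text = "get ftp://files.tld/a or http://web.tld";

// Inner group is capturing: the protocol is returned as an extra entry.
const twoGroups = text.split(/((https?|s?ftp|gopher)\:\/\/\S+[^\.\s+])/);

// Inner group is non-capturing: only the full URL is kept.
const oneGroup = text.split(/((?:https?|s?ftp|gopher)\:\/\/\S+[^\.\s+])/);

console.log(JSON.stringify(twoGroups)); // ["get ","ftp://files.tld/a","ftp"," or ","http://web.tld","http",""]
console.log(JSON.stringify(oneGroup));  // ["get ","ftp://files.tld/a"," or ","http://web.tld",""]
```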

Try this:

<script type="text/javascript">
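    // The outer capturing group makes split() return each matched URL as well as the text between URLs.
    var url_regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*@)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%@!\-\/]))?)+/g;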
    var input = "I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz";

    var results = input.split(url_regex);
    console.log(results);
</script>

results =

["I am some text and check this out!  ",
"http://blah.tld/foo/bar",
"  Oh yeah! look at this too: ",
"http://foobar.baz", ""]

You could also trim the individual results so the non-URL entries have no leading or trailing whitespace.
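A sketch of that clean-up step, reusing the regex and input from this answer. Dropping the empty strings (such as the trailing `""` that appears when the text ends with a URL) is an extra step added here, not part of the answer:

```javascript
// URL regex and input copied from the answer above.
const urlRegex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*@)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%@!\-\/]))?)+/g;
const input = "I am some text and check this out!  http://blah.tld/foo/bar  Oh yeah! look at this too: http://foobar.baz";

const cleaned = input
  .split(urlRegex)
  .map((part) => part.trim())          // strip leading/trailing whitespace
  .filter((part) => part.length > 0);  // drop empty entries such as the trailing ""

console.log(JSON.stringify(cleaned));
// ["I am some text and check this out!","http://blah.tld/foo/bar","Oh yeah! look at this too:","http://foobar.baz"]
```

Trimming changes the text, so joining the cleaned pieces will no longer reproduce the original string; skip this step if you need to put the pieces back together.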
