11

I'm trying to split a string into an array of words, but I want to keep the spaces after each word. Here's what I'm trying:

var re = /[a-z]+[$\s+]/gi;
var test = "test   one two     three   four ";
var results = test.match(re);

The results I expect are:

[0]: "test   "
[1]: "one "
[2]: "two     "
[3]: "three   "
[4]: "four "

However, it only matches up to one space after each word:

[0]: "test "
[1]: "one "
[2]: "two "
[3]: "three "
[4]: "four "

What am I doing wrong?

1
  • if you need to keep the space, why add $ and + in the second class? Commented Aug 23, 2010 at 14:37

5 Answers

13

Consider:

var results = test.match(/\S+\s*/g);

That guarantees you don't miss any characters (apart from whitespace at the very beginning of the string; \S*\s* can take care of that, though it may also produce an empty match at the end of the string).
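As a rough illustration of the leading-whitespace case, the two patterns can be compared side by side. The `padded` string and the variable names are made up for this sketch and are not from the question:

```javascript
// Hypothetical input with leading whitespace (not from the question).
const padded = "  test   one ";

// \S+\s* starts every match at a non-space character,
// so the leading spaces are skipped.
const withPlus = padded.match(/\S+\s*/g);

// \S*\s* also matches the leading run of spaces, but it can match the
// empty string at the end of the input, so empty matches are filtered out.
const withStar = padded.match(/\S*\s*/g).filter(Boolean);

console.log(withPlus); // [ 'test   ', 'one ' ]
console.log(withStar); // [ '  ', 'test   ', 'one ' ]
```

Joining `withStar` back together should reproduce the input exactly, which is a quick way to check that nothing was dropped.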

Your original regex reads:

  • [a-z]+ - match one or more letters
  • [$\s+] - match a single character: $, + or whitespace. With no quantifier after this class, you only match a single space.
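The breakdown above can be checked with a short sketch using the test string from the question. The variable names (`charClass`, `original`, `fixed`) are chosen here for illustration:

```javascript
const test = "test   one two     three   four ";

// [$\s+] is a character class: exactly one character that is $, + or whitespace.
const charClass = /^[$\s+]$/;
console.log(charClass.test("+"));  // true: + is a literal inside the class
console.log(charClass.test("  ")); // false: two spaces are two characters

// The original pattern therefore keeps only one space after each word.
const original = test.match(/[a-z]+[$\s+]/gi);
console.log(original); // [ 'test ', 'one ', 'two ', 'three ', 'four ' ]

// With the quantifier outside, all trailing whitespace is kept.
const fixed = test.match(/\S+\s*/g);
console.log(fixed); // [ 'test   ', 'one ', 'two     ', 'three   ', 'four ' ]
```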

2

Try the following:

test.match(/\w+\s+/g); // \w = word characters, \s = whitespace

5 Comments

Or if the last bit of whitespace is optional: test.match(/\w+\s*/gi)
@Wolph: why the case-insensitive flag?
This will split "I'm coding" into "I", "m" and "coding".
@DanDascalescu: no specific reason, more of a habit really
This works badly, and "eats" words if you try the following sentence: The moon is our natural satellite, i.e. it rotates around the Earth!
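The behaviour described in these comments can be reproduced with a small sketch. The variable names are illustrative, and the \S+\s* pattern used for comparison comes from another answer in this thread:

```javascript
const sentence = "The moon is our natural satellite, i.e. it rotates around the Earth!";

// \w+\s+ needs whitespace directly after a run of word characters, so words
// followed by punctuation or by the end of the string are dropped.
const wordChars = sentence.match(/\w+\s+/g);

// \S+\s* treats any run of non-whitespace as a word, punctuation included.
const nonSpace = sentence.match(/\S+\s*/g);

console.log(wordChars.join("") === sentence); // false: "satellite,", "i.e." and "Earth!" are missing
console.log(nonSpace.join("") === sentence);  // true: nothing is lost

// The apostrophe is not a word character, so \w splits the contraction.
const contraction = "I'm coding".match(/\w+\s*/g);
console.log(contraction); // [ 'I', 'm ', 'coding' ]
```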
1

You are using + inside the char class. Try using * outside the char class instead.

/[a-z]+\s*/gi;

+ inside the char class is treated as a literal + and not as a meta char. Using * will capture zero or more spaces that might follow any word.
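To see the literal + in action, here is a minimal sketch on a made-up input that contains a plus sign. The `sample` string is an assumption for illustration only:

```javascript
// Hypothetical input containing a literal plus sign.
const sample = "a+ b";

// Inside the class, + is just another character that may follow the letters.
const plusInside = sample.match(/[a-z]+[\s+]/gi);
console.log(plusInside); // [ 'a+' ]

// Outside the class, * is a quantifier: zero or more whitespace characters.
const starOutside = sample.match(/[a-z]+\s*/gi);
console.log(starOutside); // [ 'a', 'b' ]
```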


0

The + is taken literally inside the character class. You have to move it outside: [\s]+ or just \s+ ($ has no special meaning inside the class either).


0

The essential bit of your RegEx that needs changing is the part matching the whitespace or end-of-line.

Try:

var re = /[a-z]+($|\s+)/gi

or, with a non-capturing group (I don't know if you need this with the /g flag):

var re = /[a-z]+(?:$|\s+)/gi
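On the capturing question, a quick sketch suggests the two variants behave the same with String.prototype.match and the g flag, since that call returns only the full matches. The group becomes visible when a single match is inspected, for example with exec. The variable names and the "one  two" input are illustrative:

```javascript
const test = "test   one two     three   four ";

// With the g flag, match() returns full matches only, so the group type
// makes no difference to the resulting array.
const capturing = test.match(/[a-z]+($|\s+)/gi);
const nonCapturing = test.match(/[a-z]+(?:$|\s+)/gi);
console.log(capturing); // [ 'test   ', 'one ', 'two     ', 'three   ', 'four ' ]
console.log(JSON.stringify(capturing) === JSON.stringify(nonCapturing)); // true

// exec() on a non-global regex exposes the captured group at index 1.
const single = /[a-z]+($|\s+)/i.exec("one  two");
console.log(single[0]); // "one  "
console.log(single[1]); // "  "
```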
