2

How do I create an array of strings from a string, e.g.

"hello world" would return ["hello", "world"]. This would need to take into account punctuation marks, etc.

There's probably a great RegEx solution for this, I'm just not capable of finding it.

2
  • I don't think you should be using RegEx unless you can figure out this one by yourself. Commented Jun 25, 2010 at 13:30
  • 4
    That's completely unhelpful, thanks. Commented Jun 28, 2010 at 7:46

7 Answers

2

How about AS3's String.split?

var text:String = "hello world";
var split:Array = text.split(" "); // this will give you ["hello", "world"]
// then iterate and strip out any redundant punctuation like commas, colons and full stops
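
The stripping step mentioned in that last comment could look roughly like this. It is only a sketch: the list of punctuation in the character class is my assumption rather than anything from the question, and `words` and `cleaned` are just illustrative names.

```actionscript
var text:String = "Hello, world: this is a test.";
var parts:Array = text.split(" ");
var words:Array = [];

for (var i:int = 0; i < parts.length; i++) {
    // Assumed punctuation set; extend the character class as needed.
    var cleaned:String = String(parts[i]).replace(/[.,:;!?]/g, "");
    if (cleaned.length > 0) {
        words.push(cleaned); // skip tokens that were pure punctuation
    }
}

trace(words); // Hello,world,this,is,a,test
```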

1 Comment

It's the stripping out the punctuation I'm interested in. I know how to do this with a rather clunky if/else - I'm looking for a more elegant solution though (enter RegExp..)
2

Think I've cracked it, here is the function in full:

public static function getArrayFromString(str:String):Array {
        return str.split(/\W | ' | /gi);
    }

Basically, it uses the 'not a word' condition but excludes apostrophes, is global and ignores case. Thanks to everyone who pointed me in the right direction.
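
One caveat about the pattern as posted: without the x flag, spaces inside a regex literal are matched literally, so `/\W | ' | /` reads as "a non-word character followed by a space, or an apostrophe with a space on each side, or a single space". A sketch of a pattern closer to the description above (split on anything that is neither a word character nor an apostrophe), assuming only straight ASCII apostrophes matter:

```actionscript
// Sketch only: [^\w'] is any character that is neither a word character nor '
// The + collapses runs of delimiters such as ", " into one split point.
var str:String = "It doesn't stop, does it";
var words:Array = str.split(/[^\w']+/);
trace(words); // It,doesn't,stop,does,it
```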

1 Comment

It's good you have something you are happy with. Like a newborn child, take a photo of this, because I think it is the last time you will see it so small. Other unwanted characters are on their way, such as the other species of apostrophe. Feel free to post the regex back here for interest's sake if it becomes particularly frightening...
1

Any reason that:

var myString:String = "hello world";

var reg:RegExp = /\W/i;

var stringAsArray:Array = myString.replace(reg, "").split(" ");

Won't work?

3 Comments

That doesn't strip out full stops or commas, but does strip out apostrophes. So we get "doesnt." instead of "doesn't", etc. Essentially, I'd like to take a paragraph of text and end up with an array of the words in it, minus the spaces and full stops. Good effort though.
@dr_tchock - Just keep working on the RegEx. \W is supposed to match all non-word characters (which would include all punctuation except for the underscore character).
RegEx blows my tiny little mind into a billion pieces. I shall try though.. thanks.
1

Maybe this one works too...

public static function getArrayFromString(str:String):Array {
    return str.split(/[,\.\s\n\r\f¿\?¡!]+/gi);
}

That should also work for languages other than English, since '\w' won't match accented characters.
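
To illustrate the point about accented characters, here is a small comparison. It assumes `\w` only covers A-Z, a-z, 0-9 and the underscore; the sample sentence and the delimiter list are made up for the example.

```actionscript
var sample:String = "¿Dónde está el café?";

// \W+ treats the accented letters as delimiters and cuts words apart.
trace(sample.split(/\W+/));

// An explicit list of delimiters leaves the accented letters alone.
trace(sample.split(/[,.\s¿?¡!]+/));
```

Either call may return an empty string at the start or end when the text begins or ends with a delimiter, so the result may need a clean-up pass.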


1

Here's what you need. Tested and working:

private function splitString(str:String):Array {
    var r:RegExp = /\W+/g;
    return str.split(r);
}

http://snipplr.com/view/63811/split-string-into-array/
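
One thing to watch for with this approach: if the text starts or ends with punctuation, split can return an empty string at that end. A possible clean-up step, sketched with a hypothetical helper named `isNotEmpty`:

```actionscript
// Hypothetical helper: keeps only non-empty items (signature required by Array.filter).
function isNotEmpty(item:*, index:int, array:Array):Boolean {
    return String(item).length > 0;
}

var words:Array = "Hello world.".split(/\W+/g).filter(isNotEmpty);
trace(words); // Hello,world
```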


0

This seems to do what you want:

package
{
import flash.display.Sprite

public class WordSplit extends Sprite
{
    public function WordSplit()
    {
        var inText:String = "This is a Hello World example.\nIt attempts,\
            to simulate! what splitting\" words ' using: punctuation\tand\
            invisible ; characters ^ & * yeah.";

        var regExp:RegExp = /\w+/g;
        var wordList:Array = inText.match(regExp);

        trace(wordList);
    }
}
}

If not, please provide a sample input and output specification.

4 Comments

As I said you need to provide a complete input and output specification. I can't keep guessing what you believe does and doesn't constitute a word.
It's pretty obvious what does and doesn't constitute a word, no guessing required. Thanks for the help though.
Not as obvious as you think. You need an apostrophe now. How about hyphenated words? Do you consider currency ($100) a word? Your regular expression will become your specification.
You are right of course, thanks for pointing that out. For now, I don't think that'll be a problem though, hopefully.
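
For what it's worth, the apostrophe and hyphen cases raised in these comments can be folded into the `match` approach by widening the character class. This is a sketch under my own assumption about what counts as a word (letters, digits, underscore, straight apostrophe, hyphen); it is not a spec anyone in the thread agreed on.

```actionscript
var inText:String = "Don't split well-known words, please.";
// The hyphen goes last in the class so it is read literally, not as a range.
var wordList:Array = inText.match(/[\w'-]+/g);
trace(wordList); // Don't,split,well-known,words,please
```

Currency such as $100 would still lose its symbol here, which is the commenter's point: every such decision ends up encoded in the pattern.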
0

I think you might want something like this:

public static function getArrayFromString(str:String):Array {
    return str.split(/[\W']+/gi);
}

Basically, you can add any characters that you want to be considered delimiters into the square brackets. Here's how the pieces work:

  1. The brackets define a set of characters.
  2. The things in the brackets are the characters in the set (with \W meaning "any non-word character")
  3. The plus sign means "one or more of the previous item"—in this case, the character set. That way, if you have something with several of the characters in a row, you won't get empty items in your array.
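
To see what the plus sign buys you, compare the same split with and without it. The sample string is made up, and the counts in the comments are what I would expect from standard split behaviour.

```actionscript
var s:String = "one, two";

// Without +: the comma and the space are two separate delimiters,
// so an empty string appears between them.
trace(s.split(/[\W']/).length);  // 3

// With +: ", " is treated as a single delimiter.
trace(s.split(/[\W']+/).length); // 2
```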
