
I'm trying to split a string and count the number of words using Ruby, but I want to ignore special characters.

For example, in the string "Hello, my name is Hugo ..." I'm splitting on spaces, but the trailing ... shouldn't count because it isn't a word.

I'm using string.inner_text.split(' ').length. How can I specify that special characters (such as ... ? ! etc.) are not counted when they are separated from the text by spaces?

Thank you to everyone, Kind Regards, Hugo

2 Answers

 "Hello, my name is não ...".scan /[^*!@%\^\s\.]+/
 # => ["Hello,", "my", "name", "is", "não"] 

/[^*!@%\^\s\.]+/ matches any run of characters other than * ! @ % ^ . and whitespace. You can add more characters that should not be matched (for example ?) to the list inside the brackets.
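
To tie this back to the original goal of counting words, here is a minimal sketch that wraps the negated character class in a helper. The names TOKEN and count_tokens are hypothetical, and adding ? to the excluded list is an assumption based on the characters mentioned in the question.

```ruby
# encoding: utf-8

# Runs of characters that are NOT in the excluded set.
# The excluded set (* ! @ % ^ . ? and whitespace) is an assumed list; extend it as needed.
TOKEN = /[^*!@%\^\s.?]+/

# Hypothetical helper: counts the tokens left after excluding those characters.
def count_tokens(text)
  text.scan(TOKEN).length
end

puts count_tokens("Hello, my name is Hugo ...")  # => 5
puts count_tokens("Really ? Yes !")              # => 2
```

One caveat with this approach: an excluded character inside a token splits it, so "3.14" would count as two tokens.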


2 Comments

Arf, you're too quick this morning!
It works, but only for non-Portuguese words. With accented words such as "não" or "é", it drops the accented letter and splits the word: "não" comes out as "n" and "o".

This is part answer, part response to @Neo's answer: why not use the proper tool for the job?

http://www.ruby-doc.org/core-1.9.3/Regexp.html says:

POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.

  • /[[:alnum:]]/ - Alphabetic and numeric character
  • /[[:alpha:]]/ - Alphabetic character
  • ...

Ruby also supports the following non-POSIX character classes:

  • /[[:word:]]/ - A character in one of the following Unicode general categories Letter, Mark, Number, Connector_Punctuation

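If you want words, use str.scan /[[:word:]]+/. For a UTF-8 string this keeps accented letters such as the ã in "não" together with the rest of the word.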
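
As a rough illustration of that suggestion applied to the word count from the question, the sketch below assumes Ruby 1.9 or later and UTF-8 encoded strings. The helper name count_words is hypothetical.

```ruby
# encoding: utf-8

# Hypothetical helper: counts runs of [[:word:]] characters
# (letters, marks, numbers and connector punctuation such as _).
def count_words(text)
  text.scan(/[[:word:]]+/).length
end

puts count_words("Hello, my name is Hugo ...")  # => 5
puts count_words("Hello, my name is não ...")   # => 5
puts count_words("é ? sim !")                   # => 2
```

Note that apostrophes and hyphens are not word characters, so "isn't" is counted as two words with this pattern. If that matters for your text, the pattern would need to be adjusted.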
