
I'm looking for the Python equivalent of

String str = "many   fancy word \nhello    \thi";
String whiteSpaceRegex = "\\s+";
String[] words = str.split(whiteSpaceRegex);

["many", "fancy", "word", "hello", "hi"]

The str.split() method without an argument splits on whitespace:

>>> "many   fancy word \nhello    \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']
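A small sketch of how the no-argument form differs from passing an explicit `" "` separator. The padded test string with leading and trailing spaces is an assumption chosen only to make the difference visible:

```python
s = "  many   fancy word \nhello    \thi  "

# sep=None (the default): runs of whitespace count as one separator,
# and leading/trailing whitespace produces no empty strings.
words = s.split()
print(words)  # ['many', 'fancy', 'word', 'hello', 'hi']

# An explicit " " separator behaves differently: every single space splits,
# so consecutive spaces yield empty strings, and "\n"/"\t" are not separators.
by_space = s.split(" ")
print(by_space)
# ['', '', 'many', '', '', 'fancy', 'word', '\nhello', '', '', '', '\thi', '', '']
```

So `split()` with no argument is the closest match to the Java `\\s+` regex split, while `split(" ")` is not.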

Comments:

Also good to know is that if you want the first word only (which means passing 1 as second argument), you can use None as the first argument: s.split(None, 1)
If you only want the first word, use str.partition.
@yak: Could you please edit your comment? As written, it sounds as if s.split(None, 1) returns only the first word. It actually returns a list of two items: the first word and the rest of the string. s.split(None, 1)[0] returns just the first word.
@galois No, it uses a custom implementation (which is faster). Also note that it handles leading and trailing whitespace differently.
@KishorPawar It's rather unclear to me what you are trying to achieve. Do you want to split on whitespace, but disregard whitespace inside single-quoted substrings? If so, you can look into shlex.split(), which may be what you are looking for. Otherwise I suggest asking a new question – you will get a much quicker and more detailed answer.
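A minimal sketch of the idioms mentioned in these comments, using the question's string. The `"echo 'hello world' foo"` input for `shlex.split` is a made-up example:

```python
import shlex

s = "many   fancy word \nhello    \thi"

# sep=None with maxsplit=1: at most one split on whitespace, so the result
# has two items: the first word and the rest of the string.
first, rest = s.split(None, 1)
print(first)        # many
print(repr(rest))   # 'fancy word \nhello    \thi'

# Indexing [0] gives only the first word.
print(s.split(None, 1)[0])  # many

# str.partition needs a literal separator; with " " it only matches a space,
# and the remainder keeps the extra spaces that followed the first one.
print(s.partition(" "))              # ('many', ' ', '  fancy word \nhello    \thi')
print("many\tfancy".partition(" "))  # ('many\tfancy', '', '')

# shlex.split splits on whitespace but keeps quoted substrings together.
print(shlex.split("echo 'hello world' foo"))  # ['echo', 'hello world', 'foo']
```

Note that unpacking `s.split(None, 1)` into two names raises `ValueError` when the string contains only one word; indexing with `[0]` avoids that.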
import re
s = "many   fancy word \nhello    \thi"
# Raw string: '\s' in a normal string literal triggers an invalid-escape warning
re.split(r'\s+', s)

Comments:

this gives me a whitespace token at the end of the line. No idea why, the original line doesn't even have that. Maybe this ignores newline?
@Gulzar: strip() the string first to remove the trailing whitespace, e.g. re.split(r'\s+', s.strip())
Note that this is usually slower than str.split if performance is an issue.
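A sketch of the likely cause of the extra token reported above, assuming the line came from a file and so ended in `"\n"` (that origin is a guess). `re.split` returns an empty string for leading or trailing whitespace; stripping first, or filtering out empty items, removes it:

```python
import re

# Hypothetical line read from a file: leading spaces and a trailing newline.
line = "  many   fancy word \nhello    \thi\n"

# Leading/trailing whitespace makes re.split produce empty strings at the ends.
parts = re.split(r"\s+", line)
print(parts)  # ['', 'many', 'fancy', 'word', 'hello', 'hi', '']

# Stripping the string before splitting removes them.
stripped = re.split(r"\s+", line.strip())
print(stripped)  # ['many', 'fancy', 'word', 'hello', 'hi']

# Alternatively, drop empty strings after splitting.
filtered = [w for w in re.split(r"\s+", line) if w]
print(filtered)  # ['many', 'fancy', 'word', 'hello', 'hi']
```

`line.split()` with no argument gives the same result as the last two lines without any extra step.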

Using split() is the most Pythonic way to split a string on whitespace.

It's also worth remembering that calling split() on a string that contains no whitespace returns the whole string as the only element of a list.

Example:

>>> "ark".split()
['ark']
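A few more edge cases in the same spirit, showing what `split()` returns for empty and whitespace-only strings and how that differs from an explicit separator:

```python
# No whitespace: the whole string comes back as a one-element list.
print("ark".split())      # ['ark']

# Empty or whitespace-only strings give an empty list with sep=None...
print("".split())         # []
print(" \t\n ".split())   # []

# ...but with an explicit separator, the empty string still yields one element.
print("".split(" "))      # ['']
```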

Another method uses the re module. It works the other way around: instead of splitting the whole sentence on whitespace, it matches all the words.

>>> import re
>>> s = "many   fancy word \nhello    \thi"
>>> re.findall(r'\S+', s)
['many', 'fancy', 'word', 'hello', 'hi']

The regex above matches runs of one or more non-whitespace characters.
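A short sketch comparing `re.findall(r'\S+', ...)` with `str.split()` on a padded version of the string (the padding is an assumption added to show that leading and trailing whitespace is ignored). `re.finditer` with the same pattern additionally reports where each word starts:

```python
import re

s = "  many   fancy word \nhello    \thi\n"

# findall with \S+ returns the same words as str.split() for this input.
words = re.findall(r"\S+", s)
print(words)               # ['many', 'fancy', 'word', 'hello', 'hi']
print(words == s.split())  # True

# re.finditer also exposes the start offset of each word in the original string.
starts = []
for m in re.finditer(r"\S+", s):
    starts.append(m.start())
    print(m.start(), m.group())
```

This prints `2 many`, `9 fancy`, `15 word`, `21 hello` and `31 hi`, which can be useful when the positions matter and not just the words.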
