I searched Stack Overflow quite a bit for an answer and nothing stood out. It is still not obvious to me after reading the link provided, but I understand now. Perhaps keeping this post will help future readers who think like I do.

I have reduced my Python 3.7 vs. 2.7 issue to a very simple code snippet:

import re
myStr = "Mary   had a little lamb.\n"
reg_exp = re.compile('[ \\n\\r]*')
reg_exp.split(myStr)

['', 'M', 'a', 'r', 'y', '', 'h', 'a', 'd', '', 'a', '', 'l', 'i', 't', 't', 'l', 'e', '', 'l', 'a', 'm', 'b', '.', '', '']

In Python 2.7 I get whole-word tokens. I would like to keep the greedy * in the compile line without splitting on every character.

If I don't include the greedy *, I get extra empty strings wherever there are consecutive spaces.

reg_exp = re.compile('[ \\n\\r]')
reg_exp.split(myStr)

['Mary', '', '', 'had', 'a', 'little', 'lamb.', '']

I would like to have my cake and eat it too! This is what I want:

['Mary', 'had', 'a', 'little', 'lamb.']

I've tried all sorts of things like various flags. I'm missing something very basic. Can you help? Thanks!

  • Is it Python 3.7? Actually, what output do you want to get in all cases? Commented Aug 17, 2018 at 12:15
  • Perhaps you want + instead of *? As it is, you're allowing the split to occur wherever there are 0 or more spaces, which is everywhere. Commented Aug 17, 2018 at 12:29
  • I tried and tried again after your marking as duplicate. If you would be so kind as to provide the link I could make some progress correctly classifying this question. I got my answer so I'm glad I asked anyway.....Thanks! Commented Aug 17, 2018 at 15:30

2 Answers

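The pattern [ \\n\\r]* can match the empty string, and there is an empty match between every pair of characters.

So splitting after each letter is the correct behavior. Before Python 3.7, re.split() did not split on empty matches; Python 3.7 changed it to split on them as well.

To split only on actual whitespace, replace * with +: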

reg_exp = re.compile('[ \\n\\r]+')

3.6 docs, 3.7 docs
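
As a minimal sketch of the fix applied to the string from the question (the names my_str, tokens, and words are illustrative, not from the original post): with + the split no longer happens between letters, but the trailing "\n" still leaves one empty string at the end, so getting exactly the list asked for takes one more step. Filtering out empty strings is one option; str.split() with no arguments is another, assuming it is acceptable to split on every kind of whitespace rather than only space, \n, and \r.

```python
import re

my_str = "Mary   had a little lamb.\n"

# '+' requires at least one character from the class, so the pattern
# can no longer match the empty string between letters.
reg_exp = re.compile(r'[ \n\r]+')
tokens = reg_exp.split(my_str)
print(tokens)  # ['Mary', 'had', 'a', 'little', 'lamb.', '']

# The trailing "\n" is a separator with nothing after it, which produces
# a final empty string. Dropping empty tokens removes it.
words = [t for t in tokens if t]
print(words)  # ['Mary', 'had', 'a', 'little', 'lamb.']

# Alternative without re: str.split() with no arguments splits on runs of
# any whitespace (tabs included) and never returns empty strings.
print(my_str.split())  # ['Mary', 'had', 'a', 'little', 'lamb.']
```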

1 Comment

Thank you. Works.

Use + instead of *.

* will repeat 0 or more times, so it also matches the empty string "" and splits between every character.

+ will repeat 1 or more times, so it only matches when something is found.
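
The difference can be checked directly on match objects. This is a small sketch, and the names star and plus are only illustrative: at a position with no whitespace, the * pattern still succeeds with a zero-length match, while the + pattern does not match at all.

```python
import re

star = re.compile(r'[ \n\r]*')
plus = re.compile(r'[ \n\r]+')

# '*' succeeds at the start of "Mary" with a zero-length match,
# even though 'M' is not in the character class.
m = star.match("Mary")
print(m.span())  # (0, 0)

# '+' needs at least one character from the class, so it fails at 'M'.
print(plus.match("Mary"))  # None

# '+' only matches where whitespace is actually present.
print(plus.search("a  b").span())  # (1, 3)
```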
