3

I have various strings that might have multiple leading spaces.

string_1 = '    param A     val A'
string_2 = 'param B    val B'

....

I want to replace all multiple spaces with a single space IF the multiple spaces are not in the start of the string.

I want output of above to become

 string_1 = '    param A val A'z
 string_2 = 'param B val B'

My current solution replaces all multiple spaces with a single space regardless.

 re.sub('\s+',' ',s)

How would I construct a pattern that only captures non leading multiple spaces?

3 Answers 3

5

You can use \b\s{2,}\b as your pattern, If those multiple spaces are leading, they are not in a word boundary. Also for multiple spaces use {2,} instead of + to exclude single space:

import re


string_1 = "    param A     val A"
string_2 = "param B    val B"


pattern = re.compile(r"\b\s{2,}\b")
for test in (string_1, string_2):
    print(pattern.sub(" ", test))

output:

    param A val A
param B val B

Note: Trailing multiple spaces is not changed this way. To do that you can omit last \b then if converts to a single space again.


As noted by @JvdV, \b doesn't take other range of characters into account. For example if you a string like "[ param A val A ]", the above pattern won't work for it. Instead you can use a Positive Look Behind assertion ((?<=\S)) and Positive Look Ahead assertion ((?=\S)) to match any non-white-space character:

>>> import re
>>> text = "[    param A     val A    ]"
>>> re.sub(r"\b\s{2,}\b", " ", text)
'[    param A val A    ]'
>>> re.sub(r"(?<=\S)\s{2,}(?=\S)", " ", text)
'[ param A val A ]'
Sign up to request clarification or add additional context in comments.

3 Comments

+ but a small note: \b will not recognize any characters other than \w.
@JvdV Thanks for mentioning. Updated accordingly.
There should not be a (?=\S) assertion at all because the OP does not say that trailing spaces should not be replaced. {2,} is greedy so (?<=\S)\s{2,} will do.
3

Have a go with:

(?<=\S)\s+(\s\S|$)

And replace with \1. See an online demo


  • (?<=\S) - Assert position is preceded by a non-whitespace char;
  • \s+ - 1+ Whitespace char (greedy);
  • (\s\S|$) - Capture group to match a whitespace char and a non-whitespace char or a end-string anchor.

The above will leave only leading spaces alone. For example:

import regex as re
s = '    Abc   12$   D   Efg    '
print(re.sub(r'(?<=\S)\s+(\s\S|$)', '\1', s))

Prints: Abc 12$ D Efg

Comments

0

split your input into two groups (1st group: leading space, 2nd group: everything else), then eliminate the multispaces from 2nd group, and join groups:

>>> string_1 = '    param A     val A'
>>> sb = re.compile(r'^(\s+)(.*)')
>>> mg = sb.match(string_1).groups()
>>> mg[0] + ' '.join(mg[1].split())
'    param A val A'

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.