how to change digits in a string using regex

Question

I have a string like this:

'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation

What I want is to convert the string into:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you.. and rest of our conversation

In short, I want to remove the whitespace and the " between the digits.

I tried to find the pattern with:

stuff = re.findall(r'(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]', strings)
print(stuff)

It returns:

[('1.5', '3', '10'), ('7', '4', '2'), ('9.5', '9.5', '7.5'), ('7.1', '4', '2')]

So I tried:

stuff = re.findall(r'\d+["]\s?x\s?\d+["]\s?x\s?\d+["]', strings)
print(stuff)

It returns:

['5"x3"x10"', '7" x 4"x 2"', '1"x 4"x 2"']

It doesn't include the digits before the decimal point, and the 9.5 group is missing entirely. How can I convert my string to the desired one? Any help?

zwer · Accepted Answer · 2017-05-27 16:19:09Z

1

If you really want to do it in one step you'll have to do multiple lookaheads/lookbehinds to account for all cases (and it's a question if all of them are even captured with this one):

import re

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

mod_str = re.sub(r'(?<=[\dx])["\s]+(?=[x\s])|(?<=x)\s(?=\d)', '', my_str)
print(mod_str)

gets you:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation

It would probably be faster (and easier to capture outliers) if you were to split this into a multi-step process.
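A multi-step version might look like the sketch below. The two-pass split and the names step1 and step2 are assumptions for illustration, not something spelled out in this answer: the first pass drops a double quote that directly follows a digit, the second collapses whitespace around an x that sits between two digits.

```python
import re

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

# Pass 1: remove a double quote that immediately follows a digit.
step1 = re.sub(r'(?<=\d)"', '', my_str)

# Pass 2: collapse optional whitespace around an "x" that sits between two digits.
step2 = re.sub(r'(?<=\d)\s*x\s*(?=\d)', 'x', step1)

print(step2)
```

Note that pass 1 strips a quote after any digit, so it would also touch inch marks that are not part of a dimension group.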

Explanation:

There are two search patterns here, (?<=[\dx])["\s]+(?=[x\s]) and (?<=x)\s(?=\d); they are separated by | to denote one or the other (the alternatives are tried left to right, so if the first one matches at a position the second is not tried there).

The first:

(?<=            positive lookbehind (zero-width): match the next segment only if it is preceded by
  [\dx]         a single digit (0-9) or the 'x' character
)
["\s]+          match one or more " characters or whitespace
(?=             positive lookahead (zero-width): match the previous segment only if it is followed by
  [x\s]         a single whitespace or 'x' character
)

The second:

(?<=            positive lookbehind (zero-width): match the next segment only if it is preceded by
  x             the 'x' character
)
\s              match a single whitespace
(?=             positive lookahead (zero-width): match the previous segment only if it is followed by
  \d            a single digit (0-9)
)

The first takes care of the quotation marks and whitespace that follow your digits. The second picks up a single whitespace after an "x" when it is followed by a digit, a case the first pattern misses. Together, they match the correct quotation marks and whitespace, which then get replaced by an empty string using the re.sub() method.
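The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores layout whitespace in the pattern and allows # comments. This is only a sketch of an alternative way to write the identical regex; the name pattern is an assumption.

```python
import re

# Same two alternatives as the one-line pattern, laid out with comments.
pattern = re.compile(r'''
    (?<=[\dx])   # preceded by a digit or an x
    ["\s]+       # one or more double quotes or whitespace characters
    (?=[x\s])    # followed by an x or a whitespace character
  |
    (?<=x)       # preceded by an x
    \s           # a single whitespace character
    (?=\d)       # followed by a digit
''', re.VERBOSE)

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

mod_str = pattern.sub('', my_str)
print(mod_str)
```

Whitespace inside a character class such as ["\s] is still significant in verbose mode, so the classes behave as in the compact form.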

edited May 27, 2017 at 16:19

answered May 27, 2017 at 16:04

zwer


3 Comments

zwer Over a year ago

There you go, you'll get used to regex syntax with time ;)

P.hunter Over a year ago

Thanks, I will. But when I add 6"x 4"x 2".. to the string, it gives me: 1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you 6x4x2"..and rest of our conversation, i.e. it leaves a " at the end.

zwer Over a year ago

@PaulNicolashunter it performs as expected on a \'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you 6"x 4"x 2"..and rest of our conversation string. If you were to add a space between the last 2" and the .. it will produce your result (as it tries not to remove space between words). You may remedy it by adding a dot to the accepted character class for the lookahead in the first pattern: ((?<=[\dx])["\s]+(?=[x\s\.])|(?<=x)\s(?=\d)) but then it might fail elsewhere. Relying on simple regex logic is not the most suitable approach for data that may vary widely.
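To see what the extra dot in the lookahead changes, the sketch below runs both patterns on a shortened sample in which the last 2" is followed directly by dots. The sample strings and variable names are assumptions made up for this illustration; the second sample shows one way the widened pattern can remove a space it was not meant to touch.

```python
import re

original = r'(?<=[\dx])["\s]+(?=[x\s])|(?<=x)\s(?=\d)'
with_dot = r'(?<=[\dx])["\s]+(?=[x\s\.])|(?<=x)\s(?=\d)'

# Last quote is followed directly by a dot.
sample = '7.1"x 4"x 2" you 6"x 4"x 2"..and rest'
kept_quote = re.sub(original, '', sample)
no_quote = re.sub(with_dot, '', sample)
print(kept_quote)  # 7.1x4x2 you 6x4x2"..and rest
print(no_quote)    # 7.1x4x2 you 6x4x2..and rest

# Ordinary prose: a digit, a space, then a dot.
prose = 'see page 2 ..and more'
prose_original = re.sub(original, '', prose)
prose_with_dot = re.sub(with_dot, '', prose)
print(prose_original)  # see page 2 ..and more
print(prose_with_dot)  # see page 2..and more
```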

Bill Bell · Accepted Answer · 2017-05-27 17:20:16Z

1

zwer is clearly a master at regex. You might, however, be interested in an alternative approach that sometimes makes it possible to use simpler expressions. It involves using the re module to identify the strings for changing and then using a Python function to do the manipulation.

In this case we want to identify numbers, with or without decimals, that are always followed by ", and the x character, which is sometimes preceded or followed by one or more blanks. This code uses a regex with two alternatives to look for both, passes what it finds to replacer, and leaves it to this function to discard the unwanted characters.

>>> import re
>>> quest = '1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'
>>> def replacer(matchobj):
...     for group in matchobj.groups():
...         if group:
...             return group.replace(' ', '').replace('"', '')
... 
>>> re.sub(r'([0-9\.]+\")|(\s*x\s*)', replacer, quest)
'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation'

Details are in the Python documentation, in the section on re.sub.
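One caveat worth checking: the \s*x\s* alternative matches any x, including one at the end of an ordinary word. The sketch below compares that loose pattern with a stricter variant that only accepts an x sitting between a " and a digit. The stricter pattern, the sample text, and the names loose, strict and strip_marks are assumptions added for illustration, not part of the answer above.

```python
import re

def strip_marks(matchobj):
    # Drop blanks and double quotes from whatever was matched.
    return matchobj.group(0).replace(' ', '').replace('"', '')

loose = r'[0-9.]+"|\s*x\s*'
# Assumed tightening: the x must follow a double quote and precede a digit.
strict = r'[0-9.]+"|(?<=")\s*x\s*(?=\d)'

text = 'six items 7" x 4"x 2" max size'

loose_result = re.sub(loose, strip_marks, text)
strict_result = re.sub(strict, strip_marks, text)
print(loose_result)   # sixitems 7x4x2 maxsize
print(strict_result)  # six items 7x4x2 max size
```

The lookbehind works here because re.sub matches against the original string, so the " before each x is still visible even though the replacement removes it.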

answered May 27, 2017 at 17:20

Bill Bell


3 Comments

P.hunter Over a year ago

Both of the methods are definitely cool. Even though I'm a newbie to regex, can you tell me how much time it will take me to become a master like you and zwer? Anyway, thanks for the answer, I'll keep this method in mind for the future :)

Bill Bell Over a year ago

I am no master! I always find myself checking documentation. And I'm afraid it's like anything else. It depends. How long will it take me to learn to read French well? Probably forever! It's probably obvious. Subscribe to the regex posts here on SO and try to do as many of the questions as you can afford time for.

P.hunter Over a year ago

i promise i will.. :)

user557597 · Accepted Answer · 2017-05-27 17:51:21Z

1

I wouldn't get too complex here.

I'd just match one group of dimensions at a time, then remove the whitespace and double quotes inside it.

(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")

Expanded

 (                             # (1 start)
      \d+ 
      (?: \. \d+ )?
      (?:
           \s* " \s* x \s* 
           \d+ 
           (?: \. \d+ )?
      ){2}
      \s* "
 )                             # (1 end)

Python demo http://rextester.com/HUIYP80133

Python code

import re

def repl(m):
    # Strip whitespace and double quotes from the matched dimension group.
    contents = m.group(1)
    return re.sub(r'[\s"]+', '', contents)

# Named text rather than str to avoid shadowing the built-in.
text = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

newstr = re.sub(r'(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")', repl, text)

print(newstr)

Output

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation
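As a quick check against the case raised in the comments on the first answer, where the last 2" is followed directly by dots, the sketch below applies the same pattern to a shortened sample. The sample strings and the names DIMS, result and untouched are assumptions for illustration.

```python
import re

# Same pattern as above: three numbers joined by x, each followed by a double quote.
DIMS = r'(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")'

def repl(m):
    # Strip whitespace and double quotes from the matched dimension group.
    return re.sub(r'[\s"]+', '', m.group(1))

sample = '7.1"x 4"x 2" you 6"x 4"x 2"..and rest'
result = re.sub(DIMS, repl, sample)
print(result)  # 7.1x4x2 you 6x4x2..and rest

# The {2} quantifier requires exactly three numbers, so a two-number group is left alone.
untouched = re.sub(DIMS, repl, 'a 3" x 5" b')
print(untouched)  # a 3" x 5" b
```

Because the whole group is matched first and cleaned afterwards, what follows the final " does not matter.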

answered May 27, 2017 at 17:51

user557597

1 Comment

P.hunter Over a year ago

Thanks for the new method, I'll keep this in mind for future reference :)
