
I have a string like this:

'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation

What I want is to convert it into this:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation

In short, I want to remove the whitespace and the " characters between the digits.

I tried to find the pattern with:

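import re
strings = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'
stuff = re.findall(r'(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]\s?x\s?(\d+\.\d+|\d+)?["]', strings)
print(stuff)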

It returns:

[('1.5', '3', '10'), ('7', '4', '2'), ('9.5', '9.5', '7.5'), ('7.1', '4', '2')]

So I tried:

stuff = re.findall(r'\d+["]\s?x\s?\d+["]\s?x\s?\d+["]', strings)
print(stuff)

It returns:

['5"x3"x10"', '7" x 4"x 2"', '1"x 4"x 2"']

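It doesn't include the full numbers: the digits before a decimal point are dropped (1.5 becomes 5, 7.1 becomes 1) and the 9.5" x 9.5" x 7.5" group is missing entirely, because \d+ cannot match the decimal point. How can I convert my string to the desired one? Any help?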
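For reference, here is a sketch of the second attempt with the number pattern widened to accept an optional decimal part, \d+(?:\.\d+)?. The (?:...) group is non-capturing, so re.findall returns whole matches rather than tuples. The names NUM and DIMENSION_GROUP are just illustrative.

```python
import re

strings = ('\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" '
           'are 7.1"x 4"x 2" you ..and rest of our conversation')

# A number with an optional decimal part; (?:...) keeps it non-capturing
NUM = r'\d+(?:\.\d+)?'
# Three numbers, each followed by ", joined by x with at most one space on each side
DIMENSION_GROUP = NUM + r'"\s?x\s?' + NUM + r'"\s?x\s?' + NUM + r'"'

stuff = re.findall(DIMENSION_GROUP, strings)
print(stuff)
```

This finds all four groups, including the 9.5" x 9.5" x 7.5" one.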


3 Answers


If you really want to do it in one step, you'll have to use multiple lookaheads/lookbehinds to account for all cases (and it's questionable whether this one even captures all of them):

import re

my_str = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

mod_str = re.sub(r'(?<=[\dx])["\s]+(?=[x\s])|(?<=x)\s(?=\d)', '', my_str)
print(mod_str)

gets you:

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation

It would probably be faster (and easier to capture outliers) if you were to split this into a multi-step process.
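A minimal sketch of such a multi-step version, assuming that any " directly following a digit is an inch mark that can be dropped (that assumption is mine, not part of the question): first delete those quotes, then collapse the whitespace around an x that sits between two digits. The helper name clean_dimensions is hypothetical.

```python
import re

def clean_dimensions(text):
    # Step 1: drop every " that directly follows a digit (assumed to be an inch mark)
    text = re.sub(r'(?<=\d)"', '', text)
    # Step 2: remove whitespace around an 'x' only when it sits between two digits
    text = re.sub(r'(?<=\d)\s*x\s*(?=\d)', 'x', text)
    return text

my_str = ('\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" '
          'are 7.1"x 4"x 2" you ..and rest of our conversation')
print(clean_dimensions(my_str))
```

Because each step does one job, an input such as 6"x 4"x 2"..and also comes out as 6x4x2..and, since step 1 does not care what follows the quote.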

Explanation:

There are two search patterns here, (?<=[\dx])["\s]+(?=[x\s]) and (?<=x)\s(?=\d), separated by | so that either one can match. Alternatives are tried left to right at each position, so wherever the first pattern matches, the second is not tried.

The first:

(?<=            positive lookbehind: match only if preceded by
  [\dx]         a single digit (0-9) or the 'x' character
)
["\s]+          match one or more " characters or whitespace
(?=             positive lookahead: match only if followed by
  [x\s]         a single 'x' character or whitespace
)

The second:

(?<=            positive lookbehind: match only if preceded by
  x             the 'x' character
)
\s              match a single whitespace character
(?=             positive lookahead: match only if followed by
  \d            a single digit (0-9)
)

The first removes the quotation marks and whitespace that follow a digit or an "x" when the next character is an "x" or whitespace. The second removes a single space after an "x" when a digit follows, which covers the case the first pattern misses (such as the space in "x 4"). Together they match exactly the quotation marks and whitespace to delete, and re.sub() replaces them with an empty string.
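The commented breakdown above can also live in the code itself. Here is a sketch of the same two-alternative pattern compiled with re.VERBOSE, which ignores unescaped whitespace and # comments outside character classes; the name QUOTES_AND_SPACES is just illustrative.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each piece carries its explanation
QUOTES_AND_SPACES = re.compile(r'''
      (?<=[\dx])   # preceded by a digit or the letter x
      ["\s]+       # one or more double quotes or whitespace characters
      (?=[x\s])    # followed by the letter x or whitespace
    |              # or
      (?<=x)       # preceded by the letter x
      \s           # a single whitespace character
      (?=\d)       # followed by a digit
''', re.VERBOSE)

my_str = ('\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" '
          'are 7.1"x 4"x 2" you ..and rest of our conversation')
print(QUOTES_AND_SPACES.sub('', my_str))
```

It behaves exactly like the one-line pattern; the layout only changes readability.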


3 Comments

There you go, you'll get used to regex syntax with time ;)
Thanks, I will. But when I tried adding 6"x 4"x 2" before the .., it gave me 1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you 6x4x2"..and rest of our conversation, i.e. it leaves a " at the end.
@PaulNicolashunter it performs as expected on a '1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you 6"x 4"x 2" ..and rest of our conversation string, with a space between the last 2" and the dots. Without that space it produces your result, because the lookahead only accepts an x or whitespace after the quote (so that it doesn't remove spaces between words). You may remedy it by also accepting a dot in the character class of the lookahead in the first pattern: (?<=[\dx])["\s]+(?=[x\s\.])|(?<=x)\s(?=\d), but then it might fail elsewhere. Relying on simple regex logic is not the most suitable approach for data that may vary this widely.
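A quick way to check the remedy from the comment, as a sketch: with a dot added to the first lookahead, the trailing quote before .. is removed, but the same change also deletes a space between a digit and a following dot (the 'it costs 5 .' input is a made-up example of that side effect).

```python
import re

# Original pattern, and the variant with '.' added to the first lookahead
ORIGINAL = r'(?<=[\dx])["\s]+(?=[x\s])|(?<=x)\s(?=\d)'
WITH_DOT = r'(?<=[\dx])["\s]+(?=[x\s\.])|(?<=x)\s(?=\d)'

text = 'you 6"x 4"x 2"..and rest of our conversation'
print(re.sub(ORIGINAL, '', text))   # the " before .. survives
print(re.sub(WITH_DOT, '', text))   # the " before .. is removed

# Side effect: a space between a digit and a dot is now removed too
print(re.sub(WITH_DOT, '', 'it costs 5 .'))
```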

zwer is clearly a master at regex. You might, however, be interested in an alternative approach that sometimes makes it possible to use simpler expressions: use the re module to identify the substrings to change, then use a Python function to do the manipulation.

In this case we want to identify two things: numbers, with or without decimals, that are followed by ", and x separators, sometimes preceded or followed by one or more blanks. This code uses a regex with two alternative expressions to find both, passes each match to replacer, and leaves it to that function to discard the unwanted characters.

>>> import re
>>> quest = '1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'
>>> def replacer(matchobj):
...     for group in matchobj.groups():
...         if group:
...             return group.replace(' ', '').replace('"', '')
... 
>>> re.sub(r'([0-9\.]+\")|(\s*x\s*)', replacer, quest)
'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation'

For details, see the entry for re.sub() in the Python documentation, which covers passing a function as the replacement.
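One caveat worth checking: the (\s*x\s*) alternative matches any x, so blanks around an x in ordinary words are removed too ('a box office' becomes 'a boxoffice'). Below is a hedged sketch of one possible tightening, which is my own variation rather than part of the answer: treat x as a separator only when it directly follows a closing quote, and use matchobj.group(0), since exactly one alternative matches each time.

```python
import re

def replacer(matchobj):
    # group(0) is the whole match; drop blanks and double quotes from it
    return matchobj.group(0).replace(' ', '').replace('"', '')

# Original pattern from the answer, and a variant where x must follow a "
ORIGINAL = r'([0-9\.]+\")|(\s*x\s*)'
TIGHTER = r'([0-9.]+")|(?<=")\s*x\s*'

quest = ('1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" '
         'are 7.1"x 4"x 2" you ..and rest of our conversation')
print(re.sub(TIGHTER, replacer, quest))

print(re.sub(ORIGINAL, replacer, 'a box office'))  # 'a boxoffice'
print(re.sub(TIGHTER, replacer, 'a box office'))   # unchanged
```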

3 Comments

Both of the methods are definitely cool. I'm a newbie to regex; how long will it take me to become a master like you and zwer? Anyway, thanks for the answer, I'll keep this method in mind in the future :)
I am no master! I always find myself checking the documentation. And I'm afraid it's like anything else: it depends. How long will it take me to learn to read French well? Probably forever! The advice is probably obvious: subscribe to the regex posts here on SO and try to do as many of the questions as you can find time for.
I promise I will :)

I wouldn't get too complex here.

I'd just match one group of dimensions at a time, then remove the whitespace and double quotes from it.

(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")

Expanded

 (                             # (1 start)
      \d+ 
      (?: \. \d+ )?
      (?:
           \s* " \s* x \s* 
           \d+ 
           (?: \. \d+ )?
      ){2}
      \s* "
 )                             # (1 end)

Python demo http://rextester.com/HUIYP80133

Python code

import re

def repl(m):
    # Strip whitespace and double quotes from one matched dimension group
    contents = m.group(1)
    return re.sub(r'[\s"]+', '', contents)

# 'text' instead of 'str', so the built-in str type is not shadowed
text = '\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" are 7.1"x 4"x 2" you ..and rest of our conversation'

newstr = re.sub(r'(\d+(?:\.\d+)?(?:\s*"\s*x\s*\d+(?:\.\d+)?){2}\s*")', repl, text)

print(newstr)

Output

'1.5x3x10 hey 7x4x2 how 9.5x9.5x7.5 are 7.1x4x2 you ..and rest of our conversation
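The expanded layout above is almost valid Python as-is: compiled with re.VERBOSE, whitespace and # comments in the pattern are ignored. A sketch (the names DIMENSIONS and strip_dimensions are illustrative):

```python
import re

# The expanded pattern from above, usable directly thanks to re.VERBOSE
DIMENSIONS = re.compile(r'''
    (                             # group 1: one W x H x D dimension set
        \d+ (?: \. \d+ )?         # first number, optional decimal part
        (?:
            \s* " \s* x \s*       # closing quote, then x, blanks allowed
            \d+ (?: \. \d+ )?     # next number
        ){2}                      # exactly two more numbers
        \s* "                     # quote after the last number
    )
''', re.VERBOSE)

def strip_dimensions(text):
    # Remove whitespace and double quotes inside each matched dimension set only
    return DIMENSIONS.sub(lambda m: re.sub(r'[\s"]+', '', m.group(1)), text)

text = ('\'1.5"x3"x10" hey 7" x 4"x 2" how 9.5" x 9.5" x 7.5" '
        'are 7.1"x 4"x 2" you ..and rest of our conversation')
print(strip_dimensions(text))
```

Because of {2}, only three-number groups are touched; a two-number group such as 5" x 3" is left as it is.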

1 Comment

Thanks for the new method, I'll keep it in mind for future reference :)
