Integer pattern - Python regex

Question

I've seen many posts on this but I still can't get it to work, I have no idea why.

What I have is a relatively simple string containing some floating-point and integer numbers, e.g. '2 1.000000000000000 1 1 0'. I want to extract only the integers from it, in this example only 2, 1, 1, 0 (not the 1 that's followed by 0s).

I know I have to use lookbehind and lookahead to rule out numbers that are preceded or followed by a '.'. I can successfully find the numbers that are preceded by a '.', in this case the run of 0s:

import re
IntegerPattern = re.compile(r'-?(?<=\.)\d+(?!\.)')
a = '2   1.000000000000000       1   1 0'
IntegerPattern.findall(a)

returns ['000000000000000'], exactly as I want. But when I try to find numbers that are not preceded by a '.', it doesn't work:

import re
IntegerPattern = re.compile(r'-?(?<!\.)\d+(?!\.)')
a = '2   1.000000000000000       1   1 0'
IntegerPattern.findall(a)

returns ['2', '00000000000000', '1', '1', '0']. Any ideas why? I'm completely new to regular expressions in general and this just eludes me. It should work but it does not. Any help would be appreciated.

nu11p01n73R · Accepted Answer · 2014-10-27 19:18:21Z

3

Use the regex

(?:\s|^)\d+(?:\s|$)

The code can be

>>> import re
>>> n = '2 1.000000000000000 1 1 0'
>>> re.findall(r'(?:\s|^)\d+(?:\s|$)', n)
['2 ', ' 1 ', ' 0']

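(?:\s|^) matches a whitespace character or the start of the string

\d+ matches one or more digits

(?:\s|$) matches a whitespace character or the end of the string

NOTE: \b cannot be used to delimit \d+, because \b also matches between a digit and a ., so both halves of a float such as 1.000 would still match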
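To illustrate the note about \b, here is a minimal sketch. It uses a shortened version of the question's string, and the variable names are only for illustration:

```python
import re

s = '2 1.000 1 1 0'  # shortened version of the string in the question

# \b matches between a digit and '.', so the integer part and the
# fractional part of the float are both returned as separate "integers".
with_boundaries = re.findall(r'\b\d+\b', s)
print(with_boundaries)
# ['2', '1', '000', '1', '1', '0']
```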

Example http://regex101.com/r/gP1nK0/1

EDIT

Why doesn't the regex (?<!\.)\d+(?!\.) work?

The problem is that a negative lookaround only rejects a particular starting position; the engine is still free to try to match at every other position.

When you write (?<!\.), the regex looks for any position where the assertion succeeds.

In, say, 1.000000, the first 0 is rejected because it is preceded by a ., but the engine then moves on to the second 0. The character before it is a 0, not a ., so the lookbehind passes and \d+ matches the remaining 00000. That is why the result has one 0 fewer than the input.

For a clearer view, check this link:

http://regex101.com/r/gP1nK0/2

As you can see, for 1.000000000000000 the match starts at the second 0, which makes it succeed.
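The same behaviour can be seen without regex101 by printing where each match starts. This is only a sketch on a shortened version of the question's string:

```python
import re

s = '2 1.000 1 1 0'  # shortened version of the string in the question

# Record the start index and text of every match of the original pattern.
matches = [(m.start(), m.group()) for m in re.finditer(r'(?<!\.)\d+(?!\.)', s)]
print(matches)
# [(0, '2'), (5, '00'), (8, '1'), (10, '1'), (12, '0')]
```

The '00' match starts at index 5, i.e. at the second 0 after the dot (the first 0 is at index 4 and is rejected by the lookbehind).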

EDIT 1

A better regex would be

(?:(?<=^)|(?<=\s))\d+(?=\s|$)

>>> n
'1 2 3 4.5'
>>> re.findall(r'(?:(?<=^)|(?<=\s))\d+(?=\s|$)', n)
['1', '2', '3']
>>> n='1 2 3 4'
>>> re.findall(r'(?:(?<=^)|(?<=\s))\d+(?=\s|$)', n)
['1', '2', '3', '4']

Thank you sln for pointing that out
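The difference between the first regex and the EDIT 1 regex can be sketched as follows. The first one consumes the separating whitespace, so two integers sharing a single space cannot both match; the second one only asserts it. The third pattern, (?<!\S)\d+(?!\S), is not from the answer above; it is a shorter form I am assuming is equivalent for this kind of input:

```python
import re

s = '1 2 3 4'

# Consuming version: the space after '1' is part of the match, so '2' has
# no separator left in front of it and is skipped (likewise '4').
consuming = re.findall(r'(?:\s|^)\d+(?:\s|$)', s)

# Lookaround version from EDIT 1: separators are checked but not consumed.
lookaround = re.findall(r'(?:(?<=^)|(?<=\s))\d+(?=\s|$)', s)

# Assumed shorter equivalent: "not preceded and not followed by a
# non-whitespace character".
not_nonspace = re.findall(r'(?<!\S)\d+(?!\S)', s)

print(consuming)     # ['1 ', ' 3 ']
print(lookaround)    # ['1', '2', '3', '4']
print(not_nonspace)  # ['1', '2', '3', '4']
```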

edited Oct 27, 2014 at 19:18

answered Oct 27, 2014 at 18:26

nu11p01n73R

7 Comments

Aleksander Lidtke Over a year ago

OK, thanks, but this is partially a learning experience. If I had a string without preceding whitespaces, what should I do? And why doesn't my expression work?

Smern Over a year ago

@AleksanderLidtke, to answer why your regex doesn't work: if you count the number of 0s in the match, you'll notice there is one fewer than in the input. From the second 0 on, each digit isn't immediately preceded by a ., so the lookbehind passes.

Aleksander Lidtke Over a year ago

@smerny OK that explains it, thanks. Didn't actually think of counting the 0s.

nu11p01n73R Over a year ago

@AleksanderLidtke i have added an edit on why it doesnt work. hope it helps you :)

user557597 Over a year ago

Doesn't match 2 and 4 in 1 2 3 4

deets · Accepted Answer · 2014-10-27 18:17:02Z

2

I wouldn't bother with regexes:

s = '2   1.000000000000000       1   1 0'

# Keep only the whitespace-separated tokens that contain no decimal point.
print([int(part) for part in s.split() if "." not in part])

It's often much simpler to work with basic string manipulation. As the old saying goes: "I had a problem I tried to solve with regexes. Then I had two problems."
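A variation on the same idea, in case the tokens may carry a sign or contain something other than numbers: let int() decide what counts as an integer. The helper name extract_integers is hypothetical, and this is only a sketch, not a replacement for the one-liner above:

```python
def extract_integers(text):
    """Return the whitespace-separated tokens of text that parse as integers."""
    result = []
    for token in text.split():
        try:
            result.append(int(token))
        except ValueError:
            # Tokens such as '1.000' or 'abc' are not integer literals; skip them.
            pass
    return result


print(extract_integers('-2   1.000000000000000       1   1 0'))
# [-2, 1, 1, 0]
```

Note that int() is a little more permissive than a \d+ pattern: it also accepts a leading + sign, for example.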

answered Oct 27, 2014 at 18:17

deets

Padraic Cunningham · Accepted Answer · 2014-10-27 18:37:30Z

1

a = '-2   1.000000000000000       1   1 0'
# Accept plain digit runs, or digit runs with a leading minus sign.
print([x for x in a.split() if x.isdigit() or (x.startswith('-') and x[1:].isdigit())])
['-2', '1', '1', '0']

If you want the digits before the . also:

a = '2   1.000000000000000       1   1 0'


print([x if x.isdigit() else x.split(".")[0] for x in a.split() ])
['2', '1', '1', '1', '0']

edited Oct 27, 2014 at 18:37

answered Oct 27, 2014 at 18:17

Padraic Cunningham

2 Comments

Padraic Cunningham Over a year ago

@JoranBeasley, cheers, I was thinking that it would not work for negative nums but the OP does not seem to be checking for it in their regex so I guess it should work :P

Padraic Cunningham Over a year ago

@smerny, it does get the first 1

user557597 · Accepted Answer · 2014-10-27 18:33:18Z

0

The engine compensates in order to find a match:
it sheds one \d on the left (the first 0 after the .), then matches the rest.

Adding \d to the lookbehind ensures no digits are shed on the left:

 # (?<![.\d])\d+(?!\.)

 (?<! [.\d] )
 \d+ 
 (?! \. )

Just a note: in your first pattern, -?(?<=\.)\d+(?!\.),
the -? will never actually match a dash, because a dash is not the . that the assertion
says must immediately precede the digits.
The rule is: never point an assertion in a direction that directly contains a literal
unless the literal is included in the assertion. In this case it is out of order anyway,
so the -? is entirely useless.
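A sketch of how the pattern above could be used from Python. The spaced-out form only works when compiled with re.VERBOSE; without that flag the spaces are treated as literal characters. The stricter variant at the end is my own assumption, not part of this answer: it also puts \d in the lookahead so that a multi-digit integer part such as the 12 in 12.5 is not partially matched.

```python
import re

s = '2 1.000 1 1 0'  # shortened version of the string in the question

compact = re.compile(r'(?<![.\d])\d+(?!\.)')

# Same pattern in expanded form; re.VERBOSE makes the engine ignore the
# whitespace and allows comments.
expanded = re.compile(r'''
    (?<! [.\d] )   # not preceded by a dot or a digit
    \d+            # one or more digits
    (?! \. )       # not followed by a dot
''', re.VERBOSE)

print(compact.findall(s))   # ['2', '1', '1', '0']
print(expanded.findall(s))  # ['2', '1', '1', '0']

# Assumed refinement: forbid a following digit as well, so the engine
# cannot backtrack to a shorter digit run that stops before the dot.
stricter = re.compile(r'(?<![.\d])\d+(?![.\d])')

print(compact.findall('12.5'))   # ['1']
print(stricter.findall('12.5'))  # []
```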

edited Oct 27, 2014 at 18:33

answered Oct 27, 2014 at 18:26

user557597

2 Comments

Aleksander Lidtke Over a year ago

When I run your pattern: pat=re.compile('(?<! [.\d] )\d+(?! \. )') with my original example a = '2 1.000000000000000 1 1 0' I still get ['2', '1', '000000000000000', '1', '1']. And well pointed out with the -?, it was just a leftover from the original pattern.

user557597 Over a year ago

I'm afraid that is quite impossible ! And pat=re.compile('(?<! [.\d] )\d+(?! \. )') isn't my pattern.
