RegEx: Find all digits after certain string

Question

I am trying to extract all the numbers that follow the word classes (or one of its variations) in the following string:

Accepted for all the goods and services in classes 16 and 41.

expected output:

16
41

I have multiple strings that follow this pattern, and some others such as:

classes 5 et 30 # expected output 5, 30
class(es) 32,33 # expected output 32, 33
class 16        # expected output 16

Here is what I have tried so far: https://regex101.com/r/eU7dF6/3

(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+

But I am only able to get the last matched number, i.e. 41 in the above example.

Wiktor Stribiżew · Accepted Answer · 2016-02-10 09:34:11Z

1

I suggest first grabbing each whole substring of numbers that follows class, classes or class(es), and then extracting all the numbers from those substrings:

import re
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')
test_str = "Accepted for all the goods and services in classes 16 and 41."
results = [re.findall(r"\d+", x) for x in p.findall(test_str)]
print([x for l in results for x in l])
# => ['16', '41']

See IDEONE demo
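As a rough sketch, the same two-step idea can be wrapped in a helper and run against the other sample strings from the question. The function name extract_class_numbers and the constant CLASS_RUN are hypothetical, introduced here only for illustration; the pattern itself is the one from the snippet above.

```python
import re

# Step 1 pattern (from the answer above): the whole "class(es) N, N and N" run.
CLASS_RUN = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')

def extract_class_numbers(text):
    """Hypothetical helper: return the numbers following class/classes/class(es)."""
    numbers = []
    for run in CLASS_RUN.findall(text):
        # Step 2: pull the individual numbers out of each matched run.
        numbers.extend(re.findall(r'\d+', run))
    return numbers

samples = [
    "Accepted for all the goods and services in classes 16 and 41.",
    "classes 5 et 30",
    "class(es) 32,33",
    "class 16",
]
for s in samples:
    print(s, '->', extract_class_numbers(s))
```

The numbers come back as strings; convert them with int() if integers are needed.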

Python's re module supports neither the \G construct nor access to the capture stack, so your single-regex approach cannot work with it: a repeated capturing group keeps only its last match.
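The limitation is easy to reproduce with re. The pattern below is a simplified stand-in for the one in the question (an assumption made for illustration, not the asker's exact regex): it puts a capturing group inside a repeated group and then inspects what the match object retained.

```python
import re

text = "Accepted for all the goods and services in classes 16 and 41."

# Simplified illustrative pattern: (\d+) sits inside a repeated non-capturing group.
m = re.search(r'\bclass(?:es)?(?:\s*(?:and|et|,)?\s*(\d+))+', text)

print(m.group(0))  # the overall match still spans both numbers
print(m.group(1))  # only the capture from the last repetition is kept
```

The overall match covers "classes 16 and 41", yet group 1 holds only '41', which is the behaviour described in the question.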

However, your approach does work with the PyPI regex module, which keeps every capture of a repeated group and exposes them through captures():

>>> import regex
>>> test_str = "Accepted for all the goods and services in classes 16 and 41."
>>> rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
>>> res = []
>>> for x in regex.finditer(rx, test_str):
...     res.extend(x.captures("num"))  # every capture of the repeated group
...
>>> print(res)
['16', '41']

edited Feb 10, 2016 at 9:34

answered Feb 10, 2016 at 8:51

Wiktor Stribiżew


6 Comments

AKS Over a year ago

Thanks Wiktor, however what you have suggested is also a two fold approach just like the answer provided by vks and of course I could use this approach. But I would like to have a single regex that gives me the results.

Wiktor Stribiżew Over a year ago

But can you use the PyPI regex module? As I said, you can't do it with a single regex using re.

AKS Over a year ago

Using an external library is not a problem at all. :)

vks Over a year ago

@AKS If it's not a problem, you can check my edit... done using the regex module in Python.

AKS Over a year ago

Thanks Wiktor, I used the regex module with correct regex and it works great!


vks · Accepted Answer · 2016-02-10 09:29:12Z

1

You can do it in 2 steps. The regex engine remembers only the last capture of a repeated group.

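import re
x = """Accepted for all the goods and services in classes 16 and 41."""
# The inner findall grabs the "16 and 41" run; the outer findall splits it into numbers.
print(re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))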

Output:['16', '41']

If you don't want strings, convert the matches to integers:

print(list(map(int, re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))))

Output:[16, 41]

If you have to do it in one regex, use the regex module:

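import regex
x = """Accepted for all the goods and services in classes 16 and 41."""
# \G anchors each number match at the end of the previous match; the "class..." alternative
# yields an empty group, which the `if i` filter drops.
print([int(i) for i in regex.findall(r"class[\(es\)]*|\G(?:and|et|,|\s)*(\d+)", x, regex.VERSION1) if i])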

Output:[16, 41]

edited Feb 10, 2016 at 9:29

answered Feb 10, 2016 at 8:48

vks


3 Comments

AKS Over a year ago

Yes! I could do this in fact. but I am just wondering if there is a pure regex which could give me what I need.

vks Over a year ago

@AKS check edit..you can do it using regex module but it is not there in default python

Wiktor Stribiżew Over a year ago

@vks: :) Is it bedtime there? See this demo, and you will get the idea.
