
I am trying to get all the numbers that follow the word classes (or its variations) in the following string:

Accepted for all the goods and services in classes 16 and 41.

expected output:

16
41

I have multiple strings that follow this pattern, and some others, such as:

classes 5 et 30  # expected output 5, 30
class(es) 32,33  # expected output 32, 33
class 16         # expected output 16

Here is what I have tried so far: https://regex101.com/r/eU7dF6/3

(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+

But I only get the last matched number, i.e. 41 in the above example.
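To see why only 41 survives, here is a small check with the standard re module, using the pattern from the question unchanged. A repeated capturing group keeps only its last iteration. The check also shows that [and|et|,|\s] is a character class (a set of single characters), not an alternation of the words "and" and "et".

```python
import re

# The pattern from the question, unchanged.
pattern = r"(class[\(es\)]*)([and|et|,|\s]*(\d{1,}))+"
text = "Accepted for all the goods and services in classes 16 and 41."

m = re.search(pattern, text)
# Group 2 is repeated by "+", so after the match it only holds the last
# iteration; the "16" from the first iteration has been overwritten.
print(m.group(1))        # classes
print(repr(m.group(2)))  # ' and 41'
print(m.group(3))        # 41

# A character class matches single characters, so it also accepts "tea|,".
print(bool(re.fullmatch(r"[and|et|,|\s]*", "tea|,")))  # True
```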

2 Answers


I suggest grabbing all the substrings with numbers after class, classes or class(es), and then getting all the numbers from those:

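import re

# Match "class", "classes" or "class(es)" followed by a run of numbers
# separated by "and", "et", commas or whitespace.
p = re.compile(r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+')
test_str = "Accepted for all the goods and services in classes 16 and 41."
# Step 1: grab each run; step 2: extract the digits from each run.
results = [re.findall(r"\d+", x) for x in p.findall(test_str)]
# Flatten the list of lists.
print([x for l in results for x in l])
# => ['16', '41']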

See IDEONE demo
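As a quick sanity check, the same two-step approach can be wrapped in a function and run over every sample string from the question. The function name class_numbers is hypothetical; the pattern is the one from this answer.

```python
import re

# The pattern from this answer: one match per "class ... numbers" run.
CLASS_RUN = re.compile(r"\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*\d+)+")


def class_numbers(text):
    """Step 1: find each run; step 2: pull the digits out of it."""
    return [num for run in CLASS_RUN.findall(text) for num in re.findall(r"\d+", run)]


for sample in [
    "Accepted for all the goods and services in classes 16 and 41.",
    "classes 5 et 30",
    "class(es) 32,33",
    "class 16",
]:
    print(repr(sample), class_numbers(sample))
```

Because the pattern has no capturing groups, findall returns whole matches, which is why the second findall is needed to split each run into numbers.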

Since the Python re module supports neither the \G construct nor access to a group's capture stack, your single-regex approach cannot work with re.

However, you can do it the way you tried with the PyPI regex module, whose match objects expose every capture of a repeated group via captures():

>>> import regex
>>> test_str = "Accepted for all the goods and services in classes 16 and 41."
>>> rx = r'\bclass(?:\(?es\)?)?(?:\s*(?:and|et|[,\s])?\s*(?P<num>\d+))+'
>>> res = []
>>> for x in regex.finditer(rx, test_str):
...     res.extend(x.captures("num"))
...
>>> print(res)
['16', '41']

6 Comments

Thanks Wiktor. However, what you suggested is also a two-step approach, just like the answer provided by vks, and of course I could use it. But I would like to have a single regex that gives me the results.
But can you use the PyPI regex module? As I said, you can't do it with a single regex using re.
Using an external library is not a problem at all. :)
@AKS if it's not a problem, you can check my edit... done using the regex module in Python.
Thanks Wiktor, I used the regex module with correct regex and it works great!

You can do it in two steps. The regex engine remembers only the last capture of a repeated group.

import re

x = """Accepted for all the goods and services in classes 16 and 41."""
print(re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))

Output: ['16', '41']

If you want integers instead of strings, use:

print(list(map(int, re.findall(r"\d+", re.findall(r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)", x)[0]))))

Output: [16, 41]
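One caveat with indexing the first result with [0]: it raises IndexError when the string does not mention "class", and it ignores any later "class" mention in the same string. A minimal sketch that keeps this answer's pattern but iterates over every run instead (the function name class_numbers_int is hypothetical):

```python
import re

# This answer's pattern, unchanged: group 1 is the run of numbers after "class".
RUN = r"class[\(es\)]*\s*(\d+(?:(?:and|et|,|\s)*\d+)*)"


def class_numbers_int(text):
    # Iterate over every run instead of taking [0], so a string without
    # "class" yields [] and several mentions are all collected.
    return [int(n) for run in re.findall(RUN, text) for n in re.findall(r"\d+", run)]


print(class_numbers_int("Accepted for all the goods and services in classes 16 and 41."))  # [16, 41]
print(class_numbers_int("classes 16 and 41; class 9"))  # [16, 41, 9]
print(class_numbers_int("no match"))  # []
```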

If you have to do it with a single regex, use the regex module:

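import regex

x = """Accepted for all the goods and services in classes 16 and 41."""
# \G anchors each number match where the previous match ended; matches of the
# "class" alternative leave the group empty, and `if i` filters them out.
print([int(i) for i in regex.findall(r"class[\(es\)]*|\G(?:and|et|,|\s)*(\d+)", x, regex.VERSION1) if i])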

Output: [16, 41]
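If installing the regex module is not an option, the \G idea can be approximated with the standard re module instead of a single pattern: a compiled pattern's match(string, pos) only succeeds when the match starts exactly at pos, so chaining calls from each match's end behaves like \G. This is a sketch under that assumption, not the answer's method; the names CLASS_WORD, NEXT_NUM and numbers_after_class are hypothetical.

```python
import re

# Hypothetical helper names; emulates \G by anchoring each number match at
# the position where the previous match ended.
CLASS_WORD = re.compile(r"\bclass(?:\(?es\)?)?")
NEXT_NUM = re.compile(r"\s*(?:(?:and|et|,)\s*)?(\d+)")


def numbers_after_class(text):
    results = []
    pos = 0
    while True:
        head = CLASS_WORD.search(text, pos)
        if head is None:
            return results
        pos = head.end()
        # Pattern.match(text, pos) only succeeds if the match starts exactly
        # at pos, which is the guarantee \G gives inside a single regex.
        m = NEXT_NUM.match(text, pos)
        while m is not None:
            results.append(int(m.group(1)))
            pos = m.end()
            m = NEXT_NUM.match(text, pos)


print(numbers_after_class("Accepted for all the goods and services in classes 16 and 41."))  # [16, 41]
print(numbers_after_class("class(es) 32,33"))  # [32, 33]
```

Each inner iteration consumes at least one digit and each outer iteration consumes at least "class", so the loops always advance and terminate.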

3 Comments

Yes, in fact I could do this. But I am wondering whether there is a pure regex that could give me what I need.
@AKS check the edit: you can do it using the regex module, but it is not included with Python by default.
@vks: :) Is it bedtime there? See this demo, and you will get the idea.
