Setting regex in Python for a complex string

Question

I have a string of ingredients for a product, like this:

text = 'Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings'

I want to extract all the ingredient names from it, so that the result looks like this:

ingredientsList = ['Pork and beef', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings']

The regex I am currently using is:

ingredients = re.findall(r'\([^()]*\)|([^\W\d]+(?:\s+[^\W\d]+)*)', text)

But it does not return the text inside the parentheses. I only want to exclude the codes and percentages, while keeping all the ingredients listed inside the parentheses. What should I do here? Thanks in advance.

Wiktor Stribiżew · Accepted Answer · 2016-10-26 10:04:37Z

You may restrict the first branch so that it only matches codes that start with E followed by digits:

\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)

See the regex demo

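The original first branch, \([^()]*\), consumed every parenthesized group whole, including the list of spices. Now, \(E\d+\) will match only (Exxx)-like substrings, and everything else will be processed by the second branch. You may add the percentages here too, to skip them explicitly: \((?:E\d+|\d+(?:[.,]\d+)?%)\).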
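To see why the skip branch is still needed, here is a small comparison on a shortened sample string (the sample and the variable names are mine, for illustration only). Without the first branch, the letter of the code leaks into the results:

```python
import re

# Shortened sample, for illustration only (not the full string from the question).
sample = "stabilizer (E450), glucose"

# Second branch alone: the "E" of "(E450)" is matched as a one-letter word.
words_only = r"[^\W\d]+(?:\s+[^\W\d]+)*"
leaky = re.findall(words_only, sample)

# With the skip branch: "(E450)" is consumed and yields an empty group,
# which is filtered out afterwards.
with_skip = r"\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
clean = [m for m in re.findall(with_skip, sample) if m]

print(leaky)  # ['stabilizer', 'E', 'glucose']
print(clean)  # ['stabilizer', 'glucose']
```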

Python demo:

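import re

# Branch 1 consumes "(E450)"-style codes without capturing them;
# branch 2 captures runs of digit-free words separated by whitespace.
rx = r"\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings"
# findall returns the capture group, which is empty whenever branch 1 matched, so drop the empty strings.
res = [x for x in re.findall(rx, s) if x]
print(res)
# ['Pork and beef', 'water', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'a preservative', 'flavorings']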
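A possible variant, as a sketch only: the pattern below adds the percentage alternative mentioned above and is written with re.VERBOSE so each branch can be commented. The helper name extract_ingredients and the step that strips a leading article ("a", "an") are my assumptions, added to get 'preservative' rather than 'a preservative' as in the list the question asks for.

```python
import re

# Skip "(E450)"-style codes and "(1,7%)"-style percentages; capture word runs.
RX = re.compile(r"""
    \( (?: E\d+ | \d+ (?:[.,]\d+)? % ) \)   # branch 1: consumed, not captured
  | ( [^\W\d]+ (?: \s+ [^\W\d]+ )* )        # branch 2: captured ingredient name
""", re.VERBOSE)


def extract_ingredients(text):
    """Hypothetical helper: return ingredient names, dropping skipped matches."""
    names = [m for m in RX.findall(text) if m]
    # Assumption: a leading article is noise, as in "a preservative".
    return [re.sub(r"^(?:a|an)\s+", "", n, flags=re.IGNORECASE) for n in names]


s = ("Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, "
     "coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), "
     "a preservative (E250), flavorings")
result = extract_ingredients(s)
print(result)
```

Note that 'water' is returned as well, although the desired list in the question omits it. If that omission is intentional, it would have to be filtered out separately.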
