I have a string of a product's ingredients, like this:

text = 'Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings'

I want to extract all the ingredients from it so that the result looks like this:

ingredientsList= ['Pork and beef', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'preservative', 'flavorings']

The regex I am currently using is:

ingredients = re.findall(r'\([^()]*\)|([^\W\d]+(?:\s+[^\W\d]+)*)', text)

But it does not return the text inside the brackets. I only want to exclude the codes and percentages, and keep all the ingredients inside the brackets. What should I do here? Thanks in advance.

1 Answer

You can restrict the first branch so that it only matches codes that start with E followed by digits:

\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)


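Now \(E\d+\) matches only (Exxx)-like substrings, which are consumed without being captured. Other parenthesized groups are no longer swallowed whole, so the second branch can capture the ingredients inside them. You can also add percentages to the first branch to skip them explicitly: \((?:E\d+|\d+(?:[.,]\d+)?%)\).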
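A minimal sketch of the extended pattern with the percentage alternative, run on a shortened version of the question's string. The name rx_ext and the shortened sample are only illustrative and not part of the original answer:

```python
import re

# First branch consumes "(E450)"-style codes and "(1,7%)"-style percentages
# without capturing them; the second branch captures runs of letter-only words.
rx_ext = r"\((?:E\d+|\d+(?:[.,]\d+)?%)\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "salt (1,7%), spices (white pepper, nutmeg), stabilizer (E450), glucose"

# re.findall returns group 1 for every match; it is an empty string whenever
# the skip branch matched, so the empty entries are filtered out.
res = [x for x in re.findall(rx_ext, s) if x]
print(res)
# ['salt', 'spices', 'white pepper', 'nutmeg', 'stabilizer', 'glucose']
```

With this particular input the percentage alternative does not change the result, because "(1,7%)" contains no letters for the second branch to capture. It mainly makes the intent explicit.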

Python demo:

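import re
# Branch 1 skips "(E450)"-like codes; branch 2 captures letter-only word runs.
rx = r"\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings"
# findall yields '' for matches of branch 1, so keep only non-empty captures.
res = [x for x in re.findall(rx, s) if x]
print(res)
# ['Pork and beef', 'water', 'salt', 'spices', 'white pepper', 'nutmeg', 'coriander', 'cardamom', 'stabilizer', 'glucose', 'antioxidant', 'a preservative', 'flavorings']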
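The demo keeps the leading article in 'a preservative', while the desired list in the question has just 'preservative'. One possible way to handle that, as a sketch only, is a small post-processing step. The article pattern below is an assumption and not part of the original answer:

```python
import re

rx = r"\(E\d+\)|([^\W\d]+(?:\s+[^\W\d]+)*)"
s = "Pork and beef, water, salt (1,7%), spices (white pepper, nutmeg, coriander, cardamom), stabilizer (E450), glucose, antioxidant (E316), a preservative (E250), flavorings"

# Hypothetical clean-up: drop a leading "a", "an" or "the" followed by whitespace.
# Words that merely start with those letters (e.g. "antioxidant") are untouched,
# because the article must be followed by whitespace.
article = re.compile(r"^(?:an?|the)\s+", flags=re.IGNORECASE)

res = [article.sub("", x) for x in re.findall(rx, s) if x]
print(res)
```

Note that 'water' is still matched. The desired list in the question omits it, so dropping it would need a separate filter.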