
I want to extract specific names from a string.

The string is entered by the user, for example:

"price_to_earning + current_price * 0.8"

It could even be

"price_to_earning*current_price+0.8"

or

"price_to_earning *current_price/0.8"

How can I extract just "price_to_earning" and "current_price" from the strings above?

Currently, I'm using

    words = re.findall(r"\b\S+", raw_query)

but it returns

['price_to_earning', 'current_price', '0.8'] 

What I want is

['price_to_earning', 'current_price']
  • Maybe try this? \b[^\r\n\t\f\v [0-9.]]+ Commented Jul 12, 2022 at 7:50
  • Using a regex to parse mathematical expressions is misguided. You will eventually need to cope with arbitrarily nested parentheses, so use a proper parser instead. This is a frequently asked question. Commented Jul 12, 2022 at 8:35
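Following the parser suggestion in the comment above, here is a minimal sketch that swaps the regex for Python's standard-library ast module. It assumes the user's expressions are valid Python syntax (the three examples are). The parser copes with nested parentheses and simply ignores numeric literals, because every variable becomes an ast.Name node. Function names in a call such as abs(x) are also ast.Name nodes, so they would be returned as well. extract_names is a hypothetical helper name. ast.parse only builds a syntax tree; it does not evaluate the string.

```python
import ast


def extract_names(expr: str) -> list:
    """Hypothetical helper: variable names in expr, deduplicated, in textual order."""
    # mode="eval" accepts a single expression; malformed input raises SyntaxError.
    # strip() removes surrounding whitespace the user may have typed.
    tree = ast.parse(expr.strip(), mode="eval")
    # Every identifier (a variable, or a function name in a call) is an ast.Name node.
    name_nodes = [node for node in ast.walk(tree) if isinstance(node, ast.Name)]
    # ast.walk is breadth-first, so sort by source position to restore textual order.
    name_nodes.sort(key=lambda node: (node.lineno, node.col_offset))
    # dict.fromkeys drops repeated names while keeping first-seen order.
    return list(dict.fromkeys(node.id for node in name_nodes))


for query in [
    "price_to_earning + current_price * 0.8",
    "price_to_earning*current_price+0.8",
    "price_to_earning *current_price/0.8",
]:
    print(extract_names(query))
```

Sorting by column offset matters for inputs like (a * b) + c, where a breadth-first walk would otherwise visit c before a and b.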

3 Answers


Why not use a regex that matches only words without digits? For example, [^\d\W]+ matches runs of word characters (\w) that are not digits (\d).

Have a look at the demo here: https://regex101.com/r/EbNQvm/1
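A runnable sketch of that pattern in Python, using re.findall on the three inputs from the question. In Python 3, \w and \d are Unicode-aware for str patterns, so [^\d\W] also matches non-ASCII letters. One caveat: a name that contains a digit is cut at the first digit, so a hypothetical variable eps_2 would come back as eps_.

```python
import re

# [^\d\W] means "not a digit and not a non-word character", i.e. word characters
# minus digits: letters and the underscore. Numeric literals such as 0.8 never match.
PATTERN = re.compile(r"[^\d\W]+")

queries = [
    "price_to_earning + current_price * 0.8",
    "price_to_earning*current_price+0.8",
    "price_to_earning *current_price/0.8",
]
for query in queries:
    print(PATTERN.findall(query))  # ['price_to_earning', 'current_price'] each time
```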


You can specify the characters you want to keep, replace everything else with a space, and then split the result on whitespace. For example:

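    import re

    s1 = "price_to_earning + current_price * 0.8"
    s2 = "price_to_earning*current_price+0.8"
    s3 = "price_to_earning *current_price/0.8"
    for s in [s1, s2, s3]:
        # Replace every character that is not a letter or underscore with a space,
        # then split on whitespace to keep only the names.
        print(re.sub(r'[^a-zA-Z_]', ' ', s).split())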

Output

    ['price_to_earning', 'current_price']
    ['price_to_earning', 'current_price']
    ['price_to_earning', 'current_price']


You can try matching only letters and underscores:

    import re
    words = re.findall(r"[a-zA-Z_]+", raw_query)
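The pattern [a-zA-Z_]+ stops at the first digit, so a name such as eps_2 (a hypothetical variable) would come back as eps_. Below is a sketch of an identifier-style variant, assuming variable names follow Python's ASCII identifier rules: a letter or underscore, then letters, digits or underscores. extract_identifiers is a hypothetical helper that also removes duplicate names while keeping the order in which they first appear.

```python
import re

# An ASCII identifier: a letter or underscore, then letters, digits or underscores.
# The leading \b stops a match from starting in the middle of a token, so the
# exponent in a literal such as 1e5 is not picked up as the name "e5".
IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*")


def extract_identifiers(raw_query):
    """Hypothetical helper: unique names in raw_query, in order of first appearance."""
    return list(dict.fromkeys(IDENTIFIER.findall(raw_query)))


print(extract_identifiers("price_to_earning + current_price * 0.8"))
# eps_2 is a hypothetical variable name that contains a digit
print(extract_identifiers("eps_2 * price_to_earning / eps_2"))
```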
