2

I have a pandas DataFrame in which a condition is stored as a string. I need to evaluate that condition for each row and store the result in a new column. The example below illustrates this:

Existing DataFrame:

ID   Val    Cond
1     5      >10
1     15     >10

Expected DataFrame:

ID   Val    Cond    Result
1     5      >10     False
1     15     >10     True

As you can see, I need to concatenate Val and Cond and evaluate the result at row level.

2 Answers

1

If your conditions are formed from the basic comparison operators (<, <=, ==, !=, >, >=), this can be done more efficiently with getattr. We use .str.extract to split each condition into the comparison operator and the value. A dictionary maps each operator to the name of the corresponding Series method (gt, le, and so on), which we then call once per unique operator in a simple groupby.
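
To see the getattr lookup in isolation, here is a minimal sketch on a made-up Series (the values and the variable names are only for illustration). Looking up the method by name and calling it should give the same result as writing the operator directly:

```python
import pandas as pd

s = pd.Series([5, 15, 20])

# Series.gt is the method form of the > operator, so it can be
# looked up from a string such as 'gt' and then called.
by_method = getattr(s, "gt")(10)
by_operator = s > 10

print(by_method.tolist())             # [False, True, True]
print(by_method.equals(by_operator))  # True
```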

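import pandas as pd

# Sample data, matching the output shown further down
df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1],
                   'Val': [5, 15, 20, 25, 26, 10],
                   'Cond': ['>10', '>10', '==20', '<=25', '<=25', '!=10']})
print(df)
#   ID  Val  Cond
#0   1    5   >10
#1   1   15   >10
#2   1   20  ==20
#3   1   25  <=25
#4   1   26  <=25
#5   1   10  !=10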

# All operations we might have. 
d = {'>': 'gt', '<': 'lt', '>=': 'ge', '<=': 'le', '==': 'eq', '!=': 'ne'}

# Create a DataFrame with the LHS value, comparator, RHS value
tmp = pd.concat([df['Val'], 
                 df['Cond'].str.extract(r'(.*?)(\d+)').rename(columns={0: 'cond', 1: 'comp'})], 
                axis=1)
tmp[['Val', 'comp']] = tmp[['Val', 'comp']].apply(pd.to_numeric)
#   Val cond  comp
#0    5    >    10
#1   15    >    10
#2   20   ==    20
#3   25   <=    25
#4   26   <=    25
#5   10   !=    10

# Aligns on row Index
df['Result'] = pd.concat([getattr(gp['Val'], d[idx])(gp['comp']) 
                          for idx, gp in tmp.groupby('cond')])
#   ID  Val  Cond  Result
#0   1    5   >10   False
#1   1   15   >10    True
#2   1   20  ==20    True
#3   1   25  <=25    True
#4   1   26  <=25   False
#5   1   10  !=10   False

A simpler, but inefficient and dangerous, alternative is to build the expression string for each row and pass it to eval. eval is dangerous because it can execute arbitrary code, so only use it if you fully trust and know the data.

df['Result'] = df.apply(lambda x: eval(str(x.Val) + x.Cond), axis=1)
#    ID  Val  Cond  Result
#0   1    5   >10   False
#1   1   15   >10    True
#2   1   20  ==20    True
#3   1   25  <=25    True
#4   1   26  <=25   False
#5   1   10  !=10   False
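
If you want the row-by-row simplicity without eval, one option is to validate each condition with a regular expression and dispatch to the standard operator module. This is only a sketch under the assumption that every condition is a single comparison against a number; the names check, OPS and COND_RE are made up for this example and are not part of pandas:

```python
import operator
import re

# Map each supported comparison symbol to its function in the operator module.
OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

# One comparison symbol followed by a number (optionally signed or decimal).
COND_RE = re.compile(r"\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*")


def check(value, cond):
    """Evaluate 'value <cond>' without eval; reject anything unexpected."""
    match = COND_RE.fullmatch(cond)
    if match is None:
        raise ValueError(f"unsupported condition: {cond!r}")
    symbol, rhs = match.groups()
    return OPS[symbol](value, float(rhs))


vals = [5, 15, 20, 25, 26, 10]
conds = [">10", ">10", "==20", "<=25", "<=25", "!=10"]
results = [check(v, c) for v, c in zip(vals, conds)]
print(results)  # [False, True, True, True, False, False]
```

On a DataFrame the same function could be applied with df['Result'] = [check(v, c) for v, c in zip(df['Val'], df['Cond'])]. It is still a Python-level loop, so it will not be faster than the groupby approach, but a malformed or malicious condition raises ValueError instead of being executed.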

5 Comments

It is a bit risky though. Picture df['Cond'] with value rm -fr /
@PraysonW.Daniel well yes, the general rules for eval still apply. Only use it if you are 100% sure you trust the incoming data. Give me a second to update. So long as the comparisons are the basic comparators there is a safer and much more efficient way to do this that avoids eval altogether.
Thanks @ALollz, both methods worked for me. I decided to go with the first one based on your suggestions.
Hey, just wanted to check whether this approach can handle a between condition, e.g. if Cond is something like '>=20and<=50'?
@BhaveshJain so that's going to be more difficult, in both the eval and getattr methods as you'll first need to parse that into a valid expression for eval or add a lot more logic to deal with the and/or logic. Might be worth asking another question in that case.
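
For the between case raised in the comments, a rough sketch of the extra parsing is shown below. It assumes the condition is one or more simple comparisons joined only by "and", as in '>=20and<=50'; "or" and parentheses are not handled. The names check_all, OPS, TERM_RE and FULL_RE are hypothetical:

```python
import operator
import re

OPS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
}

# A single comparison: symbol followed by a number.
TERM = r"(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)"
TERM_RE = re.compile(TERM)
# The whole string: one term, then any number of "and <term>".
FULL_RE = re.compile(rf"\s*{TERM}(?:\s*and\s*{TERM})*\s*")


def check_all(value, cond):
    """Return True only if value satisfies every comparison in cond."""
    if FULL_RE.fullmatch(cond) is None:
        raise ValueError(f"unsupported condition: {cond!r}")
    return all(OPS[sym](value, float(rhs))
               for sym, rhs in TERM_RE.findall(cond))


print([check_all(v, ">=20and<=50") for v in (10, 20, 35, 51)])
# [False, True, True, False]
```
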
0

You can also do something like this:

df["Result"] = [eval(x + y) for x, y in zip(df["Val"].astype(str), df["Cond"])]

Make the "Result" column by concatenating the strings df["Val"] and df["Cond"], then applying eval to that.
