I have the following dataframe:

import pandas as pd

dict1 = {'x_math_lp': {'John': '0',
                       'Lisa': 1,
                       'Karyn': '2'},
         'o_math_lp': {'John': 0.005,
                       'Lisa': 0.001,
                       'Karyn': 0.9}}
df = pd.DataFrame(dict1)

I would like to apply a condition such that if a value in the first column is less than 1 and the value in the second column is >= 0.05, then the value in the first column is replaced with 'NaN'.

The result should look like this:

       x_math_lp    o_math_lp
John    NaN          0.005
Lisa    1            0.001
Karyn   NaN          0.900

Note: I want to use a loop because my real dataframe has 30 columns and I want to do this for every column pair in the dataframe, essentially updating the entire dataframe.

1 Answer

You can use .loc to select the rows that meet your condition and assign to the desired column, as shown below. Because some values in x_math_lp are strings, convert the column with pd.to_numeric first.

Try this:

>>> import numpy as np
>>> import pandas as pd
>>> df.x_math_lp = pd.to_numeric(df.x_math_lp, errors='coerce')
>>> df.loc[((df['x_math_lp'] < 1) | (df['o_math_lp'] >= 0.005)), 'x_math_lp'] = np.nan
>>> df
       x_math_lp    o_math_lp
John    NaN         0.005
Lisa    1.0         0.001
Karyn   NaN         0.900
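
The question words the condition with "and", while the snippet above combines the two tests with | so that the result matches the expected output. As a minimal sketch of how the two readings differ on the sample data (the names low_x, high_o, masked_or and masked_and are only illustrative, and the 0.005 threshold is taken from the snippet above rather than from the question text):

```python
import pandas as pd

df = pd.DataFrame({
    "x_math_lp": {"John": "0", "Lisa": 1, "Karyn": "2"},
    "o_math_lp": {"John": 0.005, "Lisa": 0.001, "Karyn": 0.9},
})

# Convert the mixed str/int column to numbers; unparseable values become NaN.
x = pd.to_numeric(df["x_math_lp"], errors="coerce")

low_x = x < 1                       # first condition
high_o = df["o_math_lp"] >= 0.005   # second condition (threshold is an assumption)

# "or": blank the value when either condition holds.
masked_or = x.mask(low_x | high_o)

# "and": blank the value only when both conditions hold.
masked_and = x.mask(low_x & high_o)

print(pd.DataFrame({"or": masked_or, "and": masked_and}))
```

With | both John and Karyn are blanked; with & only John is, because Karyn's first-column value is not below 1.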

To apply the same rule to every column pair, loop over the columns two at a time:

>>> df= pd.DataFrame({'x_math_lp': {'John': 0,'Lisa': 1,'Karyn': 2},'o_math_lp': {'John': 0.005,'Lisa': 0.001,'Karyn':0.9},'y_math_lp': {'John': 0,'Lisa': 1,'Karyn': 2},'p_math_lp': {'John': 0.005,'Lisa': 0.001,'Karyn':0.9}})
>>> columns = df.columns
>>> for a, b in zip(columns[::2], columns[1::2]):
...     df.loc[((df[a] < 1) | (df[b] >= 0.005)), a] = np.nan
...
>>> df
       x_math_lp    o_math_lp   y_math_lp   p_math_lp
John     NaN         0.005            NaN   0.005
Lisa     1.0         0.001            1.0   0.001
Karyn    NaN         0.900            NaN   0.900
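
For a wider dataframe such as the 30-column one mentioned in the question, the loop can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name mask_column_pairs and its parameters x_max and o_min are hypothetical, the columns are assumed to alternate value column, threshold column in that order, and the conditions are combined with | as in the loop above (swap in & if both must hold).

```python
import pandas as pd

def mask_column_pairs(df, x_max=1, o_min=0.005):
    """Return a copy of df in which, for each adjacent column pair (a, b),
    a is set to NaN wherever a < x_max or b >= o_min."""
    out = df.copy()  # leave the caller's dataframe untouched
    cols = out.columns
    # zip stops at the shorter slice, so a trailing unpaired column is skipped.
    for a, b in zip(cols[::2], cols[1::2]):
        first = pd.to_numeric(out[a], errors="coerce")   # handles values stored as str
        second = pd.to_numeric(out[b], errors="coerce")
        out[a] = first.mask((first < x_max) | (second >= o_min))
    return out

df = pd.DataFrame({
    "x_math_lp": {"John": "0", "Lisa": 1, "Karyn": "2"},
    "o_math_lp": {"John": 0.005, "Lisa": 0.001, "Karyn": 0.9},
    "y_math_lp": {"John": 0, "Lisa": 1, "Karyn": 2},
    "p_math_lp": {"John": 0.005, "Lisa": 0.001, "Karyn": 0.9},
})

result = mask_column_pairs(df)
print(result)
```

Working on a copy and converting each column inside the loop means the helper can be called directly on the raw data, including columns that mix strings and numbers.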
3 Comments

Thanks for responding. The result is correct but it doesn't apply the condition if df['o_math_lp'] is less than zero. Both conditions need to be met and then we update the first column.
Is the first condition missing? df.loc[(df['x_math_lp'] < 1) | (df['o_math_lp'] >= 0.005), 'x_math_lp'] = np.nan. (The question describes the condition with 'and', but I used the or operator to match the expected output.)
Thank you guys very much, I'm going to try and use this to generate a loop to deal with my whole dataset.