1

My code:

xdf = pd.DataFrame(data={'A':[-10,np.nan,-2.2],'B':[np.nan,2,1.5],'C':[3,1,-0.3]},index=['2023-05-13 08:40:00','2023-05-13 08:41:00','2023-05-13 08:42:00'])
xdf = 
                          A      B      C
2023-05-13 08:40:00     -10.0   NaN     3.0
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

Consider only values below 4.0 and above -4.0 in each row of the dataframe

print(xdf[((xdf<4.0).all(axis=1))&((xdf>-4.0).all(axis=1))])

Present output:

                          A      B      C
2023-05-13 08:42:00     -2.2    1.5     -0.3

Expected output: My above code drops a row if there is a NaN in one column, despite other columns satisfying the filter condition. So, I want to omit NaN columns and consider non-NaN columns in <> operation.

                          A      B      C
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

Edit:

One working solution:

print(xdf[((xdf.fillna(True)<4.0).all(axis=1))&((xdf.fillna(True)>-4.0).all(axis=1))])

1 Answer 1

2

I suggest add new mask DataFrame.isna chained with | (bitwise OR) for test missing values:

print(xdf[((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1)])
                       A    B    C
2023-05-13 08:41:00  NaN  2.0  1.0
2023-05-13 08:42:00 -2.2  1.5 -0.3

How it working:

print ((xdf<4.0) & (xdf>-4.0))
                         A      B     C
2023-05-13 08:40:00  False  False  True
2023-05-13 08:41:00  False   True  True
2023-05-13 08:42:00   True   True  True

print (((xdf<4.0) & (xdf>-4.0) | xdf.isna()))
                         A     B     C
2023-05-13 08:40:00  False  True  True
2023-05-13 08:41:00   True  True  True
2023-05-13 08:42:00   True  True  True

print(((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1))
2023-05-13 08:40:00    False
2023-05-13 08:41:00     True
2023-05-13 08:42:00     True
dtype: bool

Another idea is use DataFrame.lt and DataFrame.gt :

print(xdf[(xdf.lt(4.0) & xdf.gt(-4.0) | xdf.isna()).all(axis=1)])
                       A    B    C
2023-05-13 08:41:00  NaN  2.0  1.0
2023-05-13 08:42:00 -2.2  1.5 -0.3
Sign up to request clarification or add additional context in comments.

3 Comments

Clean solution. Also, any comment on my edited solution print(xdf[((xdf.fillna(True)<4.0).all(axis=1))&((xdf.fillna(True)>-4.0).all(axis=1))]) I just tried this out of my intuition. Could this lead to any problems?
@Mainland - there is problem xdf.fillna(True)<4.0 working like xdf.fillna(1)<4.0, it match, because 1 is between -4 and 4. So if need between -0.4 and 0.4 it not working print(xdf[((xdf.fillna(True)<0.40).all(axis=1))&((xdf.fillna(True)>-0.40).all(axis=1))])
Really appreciate for showing how my solution fails and it is not a correct approach. Otherwise, I would have just used it thinking it is working fine for me. Legend.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.