Pandas DataFrame: avoid dropping rows that contain NaN when filtering with < and > comparisons

Question

My code:

import numpy as np
import pandas as pd

xdf = pd.DataFrame(data={'A':[-10,np.nan,-2.2],'B':[np.nan,2,1.5],'C':[3,1,-0.3]},index=['2023-05-13 08:40:00','2023-05-13 08:41:00','2023-05-13 08:42:00'])
print(xdf)
                          A      B      C
2023-05-13 08:40:00     -10.0   NaN     3.0
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

I want to keep only the rows in which every value is below 4.0 and above -4.0:

print(xdf[((xdf<4.0).all(axis=1))&((xdf>-4.0).all(axis=1))])

Present output:

                          A      B      C
2023-05-13 08:42:00     -2.2    1.5     -0.3

Expected output: The code above drops a row whenever one of its columns is NaN, even if all the other columns satisfy the filter condition. I want the NaN cells to be ignored so that only the non-NaN values in each row are tested by the < and > comparisons.

                          A      B      C
2023-05-13 08:41:00     NaN     2.0     1.0
2023-05-13 08:42:00     -2.2    1.5     -0.3

Edit:

One working solution:

print(xdf[((xdf.fillna(True)<4.0).all(axis=1))&((xdf.fillna(True)>-4.0).all(axis=1))])

jezrael · Accepted Answer · 2024-05-21 07:31:45Z

I suggest adding a mask from DataFrame.isna, chained with | (bitwise OR), so that missing values also pass the test:

print(xdf[((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1)])
                       A    B    C
2023-05-13 08:41:00  NaN  2.0  1.0
2023-05-13 08:42:00 -2.2  1.5 -0.3

How it works:

print ((xdf<4.0) & (xdf>-4.0))
                         A      B     C
2023-05-13 08:40:00  False  False  True
2023-05-13 08:41:00  False   True  True
2023-05-13 08:42:00   True   True  True

print (((xdf<4.0) & (xdf>-4.0) | xdf.isna()))
                         A     B     C
2023-05-13 08:40:00  False  True  True
2023-05-13 08:41:00   True  True  True
2023-05-13 08:42:00   True  True  True

print(((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1))
2023-05-13 08:40:00    False
2023-05-13 08:41:00     True
2023-05-13 08:42:00     True
dtype: bool
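
The steps above can be wrapped into a small reusable function. This is only a sketch: the name keep_rows_within and its low/high parameters are hypothetical, not part of pandas, and the bounds are treated as strict, as in the question.

```python
import numpy as np
import pandas as pd


def keep_rows_within(df, low, high):
    """Return the rows whose non-NaN values all lie strictly between low and high.

    Hypothetical helper for illustration; it is not a pandas API.
    """
    in_range = (df > low) & (df < high)   # NaN cells compare as False here
    passes = in_range | df.isna()         # so accept NaN cells explicitly
    return df[passes.all(axis=1)]         # keep rows where every cell passes


xdf = pd.DataFrame(
    data={'A': [-10, np.nan, -2.2], 'B': [np.nan, 2, 1.5], 'C': [3, 1, -0.3]},
    index=['2023-05-13 08:40:00', '2023-05-13 08:41:00', '2023-05-13 08:42:00'],
)

print(keep_rows_within(xdf, -4.0, 4.0))
```

Note that a row consisting only of NaN also passes this mask. If that is not wanted, chaining .dropna(how='all') onto the result should remove such rows.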

Another idea is to use DataFrame.lt and DataFrame.gt:

print(xdf[(xdf.lt(4.0) & xdf.gt(-4.0) | xdf.isna()).all(axis=1)])
                       A    B    C
2023-05-13 08:41:00  NaN  2.0  1.0
2023-05-13 08:42:00 -2.2  1.5 -0.3
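
A further variant, not from the answer itself, is to invert the test: flag the cells that are out of range and keep the rows in which nothing is flagged. The sketch below relies on the fact that comparisons against NaN evaluate to False, so no separate isna mask should be needed. The variable name out_of_range is only illustrative.

```python
import numpy as np
import pandas as pd

xdf = pd.DataFrame(
    data={'A': [-10, np.nan, -2.2], 'B': [np.nan, 2, 1.5], 'C': [3, 1, -0.3]},
    index=['2023-05-13 08:40:00', '2023-05-13 08:41:00', '2023-05-13 08:42:00'],
)

# A NaN cell is neither >= 4.0 nor <= -4.0, so it is never flagged.
out_of_range = (xdf >= 4.0) | (xdf <= -4.0)

# Keep the rows in which no cell is flagged as out of range.
result = xdf[~out_of_range.any(axis=1)]
print(result)
```

For this data it selects the same two rows as the isna version. Which form reads better is a matter of taste.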

edited May 21, 2024 at 7:31

answered May 21, 2024 at 7:26

3 Comments

Mainland Over a year ago

Clean solution. Also, do you have any comment on my edited solution, print(xdf[((xdf.fillna(True)<4.0).all(axis=1))&((xdf.fillna(True)>-4.0).all(axis=1))])? I just tried it on intuition. Could it lead to any problems?

jezrael Over a year ago

@Mainland - there is a problem: xdf.fillna(True)<4.0 works like xdf.fillna(1)<4.0. It matches here only because 1 lies between -4 and 4. If you need values between -0.4 and 0.4, it no longer works: print(xdf[((xdf.fillna(True)<0.40).all(axis=1))&((xdf.fillna(True)>-0.40).all(axis=1))])

Mainland Over a year ago

I really appreciate you showing how my solution fails and why it is not a correct approach. Otherwise, I would have kept using it, thinking it worked fine for me. Legend.
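
The fillna(True) pitfall discussed in these comments can be reproduced with a small frame. The data below is hypothetical, chosen so that one row has a NaN next to a value inside (-0.4, 0.4). The original xdf has no such row, so it would not show the difference at these bounds.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has a NaN next to a value inside (-0.4, 0.4).
df = pd.DataFrame({'A': [np.nan, 0.1], 'B': [0.2, 0.3]})

# fillna(True): the filled cell compares like 1, and 1 < 0.4 is False,
# so row 0 is dropped even though its only real value is in range.
filled = df.fillna(True)
via_fillna = df[(filled < 0.4).all(axis=1) & (filled > -0.4).all(axis=1)]

# isna mask: NaN cells are accepted explicitly, whatever the bounds are.
via_isna = df[((df < 0.4) & (df > -0.4) | df.isna()).all(axis=1)]

print(via_fillna.index.tolist())
print(via_isna.index.tolist())
```

With this data the fillna(True) variant keeps only the second row, while the isna mask keeps both rows.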
