1

I have the following Pandas DataFrame in Python:

import numpy as np
import pandas as pd
df  = pd.DataFrame(np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66], 
                             [111, 222, 0, 0, 0, 0], [1111, 0, 0, 0, 0, 0]]),
                   columns=['a', 'b', 'c', 'd', 'e', 'f'])

The DataFrame looks like this as a table:

    a      b     c    d     e    f
0   1      2     3    4     5    6
1   11     22    33   44    55   66
2   111    222   0    0     0    0
3   1111   0     0    0     0    0

The original DataFrame is much bigger than this. As seen, some rows have zero values in some columns (c, d, e, f).

I need to remove such rows from the DataFrame so that the new DataFrame looks like this (after removing the rows where the given columns contain only zeros):

    a      b     c    d     e    f
0   1      2     3    4     5    6
1   11     22    33   44    55   66

I only need to remove the rows where all of these columns (c, d, e, and f) are zero. If, for example, only 2 of them are 0, the row should stay.

Is there a good way of doing this operation without looping through the DataFrame?

4 Answers

3

Filter rows on the selected columns: build a boolean mask of nonzero values and keep a row if any of c, d, e, f is nonzero, using any(axis=1):

import numpy as np
import pandas as pd

df  = pd.DataFrame(np.array([[1, 2, 3, 4, 5, 6], [11, 22, 33, 44, 55, 66],
                             [111, 222, 0, 0, 0, 0], [1111, 0, 0, 0, 0, 0]]),
                   columns=['a', 'b', 'c', 'd', 'e', 'f'])

df = df[(df[['c', 'd', 'e', 'f']] != 0).any(axis=1)]

print(df)

Output:

    a   b   c   d   e   f
0   1   2   3   4   5   6
1  11  22  33  44  55  66
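
To confirm that a row with only some zeros survives this filter, here is a minimal sketch. It assumes the third row is changed to [111, 222, 333, 0, 0, 0]; the names `cols` and `kept` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [11, 22, 33, 44, 55, 66],
     [111, 222, 333, 0, 0, 0],   # only d, e, f are zero: should be kept
     [1111, 0, 0, 0, 0, 0]],     # c, d, e, f are all zero: should be dropped
    columns=list('abcdef'))

cols = ['c', 'd', 'e', 'f']

# True for every row that has at least one nonzero value in cols.
kept = df[(df[cols] != 0).any(axis=1)]

print(kept.index.tolist())  # [0, 1, 2]
```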

2 Comments

Thank you for your answer. The solution should only remove rows where all of the given columns are zero. If, for example, 3 of them are zero, the code should leave the row alone. I tried your solution after changing the 3rd row to [111, 222, 333, 0, 0, 0], and it removed that row as well, although it should have stayed in the table.
@edn Fixed just now.
2

With boolean operators, negating the condition "c, d, e and f are all zero":

df.loc[~((((df['c'] == 0) & (df['d'] == 0)) & (df['e'] == 0)) & (df['f'] == 0))]
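
The same mask can also be built from a list of column names instead of chaining each comparison by hand. This is a minimal sketch on the question's sample data; `cols` and `all_zero` are names chosen here for illustration, and `functools.reduce` is a swapped-in helper rather than part of the expression above.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [11, 22, 33, 44, 55, 66],
     [111, 222, 0, 0, 0, 0],
     [1111, 0, 0, 0, 0, 0]],
    columns=list('abcdef'))

cols = ['c', 'd', 'e', 'f']

# AND together one boolean Series per column:
# True only where every listed column is 0.
all_zero = reduce(operator.and_, (df[c] == 0 for c in cols))

# Keep the rows that are NOT all-zero in those columns.
result = df.loc[~all_zero]
print(result)
```

This may be easier to maintain when the list of columns is long or computed at run time.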

1

Try this:

df[~df[list('cdef')].eq(0).all(axis = 1)]

    a   b   c   d   e   f
0   1   2   3   4   5   6
1  11  22  33  44  55  66
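
The mask used here is the complement of "all are zero", which by De Morgan's law is the same as "at least one is nonzero". A small sketch that checks the equivalence on the sample data (the names `cols`, `not_all_zero` and `any_nonzero` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [11, 22, 33, 44, 55, 66],
     [111, 222, 0, 0, 0, 0],
     [1111, 0, 0, 0, 0, 0]],
    columns=list('abcdef'))

cols = list('cdef')

# "not (every value equals 0)" per row
not_all_zero = ~df[cols].eq(0).all(axis=1)

# "some value differs from 0" per row
any_nonzero = df[cols].ne(0).any(axis=1)

# Both masks select the same rows.
print(not_all_zero.equals(any_nonzero))
```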

3 Comments

Thank you for your answer. This code removes any row where any column is zero. It is fine if one or two columns are zero, but in my case all of the c, d, e, f columns need to be zero for a row to be removed. I have updated the question.
@edn, boolean mask on c,d,e,f followed by all with axis=1 to ensure all values are true.
Thank you @Sushanth for your answer.
1

Here is one more option: use df.query() with a self-defined query string.

my_query = '~(' + ' and '.join([f'{name}==0' for name in 'cdef']) + ')'
df.query(my_query)

If you print my_query, it is easy to read: ~(c==0 and d==0 and e==0 and f==0), where ~ means 'not'.
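
For completeness, a self-contained sketch of the query approach on the question's sample data. The names `cols` and `result` are illustrative. Note the space before `and` in the join string, and that this assumes every column name is a valid Python identifier; other names would need backtick quoting inside the query string.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6],
     [11, 22, 33, 44, 55, 66],
     [111, 222, 0, 0, 0, 0],
     [1111, 0, 0, 0, 0, 0]],
    columns=list('abcdef'))

cols = ['c', 'd', 'e', 'f']

# Build "~(c == 0 and d == 0 and e == 0 and f == 0)" from the column list.
my_query = '~(' + ' and '.join(f'{name} == 0' for name in cols) + ')'
print(my_query)

# Keep only the rows for which the query evaluates to True.
result = df.query(my_query)
print(result)
```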
