Dropping rows that have only one non-zero value from a pandas DataFrame in Python

Question

I have a pandas dataframe as shown below:

I want to drop the rows that have only one non-zero value. What's the most efficient way to do this?

It_is_Chris · Accepted Answer · 2022-04-14 15:28:21Z


Try boolean indexing

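import numpy as np
import pandas as pd

# sample data: a 10x10 frame of zeros
df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1   # rows 2-4 get a 1 in column 'd'
df.iloc[4:5, 4] = 1   # row 4 also gets a 1 in column 'e'

# boolean indexing: keep rows whose count of non-zero values is not exactly 1
df[df.ne(0).sum(axis=1).ne(1)]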

Only rows 2 and 3 are removed: each has exactly one non-zero value, while row 4 has two and every other row has none. The per-row counts of non-zero values are:

df.ne(0).sum(axis=1)

0    0
1    0
2    1
3    1
4    2
5    0
6    0
7    0
8    0
9    0
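The same filter can be wrapped in a small reusable helper. In this sketch, drop_single_nonzero_rows is a hypothetical name, and np.count_nonzero on the underlying NumPy array is swapped in for df.ne(0).sum(axis=1); both count the non-zero entries in each row, assuming every column is numeric.

```python
import numpy as np
import pandas as pd


def drop_single_nonzero_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows that contain exactly one non-zero value.

    Assumes every column is numeric so the frame converts cleanly to an array.
    """
    # Count non-zero entries in each row of the underlying NumPy array
    counts = np.count_nonzero(df.to_numpy(), axis=1)
    # Keep every row whose count is not exactly 1 (0, 2, 3, ... all survive)
    return df[counts != 1]


# Rebuild the sample data from the answer
df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

result = drop_single_nonzero_rows(df)
print(result.index.tolist())  # [0, 1, 4, 5, 6, 7, 8, 9]
```

Working on the raw array skips building an intermediate boolean DataFrame, which may help on large frames; the result is the same rows the df.ne(0) version keeps.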

edited Apr 14, 2022 at 15:28

answered Apr 14, 2022 at 15:21

It_is_Chris


rcshon · Accepted Answer · 2022-04-14 15:23:14Z


Not sure if this is the most efficient, but I'll try:

df[[col for col in df.columns if (df[col] != 0).sum() == 1]]

This makes two passes per column: one to check != 0 and another to sum the resulting booleans (it could stop early once a second non-zero value is found).

Alternatively, you can define a custom function that checks each column in a single pass:

def check(column):
    # True if the column has exactly one non-zero value; stops at the second one
    already_has_one = False
    for value in column:
        if value != 0:
            if already_has_one:
                return False
            already_has_one = True
    return already_has_one

Then:

df[[col for col in df.columns if check(df[col])]]

This is much faster than the first version, since it can stop as soon as it finds a second non-zero value.
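The snippets above select columns and keep the ones with exactly one non-zero value. For the question's row-wise task (drop rows with exactly one non-zero value), the same early-exit check could be applied per row, for example with DataFrame.apply(..., axis=1). This is an adaptation, not the answerer's code; has_exactly_one_nonzero is a hypothetical name, and because apply calls a Python function once per row it is typically slower than the vectorized df.ne(0) approach on large frames.

```python
import numpy as np
import pandas as pd


def has_exactly_one_nonzero(values) -> bool:
    """Hypothetical helper: True if exactly one element of `values` is non-zero.

    Stops scanning as soon as a second non-zero value is found.
    """
    seen_one = False
    for value in values:
        if value != 0:
            if seen_one:
                return False  # second non-zero value: no need to look further
            seen_one = True
    return seen_one


df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

# Evaluate the check once per row (axis=1), then drop the rows where it is True
single = df.apply(has_exactly_one_nonzero, axis=1)
result = df[~single]
print(result.index.tolist())  # [0, 1, 4, 5, 6, 7, 8, 9]
```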

answered Apr 14, 2022 at 15:23

rcshon


rcshon Over a year ago

See the answer by @It_is_Chris, which runs faster by using the vectorized .ne() despite making two full passes per column.

lmielke · Accepted Answer · 2022-04-14 15:31:50Z


Or like this:

df[(df.applymap(lambda x: bool(x)).sum(1) > 1).values]
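With > 1, rows that contain no non-zero values are dropped as well. A sketch that drops only the rows with exactly one non-zero value swaps the comparison for != 1 and uses astype(bool) in place of applymap(lambda x: bool(x)). Both map zero to False and any other number to True, but astype avoids calling a Python function per cell (and applymap is deprecated in recent pandas in favor of DataFrame.map). The sample data is borrowed from the first answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

# True where a cell is non-zero; summing across each row counts the non-zero cells
nonzero_per_row = df.astype(bool).sum(axis=1)

original = df[(nonzero_per_row > 1).values]   # keeps only rows with 2+ non-zeros
fixed = df[(nonzero_per_row != 1).values]     # drops only rows with exactly one

print(original.index.tolist())  # [4]
print(fixed.index.tolist())     # [0, 1, 4, 5, 6, 7, 8, 9]
```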

answered Apr 14, 2022 at 15:31

lmielke


rcshon Over a year ago

I think you meant sum() without the axis, since we are filtering out the columns here. Also, it should be != 1 instead of > 1, because > 1 also drops the zero counts.
