I have a pandas dataframe as shown below:

[Image: Pandas Dataframe]

I want to drop the rows that have only one non-zero value. What's the most efficient way to do this?

3 Answers

Try boolean indexing:

import numpy as np
import pandas as pd

# sample data: rows 2 and 3 get one non-zero value, row 4 gets two
df = pd.DataFrame(np.zeros((10, 10)), columns=list('abcdefghij'))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

# boolean indexing based on condition
df[df.ne(0).sum(axis=1).ne(1)]

Only rows 2 and 3 are dropped: each has exactly one non-zero value, whereas row 4 has two and every other row has none. The per-row counts confirm this:

df.ne(0).sum(axis=1)

0    0
1    0
2    1
3    1
4    2
5    0
6    0
7    0
8    0
9    0
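
As a possible variation on the same idea, the per-row count can also be taken on the underlying NumPy array with np.count_nonzero. This is only a sketch under the assumption that the frame is entirely numeric; it has not been benchmarked here, so whether it is actually faster on your data is something to measure.

```python
import numpy as np
import pandas as pd

# Same sample data as above: rows 2 and 3 have one non-zero value, row 4 has two.
df = pd.DataFrame(np.zeros((10, 10)), columns=list("abcdefghij"))
df.iloc[2:5, 3] = 1
df.iloc[4:5, 4] = 1

# Count the non-zero entries in each row of the underlying array.
nonzero_per_row = np.count_nonzero(df.to_numpy(), axis=1)

# Keep every row whose count is not exactly 1.
result = df[nonzero_per_row != 1]
print(result.index.tolist())  # [0, 1, 4, 5, 6, 7, 8, 9]
```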

Not sure if this is the most efficient but I'll try:

df[[col for col in df.columns if (df[col] != 0).sum() == 1]]

This makes two passes per column: one to evaluate != 0 and one to sum the resulting booleans (the second could stop early once a second non-zero value is found).

Otherwise, you can define a custom function to check without looping twice per column:

def check(column):
    already_has_one = False
    for value in column:
        if value != 0:
            if already_has_one:
                return False
            already_has_one = True
    return already_has_one

then:

df[[col for col in df.columns if check(df[col])]]

This is much faster than the first version.
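
Note that both snippets above select columns with exactly one non-zero value, whereas the question asks about dropping rows. A minimal sketch of the same early-exit check applied per row follows. The function name has_exactly_one_nonzero and the small sample frame are made up for illustration, and no timing claim is intended.

```python
import numpy as np
import pandas as pd


def has_exactly_one_nonzero(values):
    """Return True if `values` contains exactly one non-zero entry."""
    seen_one = False
    for value in values:
        if value != 0:
            if seen_one:
                return False  # second non-zero value found, stop early
            seen_one = True
    return seen_one


# Hypothetical sample data for illustration.
df = pd.DataFrame(np.zeros((6, 4)), columns=list("abcd"))
df.iloc[1, 0] = 5  # row 1: one non-zero value
df.iloc[3, 1] = 2  # row 3: two non-zero values
df.iloc[3, 2] = 7

# axis=1 passes each row to the function; ~ inverts the mask to keep the rest.
drop_mask = df.apply(has_exactly_one_nonzero, axis=1)
result = df[~drop_mask]
print(result.index.tolist())  # [0, 2, 3, 4, 5]
```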

1 Comment

See the answer by @It_is_Chris, which runs faster by utilizing .ne() despite having 2 full loops per column.

Or like this:

df[(df.applymap(lambda x: bool(x)).sum(1) > 1).values]

1 Comment

I think you meant sum() without the axis argument, since we are filtering columns here. It should also be != 1 instead of > 1, since > 1 also drops the entries whose count is 0.
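
To illustrate the difference between > 1 and != 1 raised in the comment, here is a small sketch. It assumes the goal is to drop rows, as in the question, and it uses astype(bool) in place of applymap, which is a substitution rather than the answer's own code. The sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame(np.zeros((5, 3)), columns=list("abc"))
df.iloc[1, 0] = 4        # row 1: one non-zero value
df.iloc[2, [0, 1]] = 3   # row 2: two non-zero values

# astype(bool) maps 0 to False and any other number to True.
nonzero_counts = df.astype(bool).sum(axis=1)

kept_gt = df[nonzero_counts > 1]   # also drops the all-zero rows
kept_ne = df[nonzero_counts != 1]  # drops only rows with a single non-zero value

print(kept_gt.index.tolist())  # [2]
print(kept_ne.index.tolist())  # [0, 2, 3, 4]
```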
