0

In Pandas, I have a dataframe with ZipCode, Age, and a bunch of columns that should all have values 1 or 0, ie:

ZipCode Age A B C D

12345   21  0 1 1 1
12345   22  1 0 1 4
23456   45  1 0 1 1
23456   21  3 1 0 0

I want to delete all rows in which 0 or 1 doesn't appear in columns A,B,C, or D as a way to clean up the data. In this case, I would remove the 2nd and 4th row because 4 appears in column D in row 2 and 3 appears in column A in row 4. I want to do this even if I have 100 columns to check such that I don't have to look up every column one by one in my conditional statement. How would I do this?

3 Answers 3

2

Use isin to test for membership and all to test if all row values are True and use this boolean mask to filter the df:

In [12]:
df[df.ix[:,'A':].isin([0,1]).all(axis=1)]

Out[12]:
   ZipCode  Age  A  B  C  D
0    12345   21  0  1  1  1
2    23456   45  1  0  1  1
Sign up to request clarification or add additional context in comments.

Comments

2

You can opt for a vectorized solution:

In [64]: df[df[['A','B','C','D']].isin([0,1]).sum(axis=1)==4]

Out[64]:
   ZipCode  Age  A  B  C  D
0    12345   21  0  1  1  1
2    23456   45  1  0  1  1

Comments

0

Other two solutions works well but if you interested in speed you should look at numpy in1d function:

data=df.loc[:, 'A':]

In [72]:  df[np.in1d(data.values,[0,1]).reshape(data.shape).all(axis=1)]
Out[72]:
   ZipCode  Age  A  B  C  D
0    12345   21  0  1  1  1
2    23456   45  1  0  1  1

Timing:

In [73]: %timeit data=df.loc[:, 'A':]; df[np.in1d(data.values,[0,1]).reshape(data.shape).all(axis=1)]
1000 loops, best of 3: 558 us per loop

In [74]: %timeit df[df.ix[:,'A':].isin([0,1]).all(axis=1)]
1000 loops, best of 3: 843 us per loop

In [75]: %timeit df[df[['A','B','C','D']].isin([0,1]).sum(axis=1)==4]
1000 loops, best of 3: 1.44 ms per loop

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.