38
hsp.loc[hsp['Len_old'] == hsp['Len_new']]

I try this code, it's working.

But I tried these three

hsp.loc[hsp['Type_old'] == hsp['Type_new']] 
hsp.loc[hsp['Type_old'] != hsp['Type_new']] 
hsp.loc[hsp['Len_old'] != hsp['Len_new']] 

They are not working.

My data table hsp is like

id  Type_old  Type_new  Len_old  Len_new
1    Num       Num       15       15
2    Num       Char      12       12
3    Char      Num       10       8
4    Num       Num       4        5
5    Char      Char      9        10

Is there a better approach to select rows where two columns are not queal.

3
  • 2
    What do you mean by "not working"? Whats' the expected output? What's the actual output? I ran the commands you mentioned and they are working I expect them to. Commented Jul 11, 2017 at 17:02
  • it is working on my side Commented Jul 11, 2017 at 17:09
  • Feel free to up-vote my answer as well if you found it useful. Commented Jul 11, 2017 at 18:42

4 Answers 4

43

Use the complement operator ~

hsp.loc[~(hsp['Type_old'] == hsp['Type_new'])]

which gives:

   id Type_old Type_new  Len_old  Len_new
1   2      Num     Char       12       12
2   3     Char      Num       10        8

When dealing with Boolean operations, the complement operator is a handy way to invert True with False

Sign up to request clarification or add additional context in comments.

Comments

19

Ways to be confused by == versus != when comparing pd.Series

As expected

df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)

   Len_old  Len_new     NE
0       15       15  False
1       12       12  False
2       10        8   True
3        4        5   True
4        9       10   True

But if one of the column's values were strings!

df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)

   Len_old  Len_new    NE
0       15       15  True
1       12       12  True
2       10        8  True
3        4        5  True
4        9       10  True

Make sure both are the same types.

Comments

4

Your code, as piRSquared said, had an issue with types.

Besides that, you could use comparing methods, in this case pd.Series.ne

Using your data:

hsp.loc[hsp['Type_old'].ne(hsp['Type_new'])]

But again, as piRSquared mentioned, because of dtypes it didn't work. Just in case, you have to take care about NaN/None values at your data... such:

hsp.loc[ ( hsp['Type_old'].ne(hsp['Type_new']) ) && (hsp['Type_old'].notna())]

In this case, .ne has another argument, fill_value, which fill missing data.


In addition, you could use "compare" method to show difference between two series (or DataFrames)

hsp.Len_old.compare(hsp.Len_new)

And it might return (if columns were of the same dtype):

   self  other
2  10.0    8.0
3   4.0    5.0
4   9.0   10.0

But just force to have another dtype:

hsp.Len_old.compare(hsp.Len_new.astype('str')) # string type new column

It will return all rows:

   self other
0   15  15
1   12  12
2   10  8
3   4   5
4   9   10

1 Comment

Pandas Version: 1.2.2
2

You can also use query here:

In [7]: hsp.query('Type_old != Type_new')
Out[7]: 
   id Type_old Type_new  Len_old  Len_new
1   2      Num     Char       12       12
2   3     Char      Num       10        8

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.