
I want to perform a row-by-row comparison over multiple columns. I want a single Series indicating whether all entries in a row (across several columns) are the same as in the previous row.

Let's say I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({'A' : [1, 1, 1, 2, 2], 
                   'B' : [2, 2, 3, 3, 3], 
                   'C' : [1, 1, 1, 2, 2]})

I can compare each row with the previous one, column by column:

>>> df.diff().eq(0)
       A      B      C
0  False  False  False
1   True   True   True
2   True  False   True
3  False   True  False
4   True   True   True

This gives a DataFrame comparing each column individually. What I want is the comparison across all columns combined into one Series.
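A side note on row 0, which is False in every column: diff() subtracts the previous row, and the first row has no predecessor, so its differences are NaN, and NaN never compares equal to 0. A small check on the same df (the name diffs is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': [2, 2, 3, 3, 3],
                   'C': [1, 1, 1, 2, 2]})

# Row i minus row i-1; row 0 has nothing to subtract from.
diffs = df.diff()

print(diffs.iloc[0].isna().all())   # True: the first row is all NaN
print(diffs.eq(0).iloc[0].any())    # False: NaN is never equal to 0
```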

I can achieve this with a loop:

compare_all = df.diff().eq(0)
compare_tot = compare_all[compare_all.columns[0]]
for c in compare_all.columns[1:]:
    compare_tot = compare_tot & compare_all[c]

This gives

>>> compare_tot
0    False
1     True
2    False
3    False
4     True
dtype: bool

as expected.

Is it possible to achieve this with a one-liner, that is, without the loop?


2 Answers

>>> (df == df.shift()).all(axis=1)
0    False
1     True
2    False
3    False
4     True
dtype: bool
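A minimal sketch expanding this answer on the question's df. shift() moves every row down by one, so df == df.shift() lines each row up against its predecessor; row 0 is compared against NaN and therefore comes out False. The second part restricts the check to a subset of columns; the list ['A', 'C'] is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': [2, 2, 3, 3, 3],
                   'C': [1, 1, 1, 2, 2]})

# Compare each row with the row directly above it, then require every column to match.
same_as_prev = (df == df.shift()).all(axis=1)
print(same_as_prev.tolist())     # [False, True, False, False, True]

# Same idea, but only over a chosen subset of columns (illustrative choice).
cols = ['A', 'C']
same_as_prev_ac = (df[cols] == df[cols].shift()).all(axis=1)
print(same_as_prev_ac.tolist())  # [False, True, True, False, True]
```

Note that shift() turns the integer columns into floats (to hold the NaN), but 1 == 1.0 is still True, so the comparison is unaffected.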

2 Comments

Am I right to assume that the diff() function only works for numerical data, whereas this works for strings as well?
Correct. The diff method throws an error if the values cannot be differenced, as is the case with strings.
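To illustrate that exchange, here is a small sketch on a hypothetical frame with one string column. The shift-based comparison only needs equality, so it handles strings; diff() needs subtraction and fails. The exact exception type may depend on the pandas version and string dtype (typically TypeError), so the sketch catches both TypeError and NotImplementedError as an assumption.

```python
import pandas as pd

# Hypothetical example frame: one string column, one numeric column.
df = pd.DataFrame({'name': ['x', 'x', 'y', 'y'],
                   'n': [1, 1, 1, 2]})

# Equality-based comparison works for any column type.
same_as_prev = (df == df.shift()).all(axis=1)
print(same_as_prev.tolist())  # [False, True, False, False]

# Subtraction is not defined for strings, so diff() raises.
diff_error = None
try:
    df.diff()
except (TypeError, NotImplementedError) as exc:  # exact type may vary by version/dtype
    diff_error = exc
print(diff_error is not None)  # True
```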

You need all() across the columns (all(1) is short for all(axis=1)):

In [1306]: df.diff().eq(0).all(1)
Out[1306]:
0    False
1     True
2    False
3    False
4     True
dtype: bool
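A sketch checking that this one-liner reproduces the question's loop: the loop ANDs the per-column results together, and all(axis=1) performs the same AND-reduction across the columns of each row. The name vectorized is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': [2, 2, 3, 3, 3],
                   'C': [1, 1, 1, 2, 2]})

# Loop version from the question: AND the per-column comparisons together.
compare_all = df.diff().eq(0)
compare_tot = compare_all[compare_all.columns[0]]
for c in compare_all.columns[1:]:
    compare_tot = compare_tot & compare_all[c]

# Vectorized version: reduce with all() along axis=1 (across the columns of each row).
vectorized = df.diff().eq(0).all(axis=1)

print(vectorized.tolist())             # [False, True, False, False, True]
print(vectorized.equals(compare_tot))  # True
```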
