Consider this dataframe.

import pandas as pd

df = pd.DataFrame(data={'one': list('abcd'),
                        'two': list('efgh'),
                        'three': list('ajha')})
  one two three
0   a   e     a
1   b   f     j
2   c   g     h
3   d   h     a

How can I output all duplicate values and their respective index? The output can look something like this.

  id value
0  2     h
1  3     h
2  0     a
3  0     a
4  3     a

3 Answers

Try .melt + .duplicated:

x = df.reset_index().melt("index")
print(
    x.loc[x.duplicated(["value"], keep=False), ["index", "value"]]
    .reset_index(drop=True)
    .rename(columns={"index": "id"})
)

Prints:

   id value
0   0     a
1   3     h
2   0     a
3   2     h
4   3     a
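
As a minimal sketch of why keep=False matters in the snippet above: with the default keep="first", the first occurrence of each repeated value is not flagged, so it would be missing from the result. The Series s below is illustrative only and is not part of the answer.

```python
import pandas as pd

# Illustrative Series: 'a' appears twice, 'b' and 'c' once.
s = pd.Series(list("abca"))

# Default keep="first": only the later occurrence of 'a' is flagged.
first = s.duplicated()

# keep=False: every occurrence of a repeated value is flagged.
every = s.duplicated(keep=False)

print(first.tolist())  # [False, False, False, True]
print(every.tolist())  # [True, False, False, True]
```

Filtering with the keep=False mask is what lets both rows holding 'a' and both rows holding 'h' show up in the output.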

2 Comments

Many replied quickly but I appreciate the simplicity of your approach. In fact, makes me feel silly for not thinking of this. Thanks!
@user2962397 Thanks, happy coding! :)

We can stack the DataFrame, use Series.loc to keep only the values flagged by Series.duplicated, then use Series.reset_index to convert the result to a DataFrame:

new_df = (
    df.stack()  # Convert to Long Form
        .droplevel(-1).rename_axis('id')  # Handle MultiIndex
        .loc[lambda x: x.duplicated(keep=False)]  # Filter Values
        .reset_index(name='value')  # Make Series a DataFrame
)

new_df:

   id value
0   0     a
1   0     a
2   2     h
3   3     h
4   3     a
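
To make the "Handle MultiIndex" step more concrete, here is a small sketch of the intermediate objects, assuming the df from the question. The names stacked and by_row are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame(data={'one': list('abcd'),
                        'two': list('efgh'),
                        'three': list('ajha')})

# stack() returns a Series indexed by (row label, column name) pairs.
stacked = df.stack()
print(stacked.index[:3].tolist())  # [(0, 'one'), (0, 'two'), (0, 'three')]

# Dropping the last level keeps only the original row label,
# and rename_axis gives that remaining level the name 'id'.
by_row = stacked.droplevel(-1).rename_axis('id')
print(by_row.index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Because stack() walks the frame row by row, the filtered result comes out ordered by id without an explicit sort.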

Here I used melt to reshape and duplicated(keep=False) to select the duplicates:

(df.rename_axis('id')
   .reset_index()
   .melt(id_vars='id')
   .loc[lambda d: d['value'].duplicated(keep=False), ['id','value']]
   .sort_values(by='id')
   .reset_index(drop=True)
 )

Output:

    id value
0   0     a
1   0     a
2   2     h
3   3     h
4   3     a
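
The melt-based and stack-based approaches on this page appear to differ only in row order: melt walks the frame column by column, stack walks it row by row. A rough check, assuming the df from the question; the helper normalize and the variable names are hypothetical and used only for this comparison.

```python
import pandas as pd

df = pd.DataFrame(data={'one': list('abcd'),
                        'two': list('efgh'),
                        'three': list('ajha')})

# melt-based selection (column-major order), left unsorted here.
melted = (df.rename_axis('id')
            .reset_index()
            .melt(id_vars='id')
            .loc[lambda d: d['value'].duplicated(keep=False), ['id', 'value']])

# stack-based selection (row-major order).
stacked = (df.stack()
             .droplevel(-1).rename_axis('id')
             .loc[lambda s: s.duplicated(keep=False)]
             .reset_index(name='value'))

def normalize(frame):
    # Hypothetical helper: sort on both columns so row order is comparable.
    return frame.sort_values(['id', 'value']).reset_index(drop=True)

rows_melt = normalize(melted).values.tolist()
rows_stack = normalize(stacked).values.tolist()
print(rows_melt == rows_stack)
```

Sorting on both id and value removes any dependence on how ties within the same id are ordered.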

2 Comments

I believe loc with lambda would have less overhead than assign + query + drop. df.reset_index().melt(id_vars='index').loc[lambda d: d['value'].duplicated(keep=False), ['index','value']] (although that would be almost identical to Andrej's answer)
Yes you're right, I just didn't think of it at the time ;)
