
Kind of similar to this question: Pandas merge removing duplicate rows

I am using Python pandas.

Input:

import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

I want to self-join this DataFrame:

df_merge = pd.merge(df, df, on=['type'])

But I only want to keep the rows below:

type_x  value_x type_y  value_y
a       100      b      200
a       100      c      300
a       100      d      400
a       100      e      500
b       200      c      300
b       200      d      400
b       200      e      500
c       300      d      400
c       300      e      500
d       400      e      500

How can I do this in Pandas? Thank you for the help!
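For reference, a quick check of what the self-join above actually returns with the question's data. Because every `type` is unique, joining on `type` matches each row only with itself, so the result has one row per type and no `type_x`/`type_y` columns at all:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

# Self-join on the key column: each 'type' appears once on each side,
# so every row is matched only with the identical row on the other side.
df_merge = pd.merge(df, df, on=['type'])

# The shared non-key column 'value' gets the default suffixes _x and _y;
# the key column 'type' appears only once.
print(df_merge)
```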

  • I don't think the outer join does what you are expecting. Have you tried running the code you posted? The rows you describe should not appear in df_merge. The output you want looks more like the result of itertools.combinations. Commented Feb 16, 2021 at 19:08
  • Agree - any thoughts on how to use itertools? Thanks! Commented Feb 16, 2021 at 19:32
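A minimal illustration of the comment's point: `itertools.combinations` with `r=2` yields every unordered pair exactly once, never pairs an item with itself, and keeps the input order inside each pair. Applied to the `type` values, it produces exactly the `type_x`/`type_y` pairs in the wanted table:

```python
from itertools import combinations
from math import comb

types = ['a', 'b', 'c', 'd', 'e']

# Each unordered pair once, no self-pairs, input order preserved:
# ('a', 'b'), ('a', 'c'), ..., ('d', 'e')
pairs = list(combinations(types, 2))

print(pairs)
# n choose 2 pairs in total: 5 choose 2 = 10
print(len(pairs) == comb(len(types), 2))  # → True
```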

2 Answers

There's no need for pandas.merge() here. Just feed the combinations output into the DataFrame constructor (with a little sleight of hand to turn two 2-tuples into a list with 4 elements):

from itertools import combinations
import pandas

types = ['a','b','c','d','e']
values = [100,200,300,400,500]

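# Flatten each pair of (type, value) tuples into one 4-element row,
# e.g. ('a', 100) and ('b', 200) -> ['a', 100, 'b', 200]
rows = [[*pair1, *pair2]
        for pair1, pair2 in combinations(zip(types, values), 2)]

# Column order must match the row layout: type_x, value_x, type_y, value_y
columns = [f"{col}_{var}"
           for var in ['x', 'y']
           for col in ['type', 'value']]

pandas.DataFrame(rows, columns=columns)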
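The answer above starts from two separate lists. If the data already lives in the question's `df`, the same approach can read the rows straight from the DataFrame. A sketch, assuming the column order of `df` (`type`, then `value`) is the order you want within each half of the output row:

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

# itertuples(index=False, name=None) yields plain (type, value) tuples.
# [*left, *right] flattens two of them into a single 4-element row.
rows = [[*left, *right]
        for left, right in combinations(df.itertuples(index=False, name=None), 2)]

# Column names follow the flattened row: all of the left row's columns
# first (suffix _x), then all of the right row's columns (suffix _y).
columns = [f"{col}_{side}" for side in ['x', 'y'] for col in df.columns]

result = pd.DataFrame(rows, columns=columns)
print(result)
```

Deriving `columns` from `df.columns` keeps the names and the row layout in sync if more columns are added later.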

Ugly, but gets the job done:

import itertools

import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

# All unordered pairs of types: (a, b), (a, c), ..., (d, e)
combinations = pd.DataFrame(list(itertools.combinations(df['type'], 2)), columns=['type_x', 'type_y'])

# Look up the value for the left-hand type...
combinations = pd.merge(combinations, df, left_on='type_x', right_on='type') \
                    .drop(columns=['type']) \
                    .rename(columns={'value': 'value_x'})
# ...then the value for the right-hand type
combinations = pd.merge(combinations, df, left_on='type_y', right_on='type') \
                    .drop(columns=['type']) \
                    .rename(columns={'value': 'value_y'})
combinations
  type_x type_y  value_x  value_y
0      a      b      100      200
1      a      c      100      300
2      b      c      200      300
3      a      d      100      400
4      b      d      200      400
5      c      d      300      400
6      a      e      100      500
7      b      e      200      500
8      c      e      300      500
9      d      e      400      500
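Since the question asks about `merge`, a merge-only variant is also possible: a cross join (`how='cross'`, available in pandas 1.2 and later) followed by a filter that keeps each unordered pair once. This sketch assumes the `type` values are unique and that their natural sort order is the order the pairs should follow; the explicit `sort_values` makes the final row order deterministic instead of depending on how `merge` orders its output:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

# Cartesian product: every row paired with every row, itself included
# (5 x 5 = 25 rows). Overlapping column names get the _x / _y suffixes.
cross = df.merge(df, how='cross')

# type_x < type_y drops the self-pairs (a, a) and the mirrored
# duplicates (b, a), leaving one row per unordered pair.
result = (cross[cross['type_x'] < cross['type_y']]
          [['type_x', 'value_x', 'type_y', 'value_y']]
          .sort_values(['type_x', 'type_y'])
          .reset_index(drop=True))
print(result)
```

The trade-off is memory: the cross join materialises all n² rows before the filter keeps n(n-1)/2 of them, whereas the `itertools.combinations` approaches only ever build the pairs that are kept.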
