Python - Pandas remove rows after self-join (merge)

Question

Kind of similar to this question: Pandas merge removing duplicate rows

I am using Python pandas -

Input:

df = pd.DataFrame({
               'type':['a','b','c','d','e'],
               'value':[100,200,300,400,500]})

I want to self-join this DataFrame:

df_merge = pd.merge(df, df, on=['type'])

But I only want to keep the rows below:

type_x  value_x  type_y  value_y
a       100      b       200
a       100      c       300
a       100      d       400
a       100      e       500
b       200      c       300
b       200      d       400
b       200      e       500
c       300      d       400
c       300      e       500
d       400      e       500

How can I do this in Pandas? Thank you for the help!

I don't think the outer join does what you are expecting. Have you tried running the code you posted? The rows you describe should not appear in df_merge. The output you want looks more like the result of itertools.combinations.
– Marmaduke, Commented Feb 16, 2021 at 19:08
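
A quick way to check the point made in this comment is to run the merge from the question and inspect the result. This is only a minimal sketch using the question's own data. Because every `type` value is unique, joining on `type` pairs each row with itself only:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

# Joining on 'type' matches each row only with itself, so no
# cross-type pairs such as (a, b) can appear in the result.
df_merge = pd.merge(df, df, on=['type'])
print(df_merge)
```

The result has one row per original row, with `value_x` equal to `value_y`, so there is nothing to filter down to the desired pairs.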

Marmaduke · Accepted Answer · 2021-02-17 20:56:33Z

There's no need for pandas.merge() here. Just feed the combinations output into the DataFrame constructor (with a little sleight of hand to turn two 2-tuples into a list with 4 elements):

from itertools import combinations
import pandas

types = ['a','b','c','d','e']
values = [100,200,300,400,500]

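# Each combination is a pair of (type, value) tuples; unpack both into one flat row
rows = [[*pair1, *pair2]
        for pair1, pair2 in combinations(zip(types, values), 2)]

# Column order must match the row layout: type_x, value_x, type_y, value_y
columns = [f"{col}_{var}"
           for var in ['x', 'y']
           for col in ['type', 'value']]

pandas.DataFrame(rows, columns=columns)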
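
If the data already lives in a DataFrame, as in the question, the same idea can probably be applied without separate lists. The sketch below assumes the question's `df` and uses `itertuples(index=False, name=None)` to get plain `(type, value)` tuples; the name `result` is just illustrative:

```python
from itertools import combinations
import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

# Each row becomes a plain (type, value) tuple; combinations() keeps
# the input order, so every pair is emitted once, earlier row first.
pairs = combinations(df.itertuples(index=False, name=None), 2)

result = pd.DataFrame(
    [[*left, *right] for left, right in pairs],
    columns=['type_x', 'value_x', 'type_y', 'value_y'])
print(result)
```

Pairs are built by row position here, so the `type` values do not need to be sorted or even comparable.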

Octav · Accepted Answer · 2021-02-16 20:05:12Z

Ugly, but gets the job done:

import pandas as pd
df = pd.DataFrame({
               'type':['a','b','c','d','e'],
               'value':[100,200,300,400,500]})

import itertools
combinations = pd.DataFrame(list(itertools.combinations(df['type'], 2)), columns=['type_x', 'type_y'])

combinations = pd.merge(combinations, df, left_on='type_x', right_on='type') \
                    .drop(columns=['type']) \
                    .rename(columns={'value': 'value_x'})
combinations = pd.merge(combinations, df, left_on='type_y', right_on='type') \
                    .drop(columns=['type']) \
                    .rename(columns={'value': 'value_y'})
combinations

	type_x	type_y	value_x	value_y
0	a	b	100	200
1	a	c	100	300
2	b	c	200	300
3	a	d	100	400
4	b	d	200	400
5	c	d	300	400
6	a	e	100	500
7	b	e	200	500
8	c	e	300	500
9	d	e	400	500
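
For comparison, a pandas-only variant that stays closer to the "self-join, then remove rows" wording of the title might use a cross join followed by a filter. This is a different technique from the one above, shown only as a sketch: it assumes pandas 1.2 or later (for `how='cross'`) and that the `type` values are unique and sortable, and the names `pairs` and `result` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['a', 'b', 'c', 'd', 'e'],
    'value': [100, 200, 300, 400, 500]})

# Cross join: every row paired with every row (5 x 5 = 25 rows).
# how='cross' requires pandas 1.2 or later.
pairs = df.merge(df, how='cross', suffixes=('_x', '_y'))

# Keep each unordered pair once and drop self-pairs.
# Assumes 'type' values are unique and can be compared with '<'.
result = pairs[pairs['type_x'] < pairs['type_y']].reset_index(drop=True)
print(result)
```

The cross join materialises all n × n rows before filtering, so for large frames the `itertools.combinations` approach is likely to use less memory.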
