pandas concatenate duplicate rows based on single column value

Question

I'm trying to remove duplicate values and blanks from my dataframe, and then realign the values so that, in each row, all column values end with the same number:

THIS IS MY CURRENT DATAFRAME:

    brand   code   des   price   year
0  brand1  code1  des1  price1  year1
1  brand2  code2        price2       
2  brand3  code3  des3  price3  year3
3  brand4  code4        price4       
4  brand5  code5  des5  price5  year5
5  brand6  code6        price6       
6          code2  des2          year2
7          code4  des4          year4
8          code6  des6          year6

THIS IS WHAT I WANT AS OUTPUT:

    brand   code   des   price   year
0  brand1  code1  des1  price1  year1
1  brand2  code2  des2  price2  year2
2  brand3  code3  des3  price3  year3
3  brand4  code4  des4  price4  year4
4  brand5  code5  des5  price5  year5
5  brand6  code6  des6  price6  year6

This is the code I wrote. If someone could guide me on how to do this, it would be really appreciated:

import pandas as pd

data = {
    'code': ['code1','code2','code3','code4','code5','code6','code2','code4','code6'],
    'des': ['des1','','des3','','des5','','des2','des4','des6'],
    'price': ['price1','price2','price3','price4','price5','price6','','',''],
    'year': ['year1','','year3','','year5','','year2','year4','year6'],
    'brand': ['brand1','brand2','brand3','brand4','brand5','brand6','','','']
}

df = pd.DataFrame.from_dict(data)
print(df)

SeaBean · Accepted Answer · 2021-06-15 12:14:11Z

3

You can use df.apply() to process each column: filter out the empty strings, use np.unique() to get the sorted unique values, and wrap the result in pd.Series to rebuild the column.

import numpy as np

df.apply(lambda x: pd.Series(np.unique(x[x!=''])))

Output:

    code   des   price   year   brand
0  code1  des1  price1  year1  brand1
1  code2  des2  price2  year2  brand2
2  code3  des3  price3  year3  brand3
3  code4  des4  price4  year4  brand4
4  code5  des5  price5  year5  brand5
5  code6  des6  price6  year6  brand6
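One thing worth keeping in mind: np.unique() sorts each column independently, so the rows line up here only because every column's values happen to sort in the same order (code1..code6, des1..des6, and so on). A small sketch of what can happen otherwise, using hypothetical data (the brand names 'zeta' and 'alpha' are made up for illustration and are not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the brand names do not sort in the same order as the codes.
df = pd.DataFrame({
    'code':  ['code1', 'code2', 'code1', 'code2'],
    'brand': ['zeta', 'alpha', '', ''],
    'des':   ['', '', 'des1', 'des2'],
})

# Same approach as above: drop blanks, then sort/deduplicate each column on its own.
result = df.apply(lambda x: pd.Series(np.unique(x[x != ''])))
print(result)

# 'zeta' belonged to code1 in the input, but after the per-column sort
# the first row pairs code1 with 'alpha'.
```

If the real data follows the numbered pattern from the question, this is not a problem; it only matters when columns can sort differently from one another.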

edited Jun 15, 2021 at 12:14

answered Jun 15, 2021 at 12:11


XXavier · Accepted Answer · 2021-06-15 12:04:34Z

0

Is this what you are looking for? First replace the empty strings with np.nan, then use apply to drop the NaN values from each column:

import numpy as np

df = df.replace(r'^\s*$', np.nan, regex=True)
df.apply(lambda x: pd.Series(x.dropna().values))

    code   des   price   year   brand
0  code1  des1  price1  year1  brand1
1  code2  des3  price2  year3  brand2
2  code3  des5  price3  year5  brand3
3  code4  des2  price4  year2  brand4
4  code5  des4  price5  year4  brand5
5  code6  des6  price6  year6  brand6
6  code2   NaN     NaN    NaN     NaN
7  code4   NaN     NaN    NaN     NaN
8  code6   NaN     NaN    NaN     NaN
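In the output above, the des and year values are still attached to the wrong codes and the duplicate codes remain, because each column is compacted independently of the others. A minimal sketch of one alternative, assuming that code is the key identifying rows that belong together (this is a different technique from the one in this answer): group by code and take the first non-missing value in each column.

```python
import numpy as np
import pandas as pd

# Data from the question.
data = {
    'code': ['code1','code2','code3','code4','code5','code6','code2','code4','code6'],
    'des': ['des1','','des3','','des5','','des2','des4','des6'],
    'price': ['price1','price2','price3','price4','price5','price6','','',''],
    'year': ['year1','','year3','','year5','','year2','year4','year6'],
    'brand': ['brand1','brand2','brand3','brand4','brand5','brand6','','','']
}
df = pd.DataFrame(data)

merged = (
    df.replace(r'^\s*$', np.nan, regex=True)  # treat blank cells as missing
      .groupby('code', as_index=False)        # one group per code value
      .first()                                # first non-missing value per column
)

# Reorder the columns to match the layout asked for in the question.
merged = merged[['brand', 'code', 'des', 'price', 'year']]
print(merged)
```

Because the rows are matched on the code value rather than on position or sort order, this does not depend on how the other columns sort.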

answered Jun 15, 2021 at 12:04
