Pandas Concat and remove all duplicate row values

Question

I have a DataFrame in pandas that looks like this:

data 1  data 2  data 3
swag    swag    swag
yo      swag    hey
hey     yo      yo

I want to concatenate these columns into one and remove any values that are duplicated within a row.

It should print out like so (the first row has three swags, which are duplicates, so it keeps only one swag; the next row keeps yo, swag and hey):

data (column name)
swag
yo
swag
hey
hey
yo

Duplicate of both "How to compress or stack a pandas dataframe along the rows?" and "how do I remove rows with duplicate values of columns in pandas data frame?"
– esqew, commented Oct 3, 2022 at 21:03

bitflip · Accepted Answer · 2022-10-03 21:17:03Z

0

Do you care about the order of your values? If yes:

df.apply(lambda x: dict.fromkeys(x), axis=1).explode()
0    swag
1      yo
1    swag
1     hey
2     hey
2      yo
dtype: object
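For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same order-preserving idea. The sample frame is rebuilt from the question, and the helper name dedupe_rows is hypothetical, introduced only for illustration. It builds the per-row lists with a comprehension instead of apply, which is an assumption about style rather than the approach above.

```python
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})


def dedupe_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one 'data' column, duplicates removed per row."""
    # dict.fromkeys keeps the first occurrence of each value, in row order
    unique_per_row = pd.Series(
        [list(dict.fromkeys(row)) for row in frame.to_numpy()],
        index=frame.index,
    )
    # explode turns each list element into its own row
    return unique_per_row.explode().reset_index(drop=True).to_frame("data")


result = dedupe_rows(df)
print(result)
```

The reset_index(drop=True) call replaces the repeated row labels (0, 1, 1, 1, 2, 2) with a fresh 0..5 index, which matches the layout asked for in the question.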

If not:

list(map(set, df.values)) 
[{'swag'}, {'swag', 'hey', 'yo'}, {'hey', 'yo'}]

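is faster, but it returns a list of sets rather than a single column, and the order of values within each set is not guaranteed.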
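If a single column is still wanted from the set-based version, the sets can be flattened afterwards. This is only a sketch: the sample frame is rebuilt from the question, and sorting each set is my own assumption, added so that the output is deterministic (sets carry no order of their own).

```python
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# One set of unique values per row; original order within the row is lost
row_sets = list(map(set, df.to_numpy()))

# Flatten into one column, sorting each set so the result is reproducible
flat = pd.Series(
    [value for row_set in row_sets for value in sorted(row_set)],
    name="data",
)
print(flat)
```

Note that the second row comes out as hey, swag, yo here rather than yo, swag, hey, so this variant only fits when the order inside a row does not matter.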

Timeless · Accepted Answer · 2022-10-03 21:21:19Z

0

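You can use pandas.DataFrame.stack:

out = (
    df.stack()                      # long format: one entry per (row, column) cell
      .reset_index()                # columns are now level_0 (row), level_1 (column), 0 (value)
      .drop_duplicates(subset=['level_0', 0], keep='first')  # drop repeats within each original row
      .rename(columns={0: 'data'})
      .drop(columns=['level_0', 'level_1'])
      .reset_index(drop=True)
)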

Output:

print(out)

   data
0  swag
1    yo
2  swag
3   hey
4   hey
5    yo
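To see why the subset is ['level_0', 0], it can help to look at the intermediate frame. The sketch below rebuilds the sample frame from the question and assumes the default unnamed index and columns, which is what makes pandas fall back to the level_0 / level_1 / 0 names.

```python
import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# level_0 = original row label, level_1 = original column name, 0 = cell value
stacked = df.stack().reset_index()
print(stacked.columns.tolist())

# A value counts as a duplicate only when it repeats within the same original row
deduped = stacked.drop_duplicates(subset=["level_0", 0], keep="first")
print(deduped[0].tolist())
```

Because level_0 is part of the subset, the swag in the second row survives even though swag already appeared in the first row.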
