
I have a DataFrame in pandas that looks like this:

data 1  data 2  data 3
swag    swag    swag
yo      swag    hey
hey     yo      yo

I want to concatenate these columns into one and remove any duplicate values within each row.

It should print out like this (the first row has three "swag"s, which are duplicates, so only one "swag" is kept; the next row then contributes yo, swag and hey):

data (column name)
swag
yo
swag
hey
hey
yo
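To make the example reproducible, here is a minimal sketch that builds the sample frame and the expected result. The column names are copied from the question; the variable names `df` and `expected` are only illustrative.

```python
import pandas as pd

# Sample frame from the question (column names copied as written there).
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# Desired result: a single "data" column in which each row's values are
# de-duplicated while keeping their first-seen order.
expected = pd.Series(["swag", "yo", "swag", "hey", "hey", "yo"], name="data")

print(df)
```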

2 Answers


Do you care about the order of your values? If yes:

df.apply(lambda x: dict.fromkeys(x), axis=1).explode()
0    swag
1      yo
1    swag
1     hey
2     hey
2      yo
dtype: object
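A self-contained version of the order-preserving approach, run on the sample data from the question. It is a slight variation on the one-liner above, not the answer's exact code: the lambda returns `list(dict.fromkeys(row))` instead of the dict itself, so `explode` receives plain lists, and the result is renamed to `data` with a fresh 0..n-1 index to match the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# dict.fromkeys keeps only the first occurrence of each value, in insertion
# order, so list(dict.fromkeys(row)) is an order-preserving de-duplication.
# Returning a list makes apply(axis=1) produce a Series of lists, which
# explode then flattens into one line per value.
out = (
    df.apply(lambda row: list(dict.fromkeys(row)), axis=1)
      .explode()
      .rename("data")
      .reset_index(drop=True)
)

print(out.tolist())  # ['swag', 'yo', 'swag', 'hey', 'hey', 'yo']
```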

If not:

list(map(set, df.values)) 
[{'swag'}, {'swag', 'hey', 'yo'}, {'hey', 'yo'}]

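is faster, but it returns a list of sets rather than a single column, and the order of the values within each set is arbitrary.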
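If the order within a row doesn't matter but you still want a single column, the list of sets can be wrapped back into a Series and exploded. This is a sketch on the question's sample data; since sets are unordered, only which values belong to each row is stable, not their order.

```python
import pandas as pd

df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# One set per row: duplicates within the row are dropped (order is lost).
row_sets = list(map(set, df.values))

# Explode the sets into one value per line; the index keeps track of which
# original row each value came from.
out = pd.Series(row_sets, index=df.index, name="data").explode()

print(out)
```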


You can use pandas.DataFrame.stack:

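out = (
    df.stack()                      # Series indexed by (row label, column name)
      .reset_index()                # columns: level_0 (row), level_1 (column), 0 (value)
      .drop_duplicates(subset=['level_0', 0], keep='first')  # drop repeats within a row
      .rename(columns={0: 'data'})
      .drop(columns=['level_0', 'level_1'])
      .reset_index(drop=True)
)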

print(out)

# Output:

   data
0  swag
1    yo
2  swag
3   hey
4   hey
5    yo
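A more compact variant of the same stack idea (a swapped-in alternative, not part of the answer above) drops the column level, uses `groupby(level=0).unique()` to get each row's unique values in order of appearance, and then `explode` to flatten them. A sketch on the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

out = (
    df.stack()              # Series indexed by (row label, column name), row by row
      .droplevel(1)         # keep only the original row label in the index
      .groupby(level=0)
      .unique()             # per-row array of unique values, in order of appearance
      .explode()            # back to one value per line
      .rename("data")
      .reset_index(drop=True)
)

print(out.tolist())  # ['swag', 'yo', 'swag', 'hey', 'hey', 'yo']
```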
