I have the following column in a DataFrame, where each value is a list of colors separated by |:
import pandas as pd
df = pd.DataFrame({'x': ['RED|BROWN|YELLOW', 'WHITE|BLACK|YELLOW|GREEN', 'BLUE|RED|PINK']})
I want to find all unique colors from the column.
Expected Output:
{'YELLOW', 'BLACK', 'RED', 'BLUE', 'BROWN', 'GREEN', 'WHITE', 'PINK'}
I don't mind whether the result is a list or a set.
What I tried:
df['x'] = df['x'].apply(lambda x: x.split("|"))
colors = []
for idx, row in df.iterrows():
    colors.extend(row['x'])
print(set(colors))
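The loop above can be written without iterrows(), which builds a separate Series object for every row. This is a minimal sketch assuming the column still holds the original "|"-separated strings (not the lists produced by the apply step) and contains no missing values:

```python
import pandas as pd

df = pd.DataFrame({'x': ['RED|BROWN|YELLOW', 'WHITE|BLACK|YELLOW|GREEN', 'BLUE|RED|PINK']})

# Split each string and flatten all pieces into one set in a single pass,
# avoiding the per-row Series construction that iterrows() does.
colors = {color for value in df['x'] for color in value.split('|')}
print(colors)
```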
This works, but I am looking for a more efficient solution because my dataset is large.
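Two common alternatives are sketched below: a vectorized pandas pipeline (Series.str.split, then explode, then unique) and a plain-Python join-then-split. Which one is faster depends on the data, so it is worth timing both on the real dataset. Both assume the column holds strings with no missing values; with NaNs present, call dropna() on the column first.

```python
import pandas as pd

df = pd.DataFrame({'x': ['RED|BROWN|YELLOW', 'WHITE|BLACK|YELLOW|GREEN', 'BLUE|RED|PINK']})

# Option 1: split every value into a list, explode() puts each list element
# on its own row, and unique() returns the distinct values as an array.
unique_colors = df['x'].str.split('|').explode().unique()

# Option 2: concatenate all values into one string and split it once.
unique_colors_set = set('|'.join(df['x']).split('|'))

print(set(unique_colors))
print(unique_colors_set)
```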