I have a pandas df where each row is a list of words. The list has duplicate words. I want to remove duplicate words.
I tried using dict.fromkeys(listname) in a for loop to iterate over each row in the df. But this splits the words into alphabets
filepath = "C:/abc5/Python/Clustering/output2.csv"
df = pd.read_csv(filepath,encoding='windows-1252')
df["newlist"] = df["text_lemmatized"]
for i in range(0,len(df)):
l = df["text_lemmatized"][i]
df["newlist"][i] = list(dict.fromkeys(l))
print(df)
Expected result is ==>
['clear', 'pending', 'order', 'pending', 'order'] ['clear', 'pending', 'order']
['pending', 'activation', 'clear', 'pending'] ['pending', 'activation', 'clear']
Actual result is
['clear', 'pending', 'order', 'pending', 'order'] ... [[, ', c, l, e, a, r, ,, , p, n, d, i, g, o, ]]
['pending', 'activation', 'clear', 'pending', ... ... [[, ', p, e, n, d, i, g, ,, , a, c, t, v, o, ...
