
I have a CSV file and need to remove duplicate values under street_name. For example, a single row can contain hwy-1w multiple times.

I used this line to build the file, which joins every street_name per roadId, duplicates included:

joinedResult.groupby('roadId')['street_name'].apply(', '.join).reset_index().to_csv(f'./2{areaId}.csv', index=False)
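A minimal sketch of the situation (the column values here are invented; only roadId and street_name come from the question):

import pandas as pd

# Hypothetical stand-in for joinedResult
joinedResult = pd.DataFrame({
    'roadId': [1, 1, 1, 2],
    'street_name': ['hwy-1w', 'hwy-1w', 'main st', 'hwy-1w'],
})

areaId = 42  # placeholder value
out = joinedResult.groupby('roadId')['street_name'].apply(', '.join).reset_index()
print(out)
#    roadId              street_name
# 0       1  hwy-1w, hwy-1w, main st   <- duplicated hwy-1w
# 1       2                   hwy-1w
out.to_csv(f'./2{areaId}.csv', index=False)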

• Can you please provide a minimal reproducible example so others can reproduce this problem? Screenshots cannot be reproduced, and one can't see a thing in this one anyway. Assuming you have them in a column of a DataFrame, you can use df["street_name"].unique(). See the pandas docs. – Commented Mar 23, 2022 at 21:42
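For reference, a minimal sketch of the unique() suggestion from that comment (sample data is invented):

import pandas as pd

df = pd.DataFrame({'street_name': ['hwy-1w', 'hwy-1w', 'main st']})
print(df['street_name'].unique())  # ['hwy-1w' 'main st'] -- duplicates dropped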

1 Answer


If you want unique values per row, this question might be of help. If you want to keep the data in the row and don't care about the order of the strings afterwards, maybe this could help:

df['street_name'] = df['street_name'].apply(lambda x: ', '.join(set(x.split(', '))))

Converting to a set is a quick way to remove duplicates.
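For example, on a plain string (the sample value is invented), duplicates disappear but the order may change, since set iteration order is arbitrary:

names = 'hwy-1w, main st, hwy-1w'
print(', '.join(set(names.split(', '))))
# e.g. 'main st, hwy-1w' -- duplicates removed, order not guaranteed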

If you need to preserve order, you can use a Counter. It will be slower than using sets though:

from collections import Counter
df['street_name'] = df['street_name'].apply(lambda x: ', '.join(Counter(x.split(', ')).keys()))
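A quick usage sketch on an invented one-row DataFrame:

import pandas as pd
from collections import Counter

df = pd.DataFrame({'street_name': ['hwy-1w, hwy-1w, main st']})
df['street_name'] = df['street_name'].apply(lambda x: ', '.join(Counter(x.split(', ')).keys()))
print(df['street_name'].iloc[0])  # 'hwy-1w, main st' -- first-seen order preserved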

