
I have a dataframe:

id     event_path
111    google.com
111    yandex.ru
111    vk.com
222    twitter.com
222    twitter.com
333    twitter.com
333    facebook.com

Desired output:

id     event_path
111    google.com
111    yandex.ru
111    vk.com
222    twitter.com
333    twitter.com
333    facebook.com

I tried to use shift on the column:

df.loc[(df.event_path != df.event_path.shift()) &
       (df.id == df.id.shift())]

and it returns

id     event_path
111    google.com
111    yandex.ru
111    vk.com
222    twitter.com
333    facebook.com

How can I fix that?

  • What are you trying to achieve here, dropping duplicates or consecutive duplicates? If the latter, this is a dupe of this: stackoverflow.com/questions/19463985/… Commented Nov 16, 2017 at 10:47
  • @EdChum I need to get data like 111 -> google.com, yandex.ru, vk.com; 222 -> twitter.com; 333 -> twitter.com -> facebook.com. I need to merge duplicate URLs in a user's path. Commented Nov 16, 2017 at 10:49
  • Your question is unclear; can you post a better explanation in your question, as everyone is confused here? Commented Nov 16, 2017 at 10:51

2 Answers


Use pd.DataFrame.drop_duplicates

df.drop_duplicates()

    id    event_path
0  111    google.com
1  111     yandex.ru
2  111        vk.com
3  222   twitter.com
5  333   twitter.com
6  333  facebook.com
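
If duplicates should be judged on only some of the columns, drop_duplicates also accepts a subset parameter (a minimal sketch; the column list here just mirrors the sample data):

# Drop duplicates judged only on the listed columns; other columns are ignored.
df.drop_duplicates(subset=['id', 'event_path'])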

IIUC, the OP wants to remove a row only when the duplicate is adjacent:

df[df.eq(df.shift().bfill()).any(axis=1)]

    id    event_path
0  111    google.com
1  111     yandex.ru
2  111        vk.com
4  222   twitter.com
5  333   twitter.com
6  333  facebook.com

3 Comments

Thank you! But can you show me a solution with shift()? Sometimes there are some columns on which I shouldn't drop duplicates.
Construct an example input and desired output that demonstrate what you're asking.
I've updated my post. Beyond that, you'll have to put in some extra work explaining what you want.
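
A minimal sketch of the shift-based approach the comment above asks for: drop only consecutive duplicates, comparing only a chosen subset of columns (the column list is an assumption; adjust it to whichever columns should define a duplicate):

import pandas as pd

df = pd.DataFrame({
    'id': [111, 111, 111, 222, 222, 333, 333],
    'event_path': ['google.com', 'yandex.ru', 'vk.com',
                   'twitter.com', 'twitter.com', 'twitter.com', 'facebook.com'],
})

# Columns that define a "duplicate"; any other columns are left out of the comparison.
cols = ['id', 'event_path']

# A row is a consecutive duplicate when every compared column equals the previous row.
is_dupe = df[cols].eq(df[cols].shift()).all(axis=1)
print(df[~is_dupe])

Note that unlike the any-based mask above, this keeps the first row of each consecutive run, so index 3 rather than index 4 survives for id 222; the values are the same.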

You can create a helper Series that labels consecutive values with shift and cumsum, join it with the id column, and mark repeats with duplicated. Finally, filter them out by boolean indexing:

df1=df[~df[['id']].join(df['event_path'].ne(df['event_path'].shift()).cumsum()).duplicated()]
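
For clarity, here is the same logic unrolled step by step on the sample data (a sketch; the intermediate variable names are mine):

import pandas as pd

df = pd.DataFrame({
    'id': [111, 111, 111, 222, 222, 333, 333],
    'event_path': ['google.com', 'yandex.ru', 'vk.com',
                   'twitter.com', 'twitter.com', 'twitter.com', 'facebook.com'],
})

# 1. Label consecutive runs of event_path: the counter increments whenever
#    the value changes, so rows in the same run share a label.
runs = df['event_path'].ne(df['event_path'].shift()).cumsum()

# 2. Pair each run label with id; duplicated() marks every row after the
#    first within the same (id, run) combination.
dupes = df[['id']].join(runs).duplicated()

# 3. Keep only the first row of each consecutive run.
df1 = df[~dupes]
print(df1)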

