
I have a dataframe:

id     event_path
111    google.com
111    yandex.ru
111    vk.com
222    twitter.com
222    twitter.com
333    twitter.com
333    facebook.com

Desired output:

id     event_path
111    google.com
111    yandex.ru
111    vk.com
222    twitter.com
333    twitter.com
333    facebook.com

I tried to use shift on the column:

df.loc[(df.event_path != df.event_path.shift()) &
       (df.id == df.id.shift())]

and it returns

id     event_path
111    google.com
111    yandex.ru
111    vk.com
222    twitter.com
333    facebook.com

How can I fix that?

  • What are you trying to achieve here, dropping duplicates or consecutive duplicates? If the latter, this is a dupe of this: stackoverflow.com/questions/19463985/… Commented Nov 16, 2017 at 10:47
  • @EdChum I need to get data like 111 -> google.com, yandex.ru, vk.com; 222 -> twitter.com; 333 -> twitter.com -> facebook.com. I need to merge duplicate URLs in a user's path. Commented Nov 16, 2017 at 10:49
  • Your question is unclear; can you post a better explanation in your question, as everyone is confused here? Commented Nov 16, 2017 at 10:51

2 Answers


Use pd.DataFrame.drop_duplicates

df.drop_duplicates()

    id    event_path
0  111    google.com
1  111     yandex.ru
2  111        vk.com
3  222   twitter.com
5  333   twitter.com
6  333  facebook.com
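
If duplicates should be judged on only some of the columns, drop_duplicates also accepts a subset parameter (a minimal sketch; the column list here just mirrors the sample data):

# Drop duplicates judged only on the listed columns; other columns are ignored.
df.drop_duplicates(subset=['id', 'event_path'])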

IIUC, the OP wants to remove a row only when the duplicate is adjacent:

df[df.eq(df.shift().bfill()).any(axis=1)]

    id    event_path
0  111    google.com
1  111     yandex.ru
2  111        vk.com
4  222   twitter.com
5  333   twitter.com
6  333  facebook.com

3 Comments

Thank you! But can you show me a solution with shift()? Sometimes there are some columns on which I shouldn't drop duplicates.
Construct an example input and desired output that demonstrate what you're asking.
I've updated my post. Beyond that, you'll have to put in some extra work explaining what you want.
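
A minimal sketch of the shift-based approach the comment above asks for: drop only consecutive duplicates, comparing only a chosen subset of columns (the column list is an assumption; adjust it to whichever columns should define a duplicate):

import pandas as pd

df = pd.DataFrame({
    'id': [111, 111, 111, 222, 222, 333, 333],
    'event_path': ['google.com', 'yandex.ru', 'vk.com',
                   'twitter.com', 'twitter.com', 'twitter.com', 'facebook.com'],
})

# Columns that define a "duplicate"; any other columns are left out of the comparison.
cols = ['id', 'event_path']

# A row is a consecutive duplicate when every compared column equals the previous row.
is_dupe = df[cols].eq(df[cols].shift()).all(axis=1)
print(df[~is_dupe])

Note that unlike the any-based mask above, this keeps the first row of each consecutive run, so index 3 rather than index 4 survives for id 222; the values are the same.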

You can create a helper Series that labels consecutive values with shift and cumsum, join it with the id column, and mark repeats with duplicated. Finally, filter them out by boolean indexing:

df1=df[~df[['id']].join(df['event_path'].ne(df['event_path'].shift()).cumsum()).duplicated()]
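
For clarity, here is the same logic unrolled step by step on the sample data (a sketch; the intermediate variable names are mine):

import pandas as pd

df = pd.DataFrame({
    'id': [111, 111, 111, 222, 222, 333, 333],
    'event_path': ['google.com', 'yandex.ru', 'vk.com',
                   'twitter.com', 'twitter.com', 'twitter.com', 'facebook.com'],
})

# 1. Label consecutive runs of event_path: the counter increments whenever
#    the value changes, so rows in the same run share a label.
runs = df['event_path'].ne(df['event_path'].shift()).cumsum()

# 2. Pair each run label with id; duplicated() marks every row after the
#    first within the same (id, run) combination.
dupes = df[['id']].join(runs).duplicated()

# 3. Keep only the first row of each consecutive run.
df1 = df[~dupes]
print(df1)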

