How to delete rows with multiple conditions in pandas dataframe

Question

import pandas as pd
import numpy as np
print(df)

I'm a newbie using pandas to process an Excel file. I have a data frame like the one below:

DAT_KEY      IP         DATA
01-04-19    10.0.0.1    3298329
01-04-19    10.0.0.1    0
02-04-19    10.0.0.1    3298339
02-04-19    10.0.0.1    0
01-04-19    10.0.0.2    3233233
01-04-19    10.0.0.2    0
01-04-19    10.0.0.3    0

I only want to delete a row when it has DATA=0 and another row shares the same IP and DAT_KEY. I don't want to delete a row with DATA=0 if its DAT_KEY and IP combination is unique.

My expected outcome:

DAT_KEY      IP         DATA
01-04-19    10.0.0.1    3298329
02-04-19    10.0.0.1    3298339
01-04-19    10.0.0.2    3233233
01-04-19    10.0.0.3    0

I tried drop_duplicates, but it is not suitable for my case:

df = df.drop_duplicates()
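
A minimal sketch of why this call leaves the frame unchanged, assuming the sample data from the question is rebuilt by hand (the constructor call and the name `deduped` are not in the original post). With no arguments, drop_duplicates() compares whole rows, and no two rows here match in all three columns:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "DAT_KEY": ["01-04-19", "01-04-19", "02-04-19", "02-04-19",
                "01-04-19", "01-04-19", "01-04-19"],
    "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1",
           "10.0.0.2", "10.0.0.2", "10.0.0.3"],
    "DATA": [3298329, 0, 3298339, 0, 3233233, 0, 0],
})

# Every row differs from the others in at least one column,
# so a full-row comparison finds nothing to drop.
deduped = df.drop_duplicates()
print(len(df), len(deduped))
```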

Can there be duplicated DAT_KEY and IP pairs with any value other than 0? And do you want to keep them?
– anky, Commented Sep 9, 2019 at 10:05

bharatk · Accepted Answer · 2019-09-09 12:11:39Z

Use

groupby - splits the data into groups based on the given key columns.
.first() - returns the first non-null value of each column within each group.

Ex.

df = df.groupby(['DAT_KEY','IP'],as_index=False,sort=False).first()
print(df)

O/P:

    DAT_KEY        IP     DATA
0  01-04-19  10.0.0.1  3298329
1  02-04-19  10.0.0.1  3298339
2  01-04-19  10.0.0.2  3233233
3  01-04-19  10.0.0.3        0
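
One caveat, sketched under assumptions: .first() keeps whichever row appears first in each group, so the result above depends on the non-zero row preceding the zero row. The small frame below is hypothetical and puts the zero first to show the difference. Taking the maximum of DATA per group is one order-independent alternative, assuming DATA is never negative and each (DAT_KEY, IP) pair has at most one non-zero row:

```python
import pandas as pd

# Hypothetical frame in which the zero row precedes the non-zero row.
df = pd.DataFrame({
    "DAT_KEY": ["01-04-19", "01-04-19", "01-04-19"],
    "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.3"],
    "DATA": [0, 3298329, 0],
})

keys = ["DAT_KEY", "IP"]

# first() returns the leading row of each group, which is the zero here.
naive = df.groupby(keys, as_index=False, sort=False).first()

# max() ignores row order; it relies on DATA being non-negative.
robust = df.groupby(keys, as_index=False, sort=False)["DATA"].max()

print(naive)
print(robust)
```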

edited Sep 9, 2019 at 12:11

answered Sep 9, 2019 at 10:06


2 Comments

cuongNGUYEN_21818 Over a year ago

Sorry for not giving full information. I have DAT_KEY values for many days, from 01-04 to 30-04. If I use df = df.groupby('IP',as_index=False).first(), only the first row of a single day remains.

cuongNGUYEN_21818 Over a year ago

df = df.groupby(['IP', 'DAT_KEY'], as_index=False).first() followed by print(df) works. Thanks a lot.

kantal · Accepted Answer · 2019-09-09 10:49:47Z

Maybe that's what you need:

    DAT_KEY        IP     DATA
0  01-04-19  10.0.0.1  3298329
1  01-04-19  10.0.0.1        0
2  02-04-19  10.0.0.1  3298339
3  02-04-19  10.0.0.1        0
4  01-04-19  10.0.0.2  3233233
5  01-04-19  10.0.0.2        0
6  01-04-19  10.0.0.3        0
7  01-04-19  10.0.0.1    99999

df.groupby(["DAT_KEY", "IP"], as_index=False, sort=False).apply(lambda g: g if len(g) == 1 else g[g["DATA"] != 0]).reset_index(drop=True)
Out[94]: 
    DAT_KEY        IP     DATA
0  01-04-19  10.0.0.1  3298329
1  01-04-19  10.0.0.1    99999
2  02-04-19  10.0.0.1  3298339
3  01-04-19  10.0.0.2  3233233
4  01-04-19  10.0.0.3        0
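
The same rule can also be written as a boolean mask without apply. This is a sketch rather than part of the answer above: the names `shared_key` and `result` are illustrative. It drops a row only when DATA is 0 and at least one other row shares its (DAT_KEY, IP) pair:

```python
import pandas as pd

# The eight-row frame from this answer, rebuilt by hand.
df = pd.DataFrame({
    "DAT_KEY": ["01-04-19", "01-04-19", "02-04-19", "02-04-19",
                "01-04-19", "01-04-19", "01-04-19", "01-04-19"],
    "IP": ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.1",
           "10.0.0.2", "10.0.0.2", "10.0.0.3", "10.0.0.1"],
    "DATA": [3298329, 0, 3298339, 0, 3233233, 0, 0, 99999],
})

# True for every row whose (DAT_KEY, IP) pair occurs more than once.
shared_key = df.duplicated(subset=["DAT_KEY", "IP"], keep=False)

# Keep everything except zero rows that share their key with another row.
result = df[~(shared_key & df["DATA"].eq(0))].reset_index(drop=True)
print(result)
```

Unlike the groupby version, this keeps the rows in their original order, so the 99999 row stays at the end instead of moving next to the other 01-04-19 / 10.0.0.1 row. Both versions remove every row of a group that contains several rows, all with DATA equal to 0.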
