2
 import pandas as pd
 import numpy as np
 print df

I'm a newbie, I used pandas to process an excel file. I have a data frame like bellow

DAT_KEY      IP         DATA
01-04-19    10.0.0.1    3298329
01-04-19    10.0.0.1    0
02-04-19    10.0.0.1    3298339
02-04-19    10.0.0.1    0
01-04-19    10.0.0.2    3233233
01-04-19    10.0.0.2    0
01-04-19    10.0.0.3    0

I only want to delete the row when having same IP and DAT_KEY and DATA=0. Don't want to delete row have DATA=0, but DAT_KEY and IP unique.

My expected outcome:

DAT_KEY      IP         DATA
01-04-19    10.0.0.1    3298329
02-04-19    10.0.0.1    3298339
01-04-19    10.0.0.2    3233233
01-04-19    10.0.0.3    0

I try with drop duplicates but it not suitable with my case

df = df.drop_duplicates()
1
  • 2
    can there be duplicated DAT_KEY and IP with any other value except 0? and do you want to keep them? Commented Sep 9, 2019 at 10:05

2 Answers 2

0

Use

  • groupby - function is used to split the data into groups based on some criteria.
  • .first() - Compute first of group values.

Ex.

df = df.groupby(['DAT_KEY','IP'],as_index=False,sort=False).first()
print(df)

O/P:

    DAT_KEY        IP     DATA
0  01-04-19  10.0.0.1  3298329
1  02-04-19  10.0.0.1  3298339
2  01-04-19  10.0.0.2  3233233
3  01-04-19  10.0.0.3        0
Sign up to request clarification or add additional context in comments.

2 Comments

Sorry because i don't give full information. I have DAT_KEY of many days from 01-04 to 30-04. If use this df = df.groupby('IP',as_index=False).first(). Data remain to have only first row. of one day.
df = df.groupby(['IP', 'DAT_KEY']as_index=False).first() print(df) it works thanks alots
0

Maybe that's what you need:

    DAT_KEY        IP     DATA
0  01-04-19  10.0.0.1  3298329
1  01-04-19  10.0.0.1        0
2  02-04-19  10.0.0.1  3298339
3  02-04-19  10.0.0.1        0
4  01-04-19  10.0.0.2  3233233
5  01-04-19  10.0.0.2        0
6  01-04-19  10.0.0.3        0
7  01-04-19  10.0.0.1    99999

df.groupby(["DAT_KEY","IP"], as_index=False,sort=False).apply(lambda g: g if len(g)==1 else g[g["DATA"]!=0] ).reset_index(drop=True)                                                                                                      
Out[94]: 
    DAT_KEY        IP     DATA
0  01-04-19  10.0.0.1  3298329
1  01-04-19  10.0.0.1    99999
2  02-04-19  10.0.0.1  3298339
3  01-04-19  10.0.0.2  3233233
4  01-04-19  10.0.0.3        0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.