Filter rows in csv based on custom condition

Question

Say i have a csv like below

+-----+-----------+---------+
| ID  | state     | city    |
+-----+-----------+---------+
| 101 | READY     |         |
| 101 | DELIVERED | NEWYORK |
| 101 | DELIVERED | LONDON  |   
| 102 | READY     |         |
| 102 | DELIVERED | LONDON  |
| 103 | READY     |         |
| 103 | DELIVERED | NEWYORK |
| 104 | READY     |         |
| 104 | DELIVERED | TOKYO   |
| 104 | DELIVERED | PARIS   |
| 105 | DELIVERED | NEWYORK |
+-----+-----------+---------+

Now I want ID with State READY which has DELIVERED as NEWYORK.

Same ID will come multiple times with different state and city.
Always the city for READY will be empty
Always the city for DELIVERED will have some values.

so here first i want to check the value of column city for DELIVERED state. If it is NEWYORK then take that ID's READY row. If there is no READY row for that then we can ignore (ID 105 in this example)

EXPECTED OUTPUT

+-----+-----------+---------+
| ID  | state     | city    |
+-----+-----------+---------+
| 101 | READY     |         |
| 103 | READY     |         |
+-----+-----------+---------+

I have tried using self join in pandas. But i have no idea how to proceed further as i'm new to python. Currently i'm doing this in SQL.

import pandas as pd
mydata = pd.read_csv('C:/Mypython/Newyork',encoding = "ISO-8859-1")
NY = pd.merge(mydata,mydata,left_on='ID',right_on='ID',how='inner')

Quang Hoang · Accepted Answer · 2020-08-03 15:24:43Z

3

Let's try groupby().transform() to identify those with NEWYORK and then boolean indexing:

has_NY = df['city'].eq('NEWYORK').groupby(df['ID']).transform('any')

mask = df['state'].eq('READY') & has_NY

df[mask]

Output:

    ID  state  city
0  101  READY  None
5  103  READY  None

answered Aug 3, 2020 at 15:24

Quang Hoang

151k11 gold badges64 silver badges86 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

BallpointBen · Accepted Answer · 2020-08-03 15:28:54Z

2

Use the NEWYORK condition to get a list of IDs, then use that list to filter.

new_york_ids = df.loc[df['city']=='NEWYORK', 'ID']
df[(df['state']=='READY') & (df['ID'].isin(new_york_ids))]

    ID  state  city
0  101  READY  None
5  103  READY  None

answered Aug 3, 2020 at 15:28

BallpointBen

15.6k2 gold badges46 silver badges81 bronze badges

Collectives™ on Stack Overflow

Filter rows in csv based on custom condition

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related