Say I have a CSV like the one below:
+-----+-----------+---------+
| ID  | state     | city    |
+-----+-----------+---------+
| 101 | READY     |         |
| 101 | DELIVERED | NEWYORK |
| 101 | DELIVERED | LONDON  |
| 102 | READY     |         |
| 102 | DELIVERED | LONDON  |
| 103 | READY     |         |
| 103 | DELIVERED | NEWYORK |
| 104 | READY     |         |
| 104 | DELIVERED | TOKYO   |
| 104 | DELIVERED | PARIS   |
| 105 | DELIVERED | NEWYORK |
+-----+-----------+---------+
Now I want the READY rows for every ID that also has a DELIVERED row with city NEWYORK.
- The same ID will appear multiple times with different state and city.
- The city for READY will always be empty.
- The city for DELIVERED will always have some value.
So first I want to check the value of the city column for the DELIVERED state. If it is NEWYORK, then take that ID's READY row. If there is no READY row for that ID, it can be ignored (ID 105 in this example).
EXPECTED OUTPUT
+-----+-----------+---------+
| ID  | state     | city    |
+-----+-----------+---------+
| 101 | READY     |         |
| 103 | READY     |         |
+-----+-----------+---------+
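To make the rule concrete, here is a minimal sketch of one way it could be written in pandas without a join. It assumes the file is comma-separated with the headers ID, state and city, and that empty city cells are read as NaN; the inline `csv_text` and the names `ny_ids` and `result` are only there for illustration.

```python
import io
import pandas as pd

# Inline copy of the sample data (an assumption about the real file's layout).
csv_text = """ID,state,city
101,READY,
101,DELIVERED,NEWYORK
101,DELIVERED,LONDON
102,READY,
102,DELIVERED,LONDON
103,READY,
103,DELIVERED,NEWYORK
104,READY,
104,DELIVERED,TOKYO
104,DELIVERED,PARIS
105,DELIVERED,NEWYORK
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 1: IDs that have at least one DELIVERED row with city NEWYORK.
ny_ids = df.loc[(df["state"] == "DELIVERED") & (df["city"] == "NEWYORK"), "ID"]

# Step 2: READY rows whose ID is among those IDs.
# An ID without a READY row (105 here) simply produces no output row.
result = df[(df["state"] == "READY") & df["ID"].isin(ny_ids)]
print(result)
```

With the sample data this should leave only the READY rows for IDs 101 and 103.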
I have tried using a self join in pandas, but I have no idea how to proceed further as I'm new to Python. Currently I'm doing this in SQL.
import pandas as pd
mydata = pd.read_csv('C:/Mypython/Newyork', encoding="ISO-8859-1")
# Self join on ID (same result as left_on='ID', right_on='ID')
NY = pd.merge(mydata, mydata, on='ID', how='inner')
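One possible way to carry the self join through to the expected output is sketched below. It is not the only approach; it mirrors what a SQL self join with a WHERE clause would do. The suffixes `_ready` and `_deliv`, the `pairs` and `mask` names, and the inline sample data are assumptions made for the example.

```python
import io
import pandas as pd

# Inline copy of the sample data, standing in for the real CSV file.
csv_text = """ID,state,city
101,READY,
101,DELIVERED,NEWYORK
101,DELIVERED,LONDON
102,READY,
102,DELIVERED,LONDON
103,READY,
103,DELIVERED,NEWYORK
104,READY,
104,DELIVERED,TOKYO
104,DELIVERED,PARIS
105,DELIVERED,NEWYORK
"""
mydata = pd.read_csv(io.StringIO(csv_text))

# Self join: every row of an ID is paired with every other row of the same ID.
# The suffixes distinguish the left copy from the right copy of each column.
pairs = pd.merge(mydata, mydata, on="ID", how="inner",
                 suffixes=("_ready", "_deliv"))

# Keep pairs where the left row is READY and the right row is a NEWYORK delivery.
mask = (
    (pairs["state_ready"] == "READY")
    & (pairs["state_deliv"] == "DELIVERED")
    & (pairs["city_deliv"] == "NEWYORK")
)

# Project back to the original columns and drop any repeated READY rows.
NY = (
    pairs.loc[mask, ["ID", "state_ready", "city_ready"]]
    .drop_duplicates()
    .rename(columns={"state_ready": "state", "city_ready": "city"})
)
print(NY)
```

The full self join grows with the square of the number of rows per ID, so on a large file it may be cheaper to filter the READY and DELIVERED rows first and merge only those.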