My data has several columns; three of them are named "candidate_id", "enddate", and "TitleLevel".
Within the same candidate_id, if two records have the same enddate, I want to delete the record with the lower TitleLevel.
For example, given:
    candidate_id  startdate  enddate    TitleLevel
    1             2012.1.1   2013.5.1   2
    1             2011.1.1   2013.5.1   4
    1             2008.12.1  2010.1.1   3
    2             2010.10.1  2012.12.1  2
What I want is:
    candidate_id  startdate  enddate    TitleLevel
    1             2011.1.1   2013.5.1   4
    1             2008.12.1  2010.1.1   3
    2             2010.10.1  2012.12.1  2
That is, the row with candidate_id=1, enddate=2013.5.1, and TitleLevel=2 gets deleted.
I have come up with a loop:
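    import pandas as pd

    nrow = len(JobData)  # number of rows (assumed definition of nrow)
    # Compare each row with the next one, from the bottom up;
    # on a match, drop whichever row has the lower TitleLevel.
    for i in range(nrow - 2, -1, -1):
        if (JobData['enddate'][i] == JobData['enddate'][i+1]
                and JobData['candidate_id'][i] == JobData['candidate_id'][i+1]
                and pd.notnull(JobData['enddate'][i])):
            if JobData['TitleLevel'][i] > JobData['TitleLevel'][i+1]:
                JobData = JobData.drop(i+1)
            else:
                JobData = JobData.drop(i)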
The loop takes a long time to delete the redundant rows. Is there a faster method?
The pandas tag is very helpful, because folks familiar with it will see your question.
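One vectorized possibility, as a minimal sketch rather than a definitive answer: sort by TitleLevel in descending order, flag every later row that repeats a (candidate_id, enddate) pair, and drop the flagged rows in a single call. The sample frame below is rebuilt from the table in the question, with the dates kept as plain strings (an assumption; real data may use datetimes). Unlike the loop, this compares all rows of a group rather than only neighbouring rows, and on a tie in TitleLevel it keeps the row that appears first, so results may differ from the loop in those cases.

```python
import pandas as pd

# Sample data copied from the question; dates are kept as strings (assumption).
JobData = pd.DataFrame({
    "candidate_id": [1, 1, 1, 2],
    "startdate": ["2012.1.1", "2011.1.1", "2008.12.1", "2010.10.1"],
    "enddate": ["2013.5.1", "2013.5.1", "2010.1.1", "2012.12.1"],
    "TitleLevel": [2, 4, 3, 2],
})

# Stable sort so the highest TitleLevel comes first; ties keep original order.
ordered = JobData.sort_values("TitleLevel", ascending=False, kind="mergesort")

# Flag every row after the first within each (candidate_id, enddate) pair.
# Rows with a missing enddate are never flagged, mirroring pd.notnull in the loop.
is_dup = (
    ordered.duplicated(subset=["candidate_id", "enddate"], keep="first")
    & ordered["enddate"].notna()
)

# Drop the flagged rows by index label; the original row order is preserved.
result = JobData.drop(index=ordered.index[is_dup.to_numpy()])
print(result)
```

On the sample data this leaves the three rows listed under "What I want is". Because the sort and the duplicate check each run once over the whole frame, this should scale much better than dropping rows one at a time inside a Python loop.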