1

I took a whole afternoon trying to implement this task but failed ,I've got a pandas data frame like this

columns=[ka,kb_1,kb_2,timeofEvent,timeInterval]
0:'3M' '2345' '2345' '2014-10-5',3000
1:'3M' '2958' '2152' '2015-3-22',5000
2:'GE' '2183' '2183' '2012-12-31',515
3:'3M' '2958' '2958' '2015-3-10',395
4:'GE' '2183' '2285' '2015-4-19',1925
5:'GE' '2598' '2598' '2015-3-17',1915

What is to be implemented is a new data frame grouped by "ka and kb_1" below

columns=[ka,kb,errorNum,errorRate,totalNum of records]
'3M','2345',0,0%,1
'3M','2958',1,50%,2
'GE','2183',1,50%,2
'GE','2598',0,0%,1

(definition of error Record: when kb_1!=kb_2,the corresponding record is treated as abnormal record)

My code is like this

df['isError'] = (df['kb_1'] != df['kb_2']).astype('int')
grouped2 = df.groupby(['ka', 'kb_1'])

df_rst = pd.DataFrame()
df_rst['ka']  =grouped2['ka'].all()
df_rst['kb_1'] = grouped2['kb_1'].all()
df_rst['errorNum'] = grouped2['isError'].transform(sum)
df_rst['totalNum of records'] = grouped2.size()
df_rst['Soll_neq_Letzt_error_rate'] = df_rst['errorNum'].astype('float').div(df_rst['totalNum'].astype('float'), axis='index')
df_rst.to_csv('rst.csv',index=False)

but the result is not what I wanted.

For instance, the column kb_1 becomes true/false, and errorNum becomes Nan. Can anyone explain why and give an workable implementation? Thanks

2
  • Please show sample data and desired results. Saying the results are not what you wanted doesn't really tell us what you do want. Commented May 14, 2015 at 17:14
  • Thanks, I have just added the input and desired output Commented May 14, 2015 at 17:45

1 Answer 1

1

I'm not sure exactly what you did, but I don't think you were that far off.

df2 = df.groupby(['ka','kb_1'])['isError'].agg({ 'errorNum':  'sum',
                                                 'recordNum': 'count' })

df2['errorRate'] = df2['errorNum'] / df2['recordNum']

         recordNum  errorNum  errorRate
ka kb_1                                
3M 2345          1         0        0.0
   2958          2         1        0.5
GE 2183          2         1        0.5
   2598          1         0        0.0
Sign up to request clarification or add additional context in comments.

3 Comments

wow,thanks,can't image it just takes two lines to complete, as I was still thinking of joining the table.
Glad that helps, please remember to click the checkmark if you are reasonably happy with the answer.
done, I met another difficulty in achieving this because the csv I try to manipulate too big to put into memory, see stackoverflow.com/questions/30245640/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.