I have a sorted DataFrame (see Input DataFrame below). I need to iterate over its rows and select rows into an output DataFrame based on the conditions below.

• Condition 1: For a given R1, R2, W, if we have two records with TYPE 'A' and 'B':
  a) If (amount1 & amount2) of TYPE 'A' is > (amount1 & amount2) of TYPE 'B', we need to bring the TYPE 'A' record into the output.
  b) If (amount1 & amount2) of TYPE 'B' is > (amount1 & amount2) of TYPE 'A', we need to bring the TYPE 'B' record into the output.
  c) If (amount1 & amount2) of TYPE 'A' is = (amount1 & amount2) of TYPE 'B', we need to bring the TYPE 'A' record into the output.

• Condition 2: For a given R1, R2, W, if we have only a record with TYPE 'A', we need to bring the TYPE 'A' record into the output.

• Condition 3: For a given R1, R2, W, if we have only a record with TYPE 'B', we need to bring the TYPE 'B' record into the output.

Input DataFrame

    R1  R2  W   TYPE    amount1 amount2
0   123 12  1   A   111 222
1   123 12  1   B   111 222
2   123 12  2   A   222 222
3   123 12  2   B   333 333
4   123 12  3   A   444 444
5   123 12  3   B   333 333
6   123 34  1   A   111 222
7   123 34  2   A   333 444
8   123 34  2   B   333 444
9   123 34  3   B   444 555
10  123 34  4   A   555 666
11  123 34  4   B   666 777

Output dataframe

    R1  R2  W   TYPE    amount1 amount2
0   123 12  1   A   111 222
3   123 12  2   B   333 333
4   123 12  3   A   444 444
6   123 34  1   A   111 222
7   123 34  2   A   333 444
9   123 34  3   B   444 555
11  123 34  4   B   666 777
2 Comments
  • Input Data frame Commented Mar 7, 2020 at 10:54
  •     R1  R2  W   TYPE    amount1 amount2
    0   123 12  1   A   111 222
    1   123 12  1   B   222 333
    2   123 12  2   A   333 444
    3   123 12  2   B   444 555
    4   123 12  3   A   555 666
    5   123 12  3   B   666 777
    6   123 34  1   A   111 222
    7   123 34  2   A   222 333
    8   123 34  2   B   333 444
    9   123 34  3   B   444 555
    10  123 34  4   A   555 666
    11  123 34  4   B   666 777
    Commented Mar 7, 2020 at 10:55

3 Answers

1

Selection based on your criteria:

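def my_selection(idf):
    # If both 'A' and 'B' are present in the group, keep only the 'A' row;
    # otherwise the group holds a single TYPE, so return it unchanged.
    if idf['TYPE'].nunique() == 2:
        return idf[idf['TYPE'] == 'A']
    return idf

# df is the input DataFrame; apply the selection to each (R1, R2, W) group
df2 = df.groupby(['R1', 'R2', 'W'], as_index=False).apply(my_selection)
# Drop the original row labels, keeping one running group number per row
df2.index = df2.index.droplevels(-1)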

#     R1  R2  W TYPE  amount1  amount2
# 0  123  12  1    A      111      222
# 1  123  12  2    A      333      444
# 2  123  12  3    A      555      666
# 3  123  34  1    A      111      222
# 4  123  34  2    A      222      333
# 5  123  34  3    B      444      555
# 6  123  34  4    A      555      666
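
The question's Condition 1 also compares the amounts, which the selection above does not do. Here is a minimal sketch of one way to apply it, using the sample input from the question. It assumes that "(amount1 & amount2) is greater" means amount1 is compared first and amount2 breaks ties; that reading is an assumption, and the names `keys` and `out` are only illustrative.

```python
import pandas as pd

# Sample input copied from the question
df = pd.DataFrame({
    "R1": [123] * 12,
    "R2": [12] * 6 + [34] * 6,
    "W": [1, 1, 2, 2, 3, 3, 1, 2, 2, 3, 4, 4],
    "TYPE": list("ABABABAABBAB"),
    "amount1": [111, 111, 222, 333, 444, 333, 111, 333, 333, 444, 555, 666],
    "amount2": [222, 222, 222, 333, 444, 333, 222, 444, 444, 555, 666, 777],
})

keys = ["R1", "R2", "W"]

# Within each group, put the larger amounts first (assumed reading of the
# condition); on equal amounts 'A' sorts before 'B', so 'A' wins the tie.
out = (
    df.sort_values(
        keys + ["amount1", "amount2", "TYPE"],
        ascending=[True, True, True, False, False, True],
    )
    .drop_duplicates(subset=keys, keep="first")  # keep the winner per group
    .sort_index()                                # restore the original row order
)
print(out)
```

On the question's input this keeps rows 0, 3, 4, 6, 7, 9 and 11, which matches the expected output DataFrame.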


1 Comment

Thanks for your prompt response. It is working on the test set as expected. Let me check on the actual data and give you feedback.
1

All you have to do is group by R1, R2, W and operate on the TYPE column as follows:

data.groupby(['R1','R2','W']).apply(lambda x: 'A' if 'A' in x['TYPE'].values else 'B').reset_index()

You can then merge this output with the original DataFrame on the columns obtained above (R1, R2, W and the selected type) to get the corresponding 'amount1' and 'amount2' values.
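
The merge step could look roughly like this. It is a sketch on a few illustrative rows; the names `selected` and `result` are made up here, and the selected type is stored in a column named TYPE so that it can serve as a merge key.

```python
import pandas as pd

# A few illustrative rows with the same columns as the question's data
data = pd.DataFrame({
    "R1": [123, 123, 123, 123, 123],
    "R2": [12, 12, 34, 34, 34],
    "W": [1, 1, 1, 3, 4],
    "TYPE": ["A", "B", "A", "B", "A"],
    "amount1": [111, 222, 111, 444, 555],
    "amount2": [222, 333, 222, 555, 666],
})

keys = ["R1", "R2", "W"]

# One selected TYPE per group: 'A' if the group contains an 'A', else 'B'
selected = (
    data.groupby(keys)["TYPE"]
    .apply(lambda s: "A" if "A" in s.values else "B")
    .reset_index(name="TYPE")
)

# An inner merge on the keys plus TYPE brings back amount1 and amount2
result = data.merge(selected, on=keys + ["TYPE"], how="inner")
print(result)
```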

9 Comments

Hi Raghul Raj, I have tried this and I see that all the output records are mapped to B; I cannot see any records with TYPE A. I am analysing this further.
Hey @ram k, sorry, I did not try it on data before. I've updated my answer. Earlier I was checking for 'A' in the Series, so it always returned False. Now I'm checking it on the converted array. It's working as expected now.
Hi Raghul Raj, I have validated the code with the test set and the actual set (1.5 million rows) and it is working as expected. It takes close to 20 to 30 min, which is okay in my case. Thanks for your prompt response and support.
Hi Raghul Raj, there is a change in my requirement; I will add it in another comment along with the details. Kindly let me know your feedback.
Hi ram k, what's the change?
0

This is what I would do:

categories = ['B', 'A']  # categories in ascending order of precedence
d = {i: e for e, i in enumerate(categories)}  # dictionary: {'B': 0, 'A': 1}
s = df['TYPE'].map(d)  # map onto df['TYPE'] to create a helper series

Then assign this series to the DataFrame, group by R1, R2, W, take the per-group max with transform, and keep the rows where the helper series equals that max:

out = df[s.eq(df.assign(TYPE=s).groupby(['R1','R2','W'])['TYPE'].transform('max'))]
print(out)

     R1  R2  W TYPE  amount1  amount2
0   123  12  1    A      111      222
2   123  12  2    A      333      444
4   123  12  3    A      555      666
6   123  34  1    A      111      222
7   123  34  2    A      222      333
9   123  34  3    B      444      555
10  123  34  4    A      555      666
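
To see what each step produces, here is a small sketch on three illustrative rows. The variable names `precedence`, `group_max` and `mask` are not from the answer above; they only label the intermediate values.

```python
import pandas as pd

# Three illustrative rows: one group with both types, one group with only 'B'
df = pd.DataFrame({
    "R1": [123, 123, 123],
    "R2": [12, 12, 34],
    "W": [1, 1, 3],
    "TYPE": ["A", "B", "B"],
    "amount1": [111, 222, 444],
    "amount2": [222, 333, 555],
})

precedence = {"B": 0, "A": 1}        # higher number wins within a group
s = df["TYPE"].map(precedence)       # helper series: 1, 0, 0

# Highest precedence present in each (R1, R2, W) group, aligned to the rows
group_max = s.groupby([df["R1"], df["R2"], df["W"]]).transform("max")  # 1, 1, 0

mask = s.eq(group_max)               # True, False, True
out = df[mask]                       # keeps row 0 ('A') and row 2 (lone 'B')
print(out)
```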

7 Comments

Thanks for your prompt response. It is working on the test set as expected. But when I tried it with the full data set (1.5 million records), it took a long time to run. I monitored it for 1.5 hr and then cancelled the job. Let me check and get back to you asap.
@ramk this should not be slower than loops.. but do let me know
My apologies. It is working as expected. Please advise how I can accept the answer.
@ramk you can take a look here: meta.stackexchange.com/questions/5234/… :)
Done. Is it reflecting?