Pandas Dataframe iteration and selecting the rows based on condition - Change in Requirements

Question

I have a sorted data frame, shown below (Input DataFrame). I need to iterate over the rows and select and retrieve rows into an output data frame based on the conditions below.

• Condition 1: For a given R1, R2, W, if we have two records with TYPE 'A' and 'B':
  a) If (amount1 & amount2) of TYPE 'A' is > (amount1 & amount2) of TYPE 'B', we need to bring the TYPE 'A' record into the output.
  b) If (amount1 & amount2) of TYPE 'B' is > (amount1 & amount2) of TYPE 'A', we need to bring the TYPE 'B' record into the output.
  c) If (amount1 & amount2) of TYPE 'A' is = (amount1 & amount2) of TYPE 'B', we need to bring the TYPE 'A' record into the output.

• Condition 2: For a given R1, R2, W, if we have only one record with TYPE 'A', we need to bring the TYPE 'A' record into the output.

• Condition 3: For a given R1, R2, W, if we have only one record with TYPE 'B', we need to bring the TYPE 'B' record into the output.

Input Dataframe

    R1  R2  W   TYPE    amount1 amount2
0   123 12  1   A   111 222
1   123 12  1   B   111 222
2   123 12  2   A   222 222
3   123 12  2   B   333 333
4   123 12  3   A   444 444
5   123 12  3   B   333 333
6   123 34  1   A   111 222
7   123 34  2   A   333 444
8   123 34  2   B   333 444
9   123 34  3   B   444 555
10  123 34  4   A   555 666
11  123 34  4   B   666 777

Output dataframe

    R1  R2  W   TYPE    amount1 amount2
0   123 12  1   A   111 222
3   123 12  2   B   333 333
4   123 12  3   A   444 444
6   123 34  1   A   111 222
7   123 34  2   A   333 444
9   123 34  3   B   444 555
11  123 34  4   B   666 777
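A minimal sketch of the three conditions, under one loudly stated assumption: "(amount1 & amount2) of TYPE 'A' is > (amount1 & amount2) of TYPE 'B'" is read as comparing amount1 first and using amount2 as a tie-breaker. The question does not say how mixed cases (one amount larger, the other smaller) should be handled, so this reading may need adjusting. The DataFrame is rebuilt from the input table above, and the name `keys` is introduced only for illustration.

```python
import pandas as pd

# Input DataFrame, rebuilt from the table in the question
df = pd.DataFrame({
    "R1": [123] * 12,
    "R2": [12] * 6 + [34] * 6,
    "W": [1, 1, 2, 2, 3, 3, 1, 2, 2, 3, 4, 4],
    "TYPE": list("ABABABAABBAB"),
    "amount1": [111, 111, 222, 333, 444, 333, 111, 333, 333, 444, 555, 666],
    "amount2": [222, 222, 222, 333, 444, 333, 222, 444, 444, 555, 666, 777],
})

keys = ["R1", "R2", "W"]  # hypothetical helper name for the grouping columns

out = (
    # Within each group, put the largest (amount1, amount2) first.
    # ASSUMPTION: amounts are compared lexicographically (amount1, then amount2).
    # On a tie, ascending TYPE places 'A' before 'B' (condition 1c).
    df.sort_values(
        keys + ["amount1", "amount2", "TYPE"],
        ascending=[True, True, True, False, False, True],
    )
    # Keep the first row per group; single-row groups pass through unchanged
    # (conditions 2 and 3).
    .drop_duplicates(subset=keys, keep="first")
    # Restore the original row order
    .sort_index()
)

print(out)
```

On the sample input this keeps rows 0, 3, 4, 6, 7, 9 and 11, which matches the expected output above. Because it avoids a Python-level function call per group, it may also scale better on large inputs, though that has not been measured here.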

    R1  R2  W   TYPE    amount1 amount2
0   123 12  1   A   111 222
1   123 12  1   B   222 333
2   123 12  2   A   333 444
3   123 12  2   B   444 555
4   123 12  3   A   555 666
5   123 12  3   B   666 777
6   123 34  1   A   111 222
7   123 34  2   A   222 333
8   123 34  2   B   333 444
9   123 34  3   B   444 555
10  123 34  4   A   555 666
11  123 34  4   B   666 777

– ram k, Commented Mar 7, 2020 at 10:55

DOOM · Accepted Answer · 2020-03-07 11:19:20Z

1

Selection based on your criteria:

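def my_selection(idf):
  # If both 'A' and 'B' are present in 'TYPE', keep only the row with 'A';
  # otherwise the group has a single TYPE and is returned unchanged.
  if idf['TYPE'].nunique() == 2:
    return idf[idf['TYPE'] == 'A']
  else:
    return idf

# Apply the selection to each (R1, R2, W) group
df2 = df.groupby(['R1', 'R2', 'W'], as_index=False).apply(my_selection)
# Drop the original row index, keeping only the group number
df2.index = df2.index.droplevel(-1)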

#     R1  R2  W TYPE  amount1  amount2
# 0  123  12  1    A      111      222
# 1  123  12  2    A      333      444
# 2  123  12  3    A      555      666
# 3  123  34  1    A      111      222
# 4  123  34  2    A      222      333
# 5  123  34  3    B      444      555
# 6  123  34  4    A      555      666


1 Comment

ram k Over a year ago

Thanks for your prompt response. It is working on the test set as expected. Let me check on the actual data and let you know my feedback.

Raghul Raj · Accepted Answer · 2020-03-08 20:11:01Z

1

All you have to do is group by R1, R2, W and operate on the TYPE column as follows:

data.groupby(['R1','R2','W']).apply(lambda x: 'A' if 'A' in x['TYPE'].values else 'B').reset_index(name='TYPE')

You can then merge this output with the original DataFrame on R1, R2, W and TYPE to get the corresponding 'amount1' and 'amount2' values.
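The merge step described above could look roughly like the sketch below. It uses the sample data posted in the question's comment; the names `keys`, `chosen` and `merged` are hypothetical, and the selection is applied to the TYPE column alone rather than to the whole group, which is a small variation on the answer's one-liner.

```python
import pandas as pd

# Sample data from the comment under the question
data = pd.DataFrame({
    "R1": [123] * 12,
    "R2": [12] * 6 + [34] * 6,
    "W": [1, 1, 2, 2, 3, 3, 1, 2, 2, 3, 4, 4],
    "TYPE": list("ABABABAABBAB"),
    "amount1": [111, 222, 333, 444, 555, 666, 111, 222, 333, 444, 555, 666],
    "amount2": [222, 333, 444, 555, 666, 777, 222, 333, 444, 555, 666, 777],
})

keys = ["R1", "R2", "W"]  # hypothetical helper name

# One TYPE per group: 'A' if the group contains an 'A' row, else 'B'
chosen = (
    data.groupby(keys)["TYPE"]
        .apply(lambda s: "A" if "A" in s.values else "B")
        .reset_index()
)

# Inner merge on the keys plus TYPE keeps only the chosen row of each group
# and brings back its amount1 / amount2 values
merged = data.merge(chosen, on=keys + ["TYPE"])

print(merged)
```

Note that this always prefers 'A' when both types are present; it does not compare the amounts.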

edited Mar 8, 2020 at 20:11

answered Mar 7, 2020 at 11:20


9 Comments

ram k Over a year ago

Hi Raghul Raj, I have tried this and can see that all the output records are mapped to B; I am not able to see any records with TYPE A. I am analysing this further.

Raghul Raj Over a year ago

Hey @ram k, sorry, I did not try it on data before. I've updated my answer. Earlier I was checking for 'A' in the Series, hence it was always returning False. Now I'm checking it on the converted array. It's working as expected now.

ram k Over a year ago

Hi Raghul Raj, I have validated the code with the test set and the actual set (1.5 million rows). It is working as expected and takes close to 20 to 30 minutes, which is okay in my case. Thanks for your prompt response and support.

ram k Over a year ago

Hi Raghul Raj, there is a change in my requirement. I will add you in another comment along with the details. Kindly let me know your feedback.

Raghul Raj Over a year ago

Hi ram k, what's the change?


anky · Accepted Answer · 2020-03-07 12:56:41Z

0

This is what I would do:

categories = ['B', 'A']  # categories in ascending order of precedence
d = {i: e for e, i in enumerate(categories)}  # dictionary: {'B': 0, 'A': 1}
s = df['TYPE'].map(d)  # map onto df['TYPE'] to create a helper series

Then assign this series to the DataFrame, take the group-wise maximum with groupby + transform('max'), and keep the rows where the helper series equals that maximum:

out = df[s.eq(df.assign(TYPE=s).groupby(['R1','R2','W'])['TYPE'].transform('max'))]
print(out)

     R1  R2  W TYPE  amount1  amount2
0   123  12  1    A      111      222
2   123  12  2    A      333      444
4   123  12  3    A      555      666
6   123  34  1    A      111      222
7   123  34  2    A      222      333
9   123  34  3    B      444      555
10  123  34  4    A      555      666
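To see why the comparison picks the right rows, the intermediate values can be inspected on a small subset. This is only an illustrative sketch: it uses four rows taken from the sample data (one group with both types, one with only 'A', one with only 'B'), and the names `rank`, `group_max` and `mask` are hypothetical.

```python
import pandas as pd

# Four rows from the sample data: rows 0, 1, 6 and 9
df = pd.DataFrame({
    "R1": [123, 123, 123, 123],
    "R2": [12, 12, 34, 34],
    "W": [1, 1, 1, 3],
    "TYPE": ["A", "B", "A", "B"],
    "amount1": [111, 222, 111, 444],
    "amount2": [222, 333, 222, 555],
})

# Higher number = higher precedence, so 'A' wins over 'B'
rank = df["TYPE"].map({"B": 0, "A": 1})

# Highest precedence present in each (R1, R2, W) group,
# broadcast back to every row of that group
group_max = df.assign(TYPE=rank).groupby(["R1", "R2", "W"])["TYPE"].transform("max")

# A row is kept when its own rank equals its group's maximum
mask = rank.eq(group_max)

print(pd.DataFrame({"rank": rank, "group_max": group_max, "keep": mask}))
print(df[mask])
```

In the group that has both types, only the 'A' row reaches the group maximum; in single-row groups the row trivially equals its own maximum, so it is always kept.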


7 Comments

ram k Over a year ago

Thanks for your prompt response. It is working on the test set as expected. But when I tried it with the full data set (1.5 million records), it took a long time to run. I monitored it for 1.5 hours and then cancelled the job. Let me check and get back to you as soon as possible.

anky Over a year ago

@ramk This should not be slower than loops, but do let me know.

ram k Over a year ago

My apologies. It is working as expected. Please advise how I can accept the answer.

anky Over a year ago

@ramk You can take a look here: meta.stackexchange.com/questions/5234/… :)

ram k Over a year ago

Done. Is it reflecting?
