More efficient way to write this lambda function

Question

import pandas as pd

prizes = ([1, 100], [2, 50], [3, 25])
prizes = pd.DataFrame(prizes, columns=['Rank', 'Payout'])

ranking = ([1, 3, 2], [2, 2, 1], [3, 1, 3])
ranking = pd.DataFrame(ranking, columns=[1, 2, 3])

payouts = pd.DataFrame(range(1, 4), columns=['Lineup'])
mapper = prizes.set_index('Rank')['Payout'].to_dict()
payouts = pd.concat([payouts, ranking[range(1, 4)].apply(lambda s: s.map(mapper)).fillna(-1)], axis=1)

print(ranking)
print(payouts)

   1  2  3
0  1  3  2
1  2  2  1
2  3  1  3
   Lineup    1    2    3
0       1  100   25   50
1       2   50   50  100
2       3   25  100   25

Is there a more efficient way to write the lambda function just above the print statements? This is a small example of code I run inside a large loop, and this one step takes roughly half of the loop's total time. Any help would be appreciated.

sammywemmy · Accepted Answer · 2021-09-14 03:12:51Z

2

You don't need to create a dict for mapper; setting the index and selecting the column as a Series suffices (a Series behaves much like a dict here). As for your question, you can use replace instead; it should be faster:

mapper = prizes.set_index('Rank')['Payout']

pd.concat([payouts, ranking.replace(mapper)], axis=1)

   Lineup    1    2    3
0       1  100   25   50
1       2   50   50  100
2       3   25  100   25

Your example doesn't show the need for fillna; you could add extra rows to your data to cover that scenario. Also, since payouts is just a single column, you could create a Series instead, which may give a small performance gain.
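To illustrate where fillna would matter, here is a minimal sketch with a hypothetical rank (4) that has no payout; that rank is not in the original data. It suggests the two approaches are not interchangeable in that case: map turns unmatched ranks into NaN (hence the fillna), while replace leaves them as they are.

```python
import pandas as pd

prizes = pd.DataFrame([[1, 100], [2, 50], [3, 25]], columns=["Rank", "Payout"])
mapper = prizes.set_index("Rank")["Payout"]

# Hypothetical data: rank 4 has no entry in prizes.
ranking = pd.DataFrame([[1, 4, 2], [2, 2, 1], [4, 1, 3]], columns=[1, 2, 3])

# map: ranks missing from mapper become NaN, so fillna(-1) is needed.
# Columns that contained NaN end up as floats.
mapped = ranking.apply(lambda s: s.map(mapper)).fillna(-1)

# replace: ranks missing from mapper are left unchanged (4 stays 4).
replaced = ranking.replace(mapper)

print(mapped)
print(replaced)
```

If unpaid ranks can occur in the real data, replace alone would leave raw rank numbers mixed in with payouts, so some extra handling would be needed.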


3 Comments

Scott Woodhall Over a year ago

For some reason when I put this into my code, it doesn't replace the ranking values with the values from mapper. Could it be because the length of mapper doesn't match the length of each column in ranking? All it does is spit back out the original rankings.

sammywemmy Over a year ago

Did you assign the new values to ranking?

Scott Woodhall Over a year ago

I apologize, I'm not sure what you mean.
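A short sketch of what the assignment question above probably refers to: replace returns a new DataFrame and leaves the original untouched unless the result is stored. The mapper is rebuilt directly here for brevity, and the variable name converted is only illustrative.

```python
import pandas as pd

mapper = pd.Series([100, 50, 25], index=[1, 2, 3])
ranking = pd.DataFrame([[1, 3, 2], [2, 2, 1], [3, 1, 3]], columns=[1, 2, 3])

# The result is discarded here, so ranking still holds the ranks.
ranking.replace(mapper)
print(ranking)

# Keep the result by assigning it to a name.
converted = ranking.replace(mapper)
print(converted)
```

Because replace matches on values, the length of mapper does not need to match the length of the columns in ranking.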

mozway · Accepted Answer · 2021-09-14 03:50:30Z

1

Here is an even faster (but less concise) solution using the underlying numpy array. There is a ~1.7x gain compared to replace.

a = prizes.set_index('Rank')['Payout'].values
b = ranking.values-1 # get index as 0/1/2
c = a.take(b.flatten()).reshape(b.shape) # index in 1D and reshape to 2D
pd.DataFrame(c, columns=ranking.columns)

NB. I broke the steps down for clarity, but this could be done without the intermediate variables

Output:

     1    2    3
0  100   25   50
1   50   50  100
2   25  100   25
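The snippet above relies on the ranks being exactly 1..n and on prizes being sorted by Rank. If that might not hold, one possible variation (a sketch, not part of the original answer) is to build a lookup array indexed by rank, pre-filled with -1 for ranks without a prize. It assumes the ranks are non-negative integers; rank 4 in the data below is hypothetical.

```python
import numpy as np
import pandas as pd

prizes = pd.DataFrame([[1, 100], [2, 50], [3, 25]], columns=["Rank", "Payout"])
# Hypothetical data: rank 4 has no entry in prizes.
ranking = pd.DataFrame([[1, 4, 2], [2, 2, 1], [4, 1, 3]], columns=[1, 2, 3])

ranks = ranking.to_numpy()

# lookup[r] holds the payout for rank r; -1 marks ranks with no prize.
size = max(ranks.max(), prizes["Rank"].max()) + 1
lookup = np.full(size, -1, dtype=np.int64)
lookup[prizes["Rank"].to_numpy()] = prizes["Payout"].to_numpy()

# Indexing with the 2D rank array returns a 2D array of payouts.
result = pd.DataFrame(lookup[ranks], columns=ranking.columns, index=ranking.index)
print(result)
```

Inside a loop, the lookup array only needs to be built once if prizes does not change between iterations.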


2 Comments

Scott Woodhall Over a year ago

I end up getting "TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'" with this string of code.

mozway Over a year ago

This means you probably have float values in ranking; make sure the ranks are integers.
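One way this could be handled, sketched under the assumption that the floats come from missing ranks (NaN forces a column to float64): record where the ranks are missing, cast the rest to integers, and overwrite the missing positions afterwards. The sample data and the -1 placeholder are illustrative only.

```python
import numpy as np
import pandas as pd

payouts_by_rank = np.array([100, 50, 25])  # payout for ranks 1, 2, 3

# Hypothetical data: a NaN makes every column float64.
ranking = pd.DataFrame([[1.0, 3.0, 2.0], [2.0, np.nan, 1.0]], columns=[1, 2, 3])

# Remember which cells had no rank.
missing = ranking.isna().to_numpy()

# Fill with a valid dummy rank, cast to int, shift to 0-based positions.
idx = ranking.fillna(1).astype("int64").to_numpy() - 1

values = payouts_by_rank.take(idx)  # same shape as idx
values[missing] = -1                # overwrite the dummy lookups

result = pd.DataFrame(values, columns=ranking.columns, index=ranking.index)
print(result)
```

If the ranks are floats for another reason (for example, they were read from a file as 1.0, 2.0), casting with astype("int64") before calling take should be enough.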
