Python Pandas: Efficiently assign values to a slice [duplicate]

Question

I have a dataframe next_train with weekly data for many players (80,000 players observed over 4 weeks, for a total of 320,000 observations) and a dictionary players containing a binary variable for some of the players (say 10,000). I want to add this binary variable to the dataframe next_train (if a player is not in the dictionary players, I set the variable to zero). This is how I'm doing it:

import pandas as pd

next_train = pd.read_csv()  # file path omitted in the question
# ... calculate dictionary 'players' ...
next_train['variable'] = 0
for player in players:
    next_train.loc[next_train['id_of_player'] == player, 'variable'] = players[player]

However, the for loop takes ages to complete, and I don't understand why. It looks like the task is just a binary search for the value player in my dataframe, repeated 10,000 times (the size of the players dictionary), yet the execution takes several minutes. Is there an efficient way to do this?

ysearka · Accepted Answer · 2018-08-16 08:15:57Z

1

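You should use map instead of slicing; it will be much faster. Each iteration of your loop compares every one of the 320,000 ids with the current player to build a boolean mask, so the loop performs about 10,000 full scans of the column rather than a binary search. map instead looks up every id in the dictionary in a single pass: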

next_train['variable'] = next_train.id_of_player.map(players)
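A minimal sketch of what map does, using hypothetical toy data (the ids and the dictionary values below are made up for illustration): ids found in the dictionary get their value, and ids that are missing from it become NaN.

```python
import pandas as pd

# Hypothetical toy stand-ins for next_train and players
next_train = pd.DataFrame({"id_of_player": [1, 2, 3, 1, 2, 3]})
players = {1: 1, 3: 0}  # player 2 is deliberately missing from the dict

# Series.map looks each id up in the dict in one vectorized pass;
# ids absent from the dict come back as NaN
next_train["variable"] = next_train["id_of_player"].map(players)
print(next_train)
```

Because NaN is a float, the new column is float64 even though the dictionary holds integers.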

As you want 0 in the other rows (the players missing from the dictionary, which map sets to NaN), you can then use fillna:

next_train['variable'] = next_train['variable'].fillna(0)
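A short sketch, again on hypothetical toy data, of filling the missing values by assigning the result back to the column. Assigning back is used here instead of calling fillna(..., inplace=True) on next_train.variable, since that chained inplace form may not update the dataframe under pandas Copy-on-Write (recent pandas versions warn about it).

```python
import pandas as pd

# Hypothetical toy data: player 2 has no entry in the dict
next_train = pd.DataFrame({"id_of_player": [1, 2, 3, 1, 2, 3]})
players = {1: 1, 3: 0}

next_train["variable"] = next_train["id_of_player"].map(players)
# Replace the NaN left by missing ids with 0 and store the result back
next_train["variable"] = next_train["variable"].fillna(0)
print(next_train["variable"].tolist())  # values are still floats at this point
```

The column stays float64 after fillna, which is why a type conversion is worth considering afterwards.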

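Moreover, if your dictionary only contains binary (0/1 or boolean) values, you might want to change the type of the variable column: the missing values produced by map force it to a float (or, for boolean values, object) column, and a small integer type such as int8 takes less space. So you end up with this piece of code:

next_train['variable'] = next_train.id_of_player.map(players).fillna(0).astype('int8')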
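A minimal sketch of the full chain on hypothetical toy data, assuming the dictionary holds 0/1 values. int8 is an illustrative choice of a 1-byte integer type; any integer type works if memory is not a concern.

```python
import pandas as pd

# Hypothetical toy data: binary flags for players 10 and 30, none for 20
next_train = pd.DataFrame({"id_of_player": [10, 20, 30, 10]})
players = {10: 1, 30: 0}

next_train["variable"] = (
    next_train["id_of_player"].map(players)  # 1.0, NaN, 0.0, 1.0 (float64)
    .fillna(0)                               # missing ids -> 0.0
    .astype("int8")                          # 1-byte integers instead of 8-byte floats
)
print(next_train["variable"].tolist(), next_train["variable"].dtype)
```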

answered Aug 16, 2018 at 8:15

ysearka

3,865 · 5 gold badges · 24 silver badges · 42 bronze badges


Shaido · Accepted Answer · 2018-08-16 08:16:37Z

1

Use map and fillna:

next_train['variable'] = next_train['id_of_player'].map(players).fillna(0)

This creates a new column by applying the dictionary to the player ids and then fills the missing values (ids not in the dictionary, which map turns into NaN) with 0.
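A sketch that checks, on hypothetical synthetic data (1,000 made-up players over 4 weeks, with a random subset of 100 flagged), that map plus fillna produces the same column as the loop from the question. The loop does one full-column comparison per dictionary entry, while map does a single lookup pass.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical synthetic stand-in: 1,000 players, each observed for 4 weeks
ids = np.repeat(np.arange(1000), 4)
next_train = pd.DataFrame({"id_of_player": ids})

# Binary flag for a random subset of 100 players
flagged = rng.choice(1000, size=100, replace=False)
players = {int(p): int(rng.integers(0, 2)) for p in flagged}

# Approach from the question: one boolean mask over the whole column per player
loop_result = pd.Series(0, index=next_train.index)
for player, value in players.items():
    loop_result.loc[next_train["id_of_player"] == player] = value

# map + fillna: a single vectorized dictionary lookup
map_result = next_train["id_of_player"].map(players).fillna(0).astype(int)

print((loop_result == map_result).all())
```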

answered Aug 16, 2018 at 8:16

Shaido

28.6k · 26 gold badges · 76 silver badges · 82 bronze badges
