I have the below dataframe.

import pandas as pd
df = pd.DataFrame({'Player': [1,1,1,1,2,2,2,3,3,3,4,5], 'Team': ['X','X','X','Y','X','X','Y','X','X','Y','X','Y'], 'Month': [1,1,1,2,1,1,2,2,2,3,4,5]})

Input:

    Player Team  Month
0        1    X      1
1        1    X      1
2        1    X      1
3        1    Y      2
4        2    X      1
5        2    X      1
6        2    Y      2
7        3    X      2
8        3    X      2
9        3    Y      3
10       4    X      4
11       5    Y      5

The dataframe lists players, the team each belongs to, and the month. A player can have multiple entries in a given month. Some players move from Team X to Team Y in a particular month, some don't move at all, and some join Team Y directly.

I am looking for the number of players who moved from Team X to Team Y in each month. The output should look like the one below, i.e. the month of transition and the total count of transitions. In this case, Players 1 and 2 moved in Month 2 and Player 3 moved in Month 3. Players 4 and 5 didn't move.

Expected Output:

   Month  Count
0      2      2
1      3      1

I am able to get this done in the below fashion.

# Find all the players who moved from Team X to Team Y
s1 = df.drop_duplicates(['Team','Player'])
s2 = s1.groupby('Player').size().reset_index(name='counts')
s2 = s2[s2['counts']>1]
# Tie them back to the deduplicated rows to find the month in which they moved
s3 = s1.groupby("Player").last().reset_index()
s4 = s3[s3['Player'].isin(s2['Player'])]
s5 = s4.groupby('Month').size().reset_index(name='Count')

I am pretty sure there is a better way than what I did here. I'm just looking for some help to make it more efficient.

  • Is it possible for (1,X,1) and (1,Y,1) to coexist in month=1 (i.e. the player changes team within a month)? Commented Oct 30, 2020 at 13:47
  • Yes, it's possible to change within a month as well. Commented Oct 30, 2020 at 13:48
  • Currently you are dropping duplicates on Team and Player and excluding Month, so is it safe to assume that player 1 would not go from X to Y and back to X in a three-month span? Commented Oct 30, 2020 at 13:52
  • Yes, you can assume that there is no possibility of going back again from Y to X Commented Oct 30, 2020 at 13:55

2 Answers

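First, pick out the rows where (1) the team differs from the previous row and (2) the player is the same as in the previous row, which excludes each player's first row. Then count those rows per month. This relies on each player's rows being contiguous and in chronological order, as they are in the example.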

mask = df["Team"].shift().ne(df["Team"]) & df["Player"].shift().eq(df["Player"])
out = df[mask].groupby("Month").size()

Output:

print(out)  # a Series

Month
2    2
3    1
dtype: int64

# series to dataframe (optional)
out.to_frame(name="count").reset_index()

   Month  count
0      2      2
1      3      1

Edit: the first groupby in the mask was redundant, so I removed it.
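
The mask above flags any team change between adjacent rows of the same player. As a hedged variation, the sketch below takes the previous team per player with a grouped shift and checks the X to Y direction explicitly, so a player's rows no longer have to be contiguous. It still assumes each player's rows appear in chronological order. The helper name `count_moves` and its `src`/`dst` parameters are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    "Team": ["X", "X", "X", "Y", "X", "X", "Y", "X", "X", "Y", "X", "Y"],
    "Month": [1, 1, 1, 2, 1, 1, 2, 2, 2, 3, 4, 5],
})


def count_moves(frame, src="X", dst="Y"):
    """Count src -> dst moves per month (hypothetical helper).

    Assumes each player's rows are already in chronological order.
    """
    # Previous team of the same player; NaN on each player's first row.
    prev_team = frame.groupby("Player")["Team"].shift()
    # A move is a row on dst whose previous row (same player) was on src.
    moved = prev_team.eq(src) & frame["Team"].eq(dst)
    return frame[moved].groupby("Month").size().reset_index(name="Count")


result = count_moves(df)
print(result)
```

On the sample data this should give the same two rows as above (Month 2 with 2 moves, Month 3 with 1 move).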


2 Comments

Why ~df["Player"].shift().ne(df["Player"]) instead of df["Player"].shift().eq(df["Player"])?
Yeah, thanks, that can be simplified. I've incorporated it into the post. I wrote it that way because I tend to think of shift-ne as a single idiom for locating changes.

One option is to self-merge on Player and Month, with Month shifted forward by one in the second copy, and check which players moved:

s = df.drop_duplicates()

t = (s.merge(s.assign(Month=s.Month+1), on=['Player', 'Month'], how='right')
  .assign(Count=lambda x: x.Team_x.eq('Y') & x.Team_y.eq('X'))
  .groupby('Month', as_index=False)['Count'].sum()
)
print(t.loc[t['Count'] != 0])

Output:

   Month  Count
0      2      2
1      3      1
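
Because the merge pairs each month with the one right before it, it only sees moves between consecutive months, and a switch inside a single month would not be paired. As a rough alternative sketch, and leaning on the asker's confirmation in the comments that nobody goes back from Y to X, the transition month can be taken as the first month a player appears on Team Y, restricted to players who also have a Team X row. The variable names below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Player": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5],
    "Team": ["X", "X", "X", "Y", "X", "X", "Y", "X", "X", "Y", "X", "Y"],
    "Month": [1, 1, 1, 2, 1, 1, 2, 2, 2, 3, 4, 5],
})

# Players with at least one row on Team X.
on_x = df.loc[df["Team"].eq("X"), "Player"].unique()

# First month in which each player appears on Team Y (index: Player).
first_y = df[df["Team"].eq("Y")].groupby("Player")["Month"].min()

# Keep only players who were also on Team X. Under the no-return
# assumption, their first Y month is the month they moved.
moved = first_y[first_y.index.isin(on_x)]

counts = moved.reset_index().groupby("Month").size().reset_index(name="Count")
print(counts)
```

Player 5 only ever appears on Team Y, so the `isin` filter drops that player, and Player 4 never appears on Team Y at all.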
