Pandas: calculate specific columns

Question

I have the following simplified data frame from an Excel file:

                   Team                   match1                game12                  match3
1            Sandhausen                   2                     3                       1
2              Pohlheim                   1                     1                       6
3            Völklingen                   4                     2                       4
4  Nieder-Olm/Wörrstadt                   5                     7                       2
5             Nümbrecht                   7                     6                       3
6               Dorheim                   3                     4                       7
7        Nienburg/Weser                   6                     5                       5
8           Bad Homburg                   8                     8                       8
9           Bad Homburg                   9                     9                       9

I would like to determine the best team overall. The values in the match columns are the team's place in that match. To score the teams, 1st place gets 9 points, 2nd place gets 8 points, and so on, for all matches.

My problem is that match1 could have a completely different name. Is it possible to work with column indexes instead?

Update: I used both answers to create something like this:

count_row = df.shape[0]
matches = df.drop(columns='Team')  # place columns only, so added columns are never compared

df["score"] = (count_row + 1 - matches).sum(axis=1)
df['extra_points'] = (matches == 1).sum(axis=1)  # one bonus point per first place
df['total'] = df.loc[:, ['score', 'extra_points']].sum(axis=1)
df_total = df.groupby("Team").agg({"total": "sum"}).reset_index().sort_values(by='total', ascending=False)

print(df)

print(df_total)

The matches have different names, like football, basketball and so on. Bad Homburg has two teams.
– Joel Klein, Commented Jan 19, 2020 at 19:58

MkWTF · Accepted Answer · 2020-01-19 20:03:20Z

You can also do it like this:

import pandas as pd

df = pd.DataFrame([
        ['Sandhausen',2,3,1],
        ['Pohlheim',1,1,6],
        ['Völklingen',4,2,4],
        ['Nieder-Olm/Wörrstadt',5,7,2],
        ['Nümbrecht',7,6,3],
        ['Dorheim',3,4,7],
        ['Nienburg/Weser',6,5,5],
        ['Bad Homburg',8,8,8],
        ['Bad Homburg',9,9,9]
    ],
    columns=["Team", "match1", "game12", "match2"])

df["score"] = ( 10 - df.drop(columns=["Team"]) ).sum(axis=1)

Here I first select all the columns that should count toward the score (in this case, every column except Team) [ df.drop(columns=["Team"]) ].

Then I convert each rank into a score ( rank 1 -> 10 - 1 = 9, rank 2 -> 10 - 2 = 8, ..., rank 9 -> 10 - 9 = 1 ) [ ( 10 - ... ) ].

Finally, I sum the values across each row (axis=1) and assign the result to the column score [ df["score"] = (...).sum(axis=1) ].

This results in the following:

                   Team  match1  game12  match2  score
0            Sandhausen       2       3       1     24
1              Pohlheim       1       1       6     22
2            Völklingen       4       2       4     20
3  Nieder-Olm/Wörrstadt       5       7       2     16
4             Nümbrecht       7       6       3     14
5               Dorheim       3       4       7     16
6        Nienburg/Weser       6       5       5     14
7           Bad Homburg       8       8       8      6
8           Bad Homburg       9       9       9      3

Also, if you prefer to select the columns you want to use instead of dropping, you can use something like this:

df[[ col for col in df.columns if col != "Team" ]]

The filtering happens in col != "Team"; you can change that condition as needed.
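Since the question asks about working with indexes rather than column names, here is a minimal sketch of positional selection. It assumes the team name is always the first column and every later column is a match; the match names football, basketball and tennis are hypothetical, and the constant 10 is kept from the answer (9 places in the full table).

```python
import pandas as pd

df = pd.DataFrame(
    [["Sandhausen", 2, 3, 1], ["Pohlheim", 1, 1, 6], ["Völklingen", 4, 2, 4]],
    columns=["Team", "football", "basketball", "tennis"],  # hypothetical match names
)

# Assumption: column 0 is the team name, all remaining columns are matches.
ranks = df.iloc[:, 1:]

# Alternative that does not depend on column order: keep only numeric columns.
numeric_ranks = df.select_dtypes(include="number")

# Same conversion as in the answer: 10 - rank, summed per row.
df["score"] = (10 - ranks).sum(axis=1)
print(df)
```

Both selections are taken before the score column is added, so the score itself is never treated as a match.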


6 Comments

Joel Klein Over a year ago

This works perfectly. Is it possible to give first place 9+1 points and every other place 10 minus the place number? Is it also possible to sum the duplicates, as in the solution from ansev?

MkWTF Over a year ago

I couldn't understand your scoring technique; please give me an example. To aggregate the duplicates, you can do df.groupby("Team").agg({"score": "sum"}).reset_index() after calculating the score (the final reset_index call places Team as a column instead of an index, which may or may not be useful for you).

MkWTF Over a year ago

Or, if you want to keep score and also have the sum of duplicates, you can do it with this: df["total_score"] = df.groupby("Team")["score"].transform("sum"). Check out the different results here.

Joel Klein Over a year ago

The scoring is 90% right, but first place gets an extra point. This means Sandhausen 10, Pohlheim 8, Völklingen 7, and so on... Ahhh, I see, the sum is much simpler than I thought. :)

Joel Klein Over a year ago

I think I found a solution :) repl.it/repls/AbandonedJudiciousVisitor
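The ideas from this comment thread can be put together roughly as follows. This is only a sketch on a three-row subset of the data. It assumes the bonus rule means one extra point per first place, which is one reading of the comments, and it is not the content of the linked repl.

```python
import pandas as pd

df = pd.DataFrame(
    [["Pohlheim", 1, 1, 6], ["Bad Homburg", 8, 8, 8], ["Bad Homburg", 9, 9, 9]],
    columns=["Team", "match1", "game12", "match2"],
)
ranks = df.drop(columns=["Team"])

# Base score (10 - rank) plus one bonus point for every first place.
df["score"] = (10 - ranks).sum(axis=1) + (ranks == 1).sum(axis=1)

# Option 1: one row per team, duplicate team names summed.
per_team = df.groupby("Team", as_index=False)["score"].sum()

# Option 2: keep every original row and attach the team total to each.
df["total_score"] = df.groupby("Team")["score"].transform("sum")

print(per_team)
print(df)
```

The aggregated frame is convenient for a final ranking, while transform keeps the per-row scores visible next to the team total.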


ansev · Accepted Answer · 2020-01-19 20:04:03Z

UPDATED: to calculate the best team per column:

df.set_index('Team').idxmax()
match1    BadHomburg
game12    BadHomburg
match3    BadHomburg
dtype: object

If there are duplicate teams in the Team column and you want to sum them, I would use DataFrame.melt with groupby.sum:

df_ranking = ( df.melt('Team')
                 .groupby('Team')['value']
                 .sum()
                 .sort_values(ascending = False)
                 .to_frame('Points')
                 .reset_index() )

df_ranking.index = df_ranking.index + 1

print(df_ranking)
                   Team  Points
1            BadHomburg    42.0
2             Nümbrecht    16.0
3        Nienburg/Weser    16.0
4  Nieder-Olm/Wörrstadt    14.0
5               Dorheim    14.0
6            Völklingen    10.0
7              Pohlheim     8.0
8            Sandhausen     6.0

Checking the best team:

df_ranking.loc[1,'Team']
#'BadHomburg'
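The snippet above sums the raw values in each match column. If those values are places, with 1 being best as the question describes, they may need to be converted to points before summing. A minimal sketch of that adaptation, assuming 9 places and the 10 - place rule from the other answer; the column names match, place and points are illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [["Sandhausen", 2, 3, 1], ["Pohlheim", 1, 1, 6],
     ["Bad Homburg", 8, 8, 8], ["Bad Homburg", 9, 9, 9]],
    columns=["Team", "match1", "game12", "match3"],
)

# Long format: one row per (team, match) pair.
long_df = df.melt("Team", var_name="match", value_name="place")

# Assumption: 9 places in total, so 1st place = 9 points, 9th place = 1 point.
long_df["points"] = 10 - long_df["place"]

df_ranking = (
    long_df.groupby("Team")["points"]
    .sum()
    .sort_values(ascending=False)
    .to_frame("Points")
    .reset_index()
)
df_ranking.index = df_ranking.index + 1  # 1-based ranking positions

print(df_ranking)
```

With this conversion a larger total means a better overall result, so the first row of the ranking is the best team.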
