I have the following simplified data frame, read from an Excel file:

                   Team                   match1                game12                  match3
1            Sandhausen                   2                     3                       1
2              Pohlheim                   1                     1                       6
3            Völklingen                   4                     2                       4
4  Nieder-Olm/Wörrstadt                   5                     7                       2
5             Nümbrecht                   7                     6                       3
6               Dorheim                   3                     4                       7
7        Nienburg/Weser                   6                     5                       5
8           Bad Homburg                   8                     8                       8
9           Bad Homburg                   9                     9                       9

I would like to find the best team overall. The value for each match is the place the team finished in. To score the teams, 1st place gets 9 points, 2nd place gets 8 points, and so on, for every match.

My problem is that the match columns can have completely different names (match1 is only an example). Is it possible to work with column indexes instead?

Update: I combined both answers to create something like this:

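count_row = df.shape[0]
ranks = df.drop(columns='Team')  # place columns only, whatever they are named

# place 1 -> count_row points, place 2 -> count_row - 1 points, ...
df["score"] = (count_row + 1 - ranks).sum(axis=1)
# one extra point for every first place
df['extra_points'] = (ranks == 1).sum(axis=1)
df['total'] = df[['score', 'extra_points']].sum(axis=1)
# add up teams that appear more than once, best total first
df_total = df.groupby("Team").agg({"total": "sum"}).reset_index().sort_values(by='total', ascending=False)

print(df)

print(df_total)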

  • what is game12? Commented Jan 19, 2020 at 19:40
  • why is Bad Homburg duplicated? Commented Jan 19, 2020 at 19:44
  • the matches have different names like football, basketball and so on. Bad Homburg has two teams. Commented Jan 19, 2020 at 19:58
  • could you show your expected output for this dataframe? Commented Jan 19, 2020 at 19:58
  • Do you want to calculate the best team per column? Commented Jan 19, 2020 at 20:02

2 Answers

You can also do it like this:

import pandas as pd

df = pd.DataFrame([
        ['Sandhausen',2,3,1],
        ['Pohlheim',1,1,6],
        ['Völklingen',4,2,4],
        ['Nieder-Olm/Wörrstadt',5,7,2],
        ['Nümbrecht',7,6,3],
        ['Dorheim',3,4,7],
        ['Nienburg/Weser',6,5,5],
        ['Bad Homburg',8,8,8],
        ['Bad Homburg',9,9,9]
    ],
    columns=["Team", "match1", "game12", "match2"])

df["score"] = ( 10 - df.drop(columns=["Team"]) ).sum(axis=1)

Basically, here I'm selecting all the columns that should count toward the score (in this case, every column except Team) [ df.drop(columns=["Team"]) ].

Then I convert each rank into a score ( rank 1 -> 10 - 1 = 9, rank 2 -> 10 - 2 = 8, ..., rank 9 -> 10 - 9 = 1 ) [ ( 10 - ... ) ].

After that, I sum the values across each row (axis=1) and assign the result to the column score [ df["score"] = (...).sum(axis=1) ].

This results in the following:

                   Team  match1  game12  match2  score
0            Sandhausen       2       3       1     24
1              Pohlheim       1       1       6     22
2            Völklingen       4       2       4     20
3  Nieder-Olm/Wörrstadt       5       7       2     16
4             Nümbrecht       7       6       3     14
5               Dorheim       3       4       7     16
6        Nienburg/Weser       6       5       5     14
7           Bad Homburg       8       8       8      6
8           Bad Homburg       9       9       9      3

Also, if you prefer to select the columns you want to use instead of dropping the others, you can use something like this:

df[[ col for col in df.columns if col != "Team" ]]

The filtering happens in col != "Team", but you can change that condition to whatever you need.
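
Since the question asks whether the match columns can be picked without knowing their names, here is a minimal sketch of two ways to do that. It assumes Team is always the first column and that every other column holds numeric places; the column names football, basketball and tennis are made up for illustration.

```python
import pandas as pd

# Hypothetical column names; only the position of "Team" matters here.
df = pd.DataFrame(
    [
        ["Sandhausen", 2, 3, 1],
        ["Pohlheim", 1, 1, 6],
        ["Völklingen", 4, 2, 4],
    ],
    columns=["Team", "football", "basketball", "tennis"],
)

# Option 1: select by position, i.e. every column after the first one.
ranks_by_position = df.iloc[:, 1:]

# Option 2: select by dtype, i.e. every numeric column.
ranks_by_dtype = df.select_dtypes(include="number")

# Same conversion as above: rank 1 -> 9 points, rank 2 -> 8 points, ...
df["score"] = (10 - ranks_by_position).sum(axis=1)
print(df)
```

Selecting by position breaks if the column order in the Excel file changes, while selecting by dtype would also pick up any other numeric column, so which one fits depends on how the file is laid out.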

6 Comments

This works perfectly. Is it possible to give first place 9+1 points and every other place 10 minus the place number? Is it also possible to sum the duplicates, like in ansev's solution?
I couldn't understand your scoring technique; please give me an example. To aggregate the duplicates, you can run df.groupby("Team").agg({"score": "sum"}).reset_index() after calculating the score (the final reset_index puts Team back as a column instead of an index, which might or might not be useful for you).
Or, if you want to keep score alongside the sum over duplicates, you can do this: df["total_score"] = df.groupby("Team")["score"].transform("sum"). Check out the different results in here.
The scoring is 90% right, but first place gets an extra point. This means Sandhausen 10, Pohlheim 8, Völklingen 7 and so on... Ahhh, I see, the sum is much simpler than I thought. :)
I think I found a solution :) repl.it/repls/AbandonedJudiciousVisitor
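
To make the difference between the two aggregation suggestions in these comments concrete, here is a small sketch. The three rows are a cut-down illustration built from the scores shown in the answer, not the full table.

```python
import pandas as pd

# Cut-down illustration: one unique team and one team that appears twice.
scores = pd.DataFrame({
    "Team": ["Sandhausen", "Bad Homburg", "Bad Homburg"],
    "score": [24, 6, 3],
})

# agg collapses the duplicates: the result has one row per team.
per_team = scores.groupby("Team").agg({"score": "sum"}).reset_index()

# transform keeps every original row and repeats the team total on each one.
scores["total_score"] = scores.groupby("Team")["score"].transform("sum")

print(per_team)
print(scores)
```

Use agg when you want a final ranking table, and transform when you want the per-row scores and the team total side by side.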

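UPDATED, to find the team with the highest value in each column (note that idxmax treats the values as points; if they are places, where 1 is best, use idxmin instead):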

df.set_index('Team').idxmax()
match1    BadHomburg
game12    BadHomburg
match3    BadHomburg
dtype: object

If there are duplicate teams in the Team column and you want to sum them, I would use DataFrame.melt with groupby.sum:

df_ranking = ( df.melt('Team')
                 .groupby('Team')['value']
                 .sum()
                 .sort_values(ascending = False)
                 .to_frame('Points')
                 .reset_index() )

df_ranking.index = df_ranking.index + 1

print(df_ranking)
                   Team  Points
1            BadHomburg    42.0
2             Nümbrecht    16.0
3        Nienburg/Weser    16.0
4  Nieder-Olm/Wörrstadt    14.0
5               Dorheim    14.0
6            Völklingen    10.0
7              Pohlheim     8.0
8            Sandhausen     6.0

Checking the best Team

df_ranking.loc[1,'Team']
#'BadHomburg'
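
The melt approach above sums the raw values, so it ranks teams correctly only if a larger value is better. If the values are places, as the question describes, one option is to convert each place to points before summing. The sketch below assumes the 9-row table from the question and the rule that 1st place gets 9 points; the names long_df, n_places, place and points are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Sandhausen", 2, 3, 1],
        ["Pohlheim", 1, 1, 6],
        ["Völklingen", 4, 2, 4],
        ["Nieder-Olm/Wörrstadt", 5, 7, 2],
        ["Nümbrecht", 7, 6, 3],
        ["Dorheim", 3, 4, 7],
        ["Nienburg/Weser", 6, 5, 5],
        ["Bad Homburg", 8, 8, 8],
        ["Bad Homburg", 9, 9, 9],
    ],
    columns=["Team", "match1", "game12", "match3"],
)

n_places = len(df)  # assumption: as many places as rows, so 1st place -> 9 points

# Long format: one row per (team entry, match) with the place in its own column.
long_df = df.melt("Team", var_name="match", value_name="place")
long_df["points"] = n_places + 1 - long_df["place"]

ranking = (
    long_df.groupby("Team")["points"]
    .sum()                          # duplicate team names are added together
    .sort_values(ascending=False)   # most points first
    .to_frame("Points")
    .reset_index()
)
ranking.index = ranking.index + 1   # 1-based position in the ranking

print(ranking)
print(ranking.loc[1, "Team"])
```

Because melt works on every column except Team, this also does not depend on what the match columns are called.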
