
I am new to Python and have just started learning. I am building a sports prediction based on past scores. I have 2 CSV files: one with all matches from the current year, and one with the standings (the final results of the tournament and the rankings; it contains only unique teams, so it has just 14 rows). The problem is with the standings CSV, which looks like this:

Squad,Rk,MP,W,D,L,GF,GA,GD,Pts,Pts/G,MP,W,D,L,GF,GA,GD,Pts,Pts/G
CFR Cluj,1,18,13,5,0,24,5,19,44,2.44,18,10,5,3,30,14,16,35,1.94

I have this code, which raises a KeyError for the first row sampled above from my CSV:

def home_team_ranks_higher(row):
    home_team = row["Home"]
    visitor_team = row["Away"]
    home_rank = standings.loc[home_team]["Rk"]
    visitor_rank = standings.loc[visitor_team]["Rk"]
    return home_rank < visitor_rank

dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis = 1)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-112-d3a62e1e7d32> in <module>
      6     return home_rank < visitor_rank
      7 
----> 8 dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis = 1)
      9 
     10 #dataset["HomeTeamRanksHigher"] = 0

~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7546             kwds=kwds,
   7547         )
-> 7548         return op.get_result()
   7549 
   7550     def applymap(self, func) -> "DataFrame":

~\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    178             return self.apply_raw()
    179 
--> 180         return self.apply_standard()
    181 
    182     def apply_empty_result(self):

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    269 
    270     def apply_standard(self):
--> 271         results, res_index = self.apply_series_generator()
    272 
    273         # wrap results

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    298                 for i, v in enumerate(series_gen):
    299                     # ignore SettingWithCopy here in case the user mutates
--> 300                     results[i] = self.f(v)
    301                     if isinstance(results[i], ABCSeries):
    302                         # If we have a view on v, we need to make a copy because

<ipython-input-112-d3a62e1e7d32> in home_team_ranks_higher(row)
      2     home_team = row["Home"]
      3     visitor_team = row["Away"]
----> 4     home_rank = standings.loc[home_team]["Rk"]
      5     visitor_rank = standings.loc[visitor_team]["Rk"]
      6     return home_rank < visitor_rank

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1108         # fall thru to straight lookup
   1109         self._validate_key(key, axis)
-> 1110         return self._get_label(key, axis=axis)
   1111 
   1112     def _get_slice_axis(self, slice_obj: slice, axis: int):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
   1057     def _get_label(self, label, axis: int):
   1058         # GH#5667 this will fail if the label is not present in the axis.
-> 1059         return self.obj.xs(label, axis=axis)
   1060 
   1061     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):

~\anaconda3\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3489             loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
   3490         else:
-> 3491             loc = self.index.get_loc(key)
   3492 
   3493             if isinstance(loc, np.ndarray):

~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
    356                 except ValueError as err:
    357                     raise KeyError(key) from err
--> 358             raise KeyError(key)
    359         return super().get_loc(key, method=method, tolerance=tolerance)
    360 

KeyError: 'CFR Cluj'

Note: I tried swapping the 'Rk' and 'Squad' columns, but that did not help; I just got different errors.

What I want is to look up the rank of every home team and visitor team in my match history from the final table (standings) and store them in the 'home_rank' and 'visitor_rank' variables.

PS: I tried other ways to access the rank, but none of them worked.

Any ideas or solutions are great! Thank you :)

  • I guess you create a dataframe standings from a file standings.csv. I presume that its columns can be accessed by name (Squad, Rk, etc.), like standings['Squad']. The error, however, shows that standings.loc['CFR Cluj'] looks for 'CFR Cluj' among the row labels of the index, where it does not exist (KeyError). Commented Apr 18, 2021 at 13:53
  • That's right, I am creating a dataframe from the CSV table. How can I access the row that contains the team when I encounter it in the history CSV? If it's 'CFR Cluj', I want to get its rank and compare it with the rank of the other team. Commented Apr 18, 2021 at 13:59
  • I also tried this, but it does not work in the end: home_rank = standings[standings["Squad"] == home_team]["Rk"] and visitor_rank = standings[standings["Squad"] == visitor_team]["Rk"]. I tried this when my first column was 'Rk'. Commented Apr 18, 2021 at 14:02
  • OK. I've added the above and I get this: TypeError: cannot convert the series to <class 'int'>. I had also cast 'home_rank' and 'visitor_rank' to int to do the comparison in the return statement. Commented Apr 18, 2021 at 14:09
  • It's kind of tough to help you without some sample data. In principle, you have to find the index of the matching row in the dataframe standings and use it to access the column: standings['Rk'][index], where index would be something like index = standings.index[standings['Squad'] == 'CFR Cluj'].to_list()[0]. Note the last [0], which picks the index of the first matching squad name. Commented Apr 18, 2021 at 14:15

1 Answer


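The KeyError means that standings.loc[...] looks up row labels in the index, and your dataframe standings still has the default integer index, so a team name such as 'CFR Cluj' is not a valid label. You can instead select the squad's rank home_rank (and similarly visitor_rank) with a boolean mask on the Squad column: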

home_rank  = standings['Rk'][standings['Squad'] == 'CFR Cluj'].iloc[0]
#home_rank = standings.loc[standings['Squad'] == 'CFR Cluj', 'Rk'].iloc[0]

Step by step, this is equivalent to:

boolean_indices = standings['Squad'] == 'CFR Cluj'  # True for the matching row(s)
standings_ranks = standings['Rk']
home_ranks      = standings_ranks[boolean_indices]  # Series that keeps the original row labels
home_rank       = home_ranks.iloc[0]  # first match by position; if unique it only contains a single value
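
To fill the HomeTeamRanksHigher column for every match rather than for one hard-coded team, one possible approach is to key the ranks by squad name and map both team columns through that lookup. This is only a sketch under assumptions: the two small dataframes stand in for the real CSV files, "Team B" and "Team C" are made-up names, and rank_by_squad is a helper name introduced here, not something from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files from the question.
standings = pd.DataFrame({
    "Squad": ["CFR Cluj", "Team B", "Team C"],
    "Rk": [1, 2, 3],
})
dataset = pd.DataFrame({
    "Home": ["Team B", "CFR Cluj", "Team C"],
    "Away": ["CFR Cluj", "Team C", "Team B"],
})

# Series of ranks whose index is the squad name, so lookups go by team.
# This relies on every squad appearing only once in the standings.
rank_by_squad = standings.set_index("Squad")["Rk"]

# Translate each team name into its rank, one whole column at a time.
home_rank = dataset["Home"].map(rank_by_squad)
visitor_rank = dataset["Away"].map(rank_by_squad)

# A smaller Rk means a better position in the table.
dataset["HomeTeamRanksHigher"] = home_rank < visitor_rank
print(dataset)
```

A team that is missing from the standings maps to NaN, and any comparison with NaN is False, so it may be worth checking home_rank.isna() and visitor_rank.isna() before trusting the new column.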

2 Comments

I just tried the idea from the comments above, and it kind of works. I see where my mistake was: if I print home_rank = standings['Rk'][ standings['Squad']=='CFR Cluj' ][0], it gives me the first rank value, which is what I am looking for. The problem is what to do with that 0... I'll try to find the solution. Thank you very much!
I'm coming back with an update, it works now :)
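
On the question about "that 0" in the comment above: on a Series with an integer index, a trailing [0] is a label lookup, not "the first element". A filtered Series keeps its original row labels, so [0] only succeeds when the matching row happens to carry label 0. A minimal sketch on made-up data ("Team B" and "Team C" are hypothetical names, and the variable names are chosen here for illustration):

```python
import pandas as pd

# Hypothetical three-row standings with the default 0, 1, 2 index.
standings = pd.DataFrame({
    "Squad": ["CFR Cluj", "Team B", "Team C"],
    "Rk": [1, 2, 3],
})

# The filtered Series keeps the original row label, here 1.
matches = standings["Rk"][standings["Squad"] == "Team B"]

# .iloc[0] selects by position, so it returns the first match
# regardless of which label that row carries.
first_by_position = matches.iloc[0]

# matches[0] asks for the label 0, which this Series does not contain.
try:
    matches[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(first_by_position, label_lookup_failed)
```

This is why the [0] form returns a value for 'CFR Cluj', which sits in the first row, but would fail for a squad further down the table.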
