Python pandas KeyError

Question

I am a newbie in Python and have just started learning. I am doing a sports prediction based on past scores. I have 2 csv files: one with all matches from the current year, and one with the standings (final results of the tournament and rankings; it contains only unique teams, so it has just 14 rows). The problem comes from the standings csv, which looks like this:

Squad,Rk,MP,W,D,L,GF,GA,GD,Pts,Pts/G,MP,W,D,L,GF,GA,GD,Pts,Pts/G
CFR Cluj,1,18,13,5,0,24,5,19,44,2.44,18,10,5,3,30,14,16,35,1.94

And I have this code, which raises a KeyError for the first row that I sampled from my csv:

def home_team_ranks_higher(row):
    home_team = row["Home"]
    visitor_team = row["Away"]
    home_rank = standings.loc[home_team]["Rk"]
    visitor_rank = standings.loc[visitor_team]["Rk"]
    return home_rank < visitor_rank

dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis = 1)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-112-d3a62e1e7d32> in <module>
      6     return home_rank < visitor_rank
      7 
----> 8 dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis = 1)
      9 
     10 #dataset["HomeTeamRanksHigher"] = 0

~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7546             kwds=kwds,
   7547         )
-> 7548         return op.get_result()
   7549 
   7550     def applymap(self, func) -> "DataFrame":

~\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    178             return self.apply_raw()
    179 
--> 180         return self.apply_standard()
    181 
    182     def apply_empty_result(self):

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    269 
    270     def apply_standard(self):
--> 271         results, res_index = self.apply_series_generator()
    272 
    273         # wrap results

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    298                 for i, v in enumerate(series_gen):
    299                     # ignore SettingWithCopy here in case the user mutates
--> 300                     results[i] = self.f(v)
    301                     if isinstance(results[i], ABCSeries):
    302                         # If we have a view on v, we need to make a copy because

<ipython-input-112-d3a62e1e7d32> in home_team_ranks_higher(row)
      2     home_team = row["Home"]
      3     visitor_team = row["Away"]
----> 4     home_rank = standings.loc[home_team]["Rk"]
      5     visitor_rank = standings.loc[visitor_team]["Rk"]
      6     return home_rank < visitor_rank

~\anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1108         # fall thru to straight lookup
   1109         self._validate_key(key, axis)
-> 1110         return self._get_label(key, axis=axis)
   1111 
   1112     def _get_slice_axis(self, slice_obj: slice, axis: int):

~\anaconda3\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
   1057     def _get_label(self, label, axis: int):
   1058         # GH#5667 this will fail if the label is not present in the axis.
-> 1059         return self.obj.xs(label, axis=axis)
   1060 
   1061     def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):

~\anaconda3\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
   3489             loc, new_index = self.index.get_loc_level(key, drop_level=drop_level)
   3490         else:
-> 3491             loc = self.index.get_loc(key)
   3492 
   3493             if isinstance(loc, np.ndarray):

~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
    356                 except ValueError as err:
    357                     raise KeyError(key) from err
--> 358             raise KeyError(key)
    359         return super().get_loc(key, method=method, tolerance=tolerance)
    360 

KeyError: 'CFR Cluj'

Note: I tried to swap the 'Rk' and 'Squad' columns, but I did not get any result, just different errors.

What I am looking for is to get the rank of every home team / visitor team in my match history from the final table (standings) and store it in the 'home_rank' / 'visitor_rank' variables.

PS: I tried other ideas to access the rank but none of them got me any result.

Any ideas or solutions are great! Thank you :)

I guess you create a dataframe standings from a file standings.csv. I presume that the dataframe standings can be accessed with the column names Squad, Rk, etc., like standings['Squad']. The error, however, shows that you are trying to look up 'CFR Cluj' as a row label with standings.loc['CFR Cluj'], and no such label exists (KeyError).
– Marc, Commented Apr 18, 2021 at 13:53
That's right, I am creating a dataframe from the csv table. How can I access the row that contains the team when I encounter it in the history csv? If it's 'CFR Cluj', I want to get its rank and compare it with the rank of the other team.
– Rowend, Commented Apr 18, 2021 at 13:59
I also tried this, but it does not work in the end: home_rank = standings[standings["Squad"] == home_team]["Rk"] and visitor_rank = standings[standings["Squad"] == visitor_team]["Rk"]. I tried this when my first column was 'Rk'.
– Rowend, Commented Apr 18, 2021 at 14:02
Ok. I've added the above and I get this: TypeError: cannot convert the series to <class 'int'>. I also cast 'home_rank' and 'visitor_rank' to int to do the comparison in the return.
– Rowend, Commented Apr 18, 2021 at 14:09
It's kind of tough to help without some sample data. In principle, you have to find the index of the matching row in the dataframe standings and use it to access the column: standings['Rk'][index], where the index would be something like index = standings.index[standings['Squad'] == 'CFR Cluj'].to_list()[0]. Note the last [0], which takes the index of the first matching squad name.
– Marc, Commented Apr 18, 2021 at 14:15
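
The index-based lookup described in the comment above can be sketched as follows. This is only a minimal sketch on made-up data: "Team B" and the Rk values are placeholders, not rows from the asker's file.

```python
import pandas as pd

# Hypothetical two-row stand-in for standings.csv (only the columns needed here).
standings = pd.DataFrame({"Squad": ["CFR Cluj", "Team B"], "Rk": [1, 2]})

# Boolean mask -> matching index labels -> first match.
matches = standings.index[standings["Squad"] == "Team B"].to_list()
index = matches[0]

# .loc[row_label, column] does the label-based lookup in one step.
rank = standings.loc[index, "Rk"]
print(rank)
```

If no row matches, `matches` is empty and `matches[0]` raises an IndexError, so the team names in both files have to agree exactly.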

Marc · Accepted Answer · 2021-04-18 14:35:20Z

1

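The KeyError means that standings.loc[home_team] looked for 'CFR Cluj' among the row labels of standings. Since standings has the default integer index (the traceback ends in the RangeIndex lookup in indexes\range.py), the team name is a value in the Squad column, not a row label. You can get a squad's rank home_rank (and similarly visitor_rank) by filtering on the Squad column: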

# .iloc[0] takes the first match by position; a plain [0] would look for index label 0
home_rank  = standings['Rk'][ standings['Squad']=='CFR Cluj' ].iloc[0]
#home_rank = standings.loc[ standings['Squad']=='CFR Cluj', 'Rk' ].iloc[0]

Step by step this is equal to

boolean_indices = standings['Squad']=='CFR Cluj'
standings_ranks = standings['Rk']
home_ranks      = standings_ranks[boolean_indices]
home_rank       = home_ranks.iloc[0]  # first match by position; if Squad is unique there is only one
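
An alternative that keeps the original standings.loc[team] lookup is to make Squad the row index, so that team names become valid row labels. The sketch below assumes small made-up frames: the team names other than CFR Cluj and all values are placeholders, and `standings_by_squad` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
standings = pd.DataFrame({"Squad": ["CFR Cluj", "Team B", "Team C"], "Rk": [1, 2, 3]})
dataset = pd.DataFrame({"Home": ["CFR Cluj", "Team C"], "Away": ["Team B", "CFR Cluj"]})

# Use the team name as the row label, so .loc[team] finds the row.
# Reading the file with pd.read_csv(path, index_col="Squad") has the same effect.
standings_by_squad = standings.set_index("Squad")

def home_team_ranks_higher(row):
    home_rank = standings_by_squad.loc[row["Home"], "Rk"]
    visitor_rank = standings_by_squad.loc[row["Away"], "Rk"]
    # A smaller Rk means a better position in the table.
    return home_rank < visitor_rank

dataset["HomeTeamRanksHigher"] = dataset.apply(home_team_ranks_higher, axis=1)
print(dataset)
```

A team that appears in dataset but not in standings would still raise a KeyError here, so the names in both files need to match exactly.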

edited Apr 18, 2021 at 14:35

answered Apr 18, 2021 at 14:29

Marc


2 Comments

Rowend Over a year ago

Just tried the idea from the comments above, and it kind of works. I saw where my mistake was: if I print home_rank = standings['Rk'][ standings['Squad']=='CFR Cluj' ][0], it gives me the first rank value, which is what I am looking for. The problem is what I do with that 0 ... I'll try to find the solution. Thank you very much!

Rowend Over a year ago

I'm coming back with an update, it works now :)
