21

I get a KeyError when I try to plot a slice of a pandas DataFrame column with datetimes in it. Does anybody know what could cause this?

I managed to reproduce the error in a small self-contained example (which you can also view here: http://nbviewer.ipython.org/3714142/):

import numpy as np
from pandas import DataFrame
import datetime
from pylab import *

test = DataFrame({'x' : [datetime.datetime(2012,9,10) + datetime.timedelta(n) for n in range(10)], 
                  'y' : range(10)})

Now if I plot:

plot(test['x'][0:5])

there is no problem, but when I plot:

plot(test['x'][5:10])

I get the KeyError below (and the error message is not very helpful to me). As far as I can tell, this only happens with datetime columns, not with other columns. E.g. plot(test['y'][5:10]) is not a problem.

The error message:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-aa076e3fc4e0> in <module>()
----> 1 plot(test['x'][5:10])

C:\Python27\lib\site-packages\matplotlib\pyplot.pyc in plot(*args, **kwargs)
   2456         ax.hold(hold)
   2457     try:
-> 2458         ret = ax.plot(*args, **kwargs)
   2459         draw_if_interactive()
   2460     finally:

C:\Python27\lib\site-packages\matplotlib\axes.pyc in plot(self, *args, **kwargs)
   3846         lines = []
   3847 
-> 3848         for line in self._get_lines(*args, **kwargs):
   3849             self.add_line(line)
   3850             lines.append(line)

C:\Python27\lib\site-packages\matplotlib\axes.pyc in _grab_next_args(self, *args, **kwargs)
    321                 return
    322             if len(remaining) <= 3:
--> 323                 for seg in self._plot_args(remaining, kwargs):
    324                     yield seg
    325                 return

C:\Python27\lib\site-packages\matplotlib\axes.pyc in _plot_args(self, tup, kwargs)
    298             x = np.arange(y.shape[0], dtype=float)
    299 
--> 300         x, y = self._xy_from_xy(x, y)
    301 
    302         if self.command == 'plot':

C:\Python27\lib\site-packages\matplotlib\axes.pyc in _xy_from_xy(self, x, y)
    215         if self.axes.xaxis is not None and self.axes.yaxis is not None:
    216             bx = self.axes.xaxis.update_units(x)
--> 217             by = self.axes.yaxis.update_units(y)
    218 
    219             if self.command!='plot':

C:\Python27\lib\site-packages\matplotlib\axis.pyc in update_units(self, data)
   1277         neednew = self.converter!=converter
   1278         self.converter = converter
-> 1279         default = self.converter.default_units(data, self)
   1280         #print 'update units: default=%s, units=%s'%(default, self.units)
   1281         if default is not None and self.units is None:

C:\Python27\lib\site-packages\matplotlib\dates.pyc in default_units(x, axis)
   1153         'Return the tzinfo instance of *x* or of its first element, or None'
   1154         try:
-> 1155             x = x[0]
   1156         except (TypeError, IndexError):
   1157             pass

C:\Python27\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
    374     def __getitem__(self, key):
    375         try:
--> 376             return self.index.get_value(self, key)
    377         except InvalidIndexError:
    378             pass

C:\Python27\lib\site-packages\pandas\core\index.pyc in get_value(self, series, key)
    529         """
    530         try:
--> 531             return self._engine.get_value(series, key)
    532         except KeyError, e1:
    533             if len(self) > 0 and self.inferred_type == 'integer':

C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1479)()

C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.IndexEngine.get_value (pandas\src\engines.c:1374)()

C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2498)()

C:\Python27\lib\site-packages\pandas\_engines.pyd in pandas._engines.DictIndexEngine.get_loc (pandas\src\engines.c:2460)()

KeyError: 0
1
  • Pandas bug report #4493 and also #6127 seem to be about this issue. Commented Dec 15, 2015 at 11:27

3 Answers

18

HYRY explained why you get the KeyError. To plot with slices using matplotlib you can do:

In [157]: plot(test['x'][5:10].values)
Out[157]: [<matplotlib.lines.Line2D at 0xc38348c>]

In [158]: plot(test['x'][5:10].reset_index(drop=True))
Out[158]: [<matplotlib.lines.Line2D at 0xc37e3cc>]

To plot x against y in one go with pandas 0.7.3:

In [161]: test[5:10].set_index('x')['y'].plot()
Out[161]: <matplotlib.axes.AxesSubplot at 0xc48b1cc>
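As a rough sketch of why the first two workarounds help (assuming a recent pandas; the variable names are only illustrative): both give matplotlib something whose element 0 exists, either a plain NumPy array or a Series relabelled from 0.

```python
import datetime
import pandas as pd

# Same data as in the question.
test = pd.DataFrame({
    "x": [datetime.datetime(2012, 9, 10) + datetime.timedelta(days=n)
          for n in range(10)],
    "y": range(10),
})

# Rows at positions 5..9; the slice keeps the original labels 5..9.
sliced = test["x"].iloc[5:10]

# Workaround 1: a plain NumPy array has no labels, so [0] is positional.
as_array = sliced.values
first_from_array = as_array[0]

# Workaround 2: relabel the slice 0..4, so the label 0 exists again.
renumbered = sliced.reset_index(drop=True)
first_from_series = renumbered[0]
```

Either object can then be passed to plot() as in the session above.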

4 Comments

Thanks! But two questions remain: 1) Why is it a problem only with the date column (plot(test['y'][5:10]) is not a problem)? test['y'][5:10][0] does not work either, just as for 'x'. 2) I understand that with an integer index you cannot index by location with the standard indexing tools (which was causing my confusion, but is explained here: pandas.pydata.org/pandas-docs/dev/…), but are there less standard tools that make this possible (something like the .ix attribute but based on location, e.g. .ix_loc)?
Stepping through the trace for the date column shows that matplotlib tries x[0] on the dates to retrieve the tz info, which raises a KeyError. This is not done for the y column. Pandas has location-based indexing tools, but matplotlib's internals do not use them.
Ah OK, I see. What are the location-based indexing tools (for integer labels)? I can't find them directly in the documentation.
I see it is a recent change (pandas.pydata.org/pandas-docs/dev/…). In the meantime, I have learnt to use the dates as the index, so I don't have the problem anymore. Still, it is an annoying (but inevitable, I suppose) consequence of the integer indexing, and it is difficult to see what you've done wrong as a newbie. Thanks again!
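For readers on a recent pandas, a minimal sketch of location-based access on an integer-labelled Series. This assumes the .iloc and .loc accessors of current releases; the tools available in the 0.7/0.8 versions discussed in this thread differed.

```python
import pandas as pd

test = pd.DataFrame({"y": range(10)})

# Rows at positions 5..9; the labels stay 5..9.
sliced = test["y"].iloc[5:10]

# .iloc is purely positional: element 0 is the first row of the slice.
by_position = sliced.iloc[0]

# .loc is purely label-based: the label 5 refers to that same row.
by_label = sliced.loc[5]

# Negative positions count from the end, as with a list.
last = sliced.iloc[-1]
```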
8

Instead of calling plot(test["x"][5:10]), you can call the plot method of Series object:

test["x"][5:10].plot()

The reason: test["x"][5:10] is a Series object whose integer index runs from 5 to 9. matplotlib's plot() tries to get the element with label 0 from it, which does not exist, so a KeyError is raised.
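The failing lookup can be reproduced without matplotlib. This is a small sketch assuming a recent pandas, with .iloc used for the slice so that it is unambiguously positional:

```python
import datetime
import pandas as pd

test = pd.DataFrame({
    "x": [datetime.datetime(2012, 9, 10) + datetime.timedelta(days=n)
          for n in range(10)],
    "y": range(10),
})

# The slice keeps the labels of the rows it selects.
sliced = test["x"].iloc[5:10]
labels = list(sliced.index)

# With an integer index, sliced[0] looks up the *label* 0, not position 0.
try:
    sliced[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

Here labels holds 5 through 9 and the lookup fails, which is the same x[0] access that appears in matplotlib's dates.py in the traceback.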

2 Comments

OK, thanks. And what if I want to plot x, y data? Like plot(test['x'], test['y']) with slicing.
I see the latest pandas (0.8.1) has the 'x' and 'y' keywords to do this (0.7.3, which I am using, does not have them yet as far as I can see). But if I want to use my old native matplotlib functions and not the pandas methods, is there no other way than calling np.asarray before plotting?
4

I encountered this error with pd.groupby in Pandas 0.14.0 and solved it with df = df[df['col']!= 0].reset_index()
