Issues with adding a MultiIndex Pandas DataFrame to a PyTables HDFStore

Question

I loop over a list of rasters. From each raster I extract an array of data, and I want to use the raster's basename (a date) as the index for that array. For this I use a Pandas DataFrame MultiIndex. The array, with its new index, is then appended to an HDFStore, and the loop moves on to the raster for the next date.

Code snippets:

root, ext = os.path.splitext(raster)
name = int(decimal.Decimal(os.path.basename(root)))

array = ma.MaskedArray.compressed(raster)
arr2df = pd.DataFrame(pd.Series(data = array), columns=['rastervalue'])
arr2df['timestamp'] = pd.Series(name,index=arr2df.index)
arr2df.set_index('timestamp')
store.append('rastervalue',arr2df)

The DataFrame seems to be OK (by the way, how can I retrieve a MultiIndex?).

>>> arr2df
<class 'pandas.core.frame.DataFrame'>
  MultiIndex: 123901 entries, (0, 20060101) to (123900, 20060101)
  Data columns (total 1 columns):
  rastervalue    123901  non-null values
  dtypes:        int32(1)

But when I check the HDFStore, my MultiIndex seems to have disappeared and been replaced by "values_block_1":

>>> store.root.rastervalue.table.read
<bound method Table.read of /rastervalue/table (Table(12626172,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Int32Col(shape=(1,), dflt=0, pos=1),
  "values_block_1": Int64Col(shape=(1,), dflt=0, pos=2)}
  byteorder := 'little'
  chunkshape := (3276,)
  autoIndex := True
  colindexes := {
    "index": Index(6, medium, shuffle, zlib(1)).is_CSI=False}>

>>> store.root.rastervalue.table.read(field="values_block_1")
array([[20060101],
       [20060101],
       [20060101],
       ...,
       [ 20060914],
       [ 20060914],
       [ 20060914]], dtype=int64)

From the documentation I can't figure out how to store or change a MultiIndex in an HDFStore correctly. Any suggestions? Eventually I would like to query the table like this:

 store.select('rastervalue', [ pd.Term('index', '=', '20060101')])

Your use of the MaskedArray might be doing funny things with the index. Can you provide a reproducible example and/or show some of the frame you are trying to store (e.g. df.head(10))?
– Jeff, Commented Jun 25, 2013 at 11:35
I just noticed that the result of your set_index is not assigned to anything; this is NOT an in-place operation (unless you pass inplace=True).
– Jeff, Commented Jun 25, 2013 at 13:33
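
To illustrate the comment above, here is a minimal sketch of why the unassigned set_index call has no effect. The three sample values are made up, and using append=True to keep the row number as the first index level is an assumption about what the question intended:

```python
import pandas as pd

# Made-up stand-in for the values extracted from one raster.
df = pd.DataFrame({"rastervalue": [10, 20, 30]})
df["timestamp"] = 20060101

# set_index returns a NEW frame; without assignment the result is discarded.
df.set_index("timestamp")
still_a_column = "timestamp" in df.columns  # df itself is unchanged

# Assign the result. append=True (an assumption about the intent) keeps the
# existing row number as the first level and adds the date as a second level.
indexed = df.set_index("timestamp", append=True)
print(indexed.index.nlevels)
print(indexed)
```

Passing inplace=True instead of assigning would also modify the frame, as the comment notes.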

Jeff · Accepted Answer · 2013-06-25 13:33:10Z

Here is a working example.

In [43]: df = DataFrame(dict(ivalue = range(123901), date = 20060101, 
              value = Series([1]*123901,dtype='int32'))).set_index(['ivalue','date'])

In [44]: df
Out[44]: 
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 123901 entries, (0, 20060101) to (123900, 20060101)
Data columns (total 1 columns):
value    123901  non-null values
dtypes: int32(1)

In [45]: df.head()
Out[45]: 
                 value
ivalue date           
0      20060101      1
1      20060101      1
2      20060101      1
3      20060101      1
4      20060101      1

In [46]: store = pd.HDFStore('test.h5',mode='w')

In [48]: store.append('df',df)

In [49]: store
Out[49]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/df            frame_table  (typ->appendable_multi,nrows->123901,ncols->3,indexers->[index],dc->[date,ivalue])

In [50]: store.get_storer('df')
Out[50]: frame_table  (typ->appendable_multi,nrows->123901,ncols->3,indexers->[index],dc->[date,ivalue])
In [51]: store.get_storer('df').attrs
Out[51]: 
/df._v_attrs (AttributeSet), 14 attributes:
   [CLASS := 'GROUP',
    TITLE := '',
    VERSION := '1.0',
    data_columns := ['date', 'ivalue'],
    encoding := None,
    index_cols := [(0, 'index')],
    info := {'index': {}},
    levels := ['ivalue', 'date'],
    nan_rep := 'nan',
    non_index_axes := [(1, ['ivalue', 'date', 'value'])],
    pandas_type := u'frame_table',
    pandas_version := '0.10.1',
    table_type := u'appendable_multiframe',
    values_cols := ['values_block_0', 'date', 'ivalue']]

In [52]: store.get_storer('df').table
Out[52]: 
/df/table (Table(123901,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Int32Col(shape=(1,), dflt=0, pos=1),
  "date": Int64Col(shape=(), dflt=0, pos=2),
  "ivalue": Int64Col(shape=(), dflt=0, pos=3)}
  byteorder := 'little'
  chunkshape := (2340,)
  autoIndex := True
  colindexes := {
    "date": Index(6, medium, shuffle, zlib(1)).is_CSI=False,
    "index": Index(6, medium, shuffle, zlib(1)).is_CSI=False,
    "ivalue": Index(6, medium, shuffle, zlib(1)).is_CSI=False}

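Because the index levels are stored as data columns (dc->[date,ivalue] in the output above), they can be used to filter on read. The following is only a sketch of the loop-and-query workflow from the question, assuming a recent pandas with PyTables installed; the two dates, the five-element arrays, and the temporary file location are placeholders, and the string form of where is used here in place of pd.Term:

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.h5")  # placeholder file location

    with pd.HDFStore(path, mode="w") as store:
        for date in (20060101, 20060914):  # placeholder raster dates
            values = np.arange(5, dtype="int32")  # stand-in for raster values
            df = pd.DataFrame(
                {"ivalue": np.arange(len(values)), "date": date, "value": values}
            ).set_index(["ivalue", "date"])
            # Each append adds rows to the same table; the index levels are
            # written as queryable data columns.
            store.append("df", df)

        # Select only the rows for one date using the 'date' level.
        subset = store.select("df", where="date == 20060101")

print(subset)
```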

6 Comments

Mattijn Over a year ago

I see, my mistake was already in setting the multi-index on the DataFrame. It makes sense then that the HDFStore doesn't like the input. Thanks for explaining that set_index is not an in-place operation; I hadn't realized that. By the way, I no longer need it, since your working example works like a charm. I hope one day to be able to answer questions on SO like you do :). Keep it up.

Jeff Over a year ago

np glad it worked out. you might find the cookbook useful: pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore

K.-Michael Aye Over a year ago

what is the meaning of is_CSI=False ?

K.-Michael Aye Over a year ago

ok, Completely Sorted Index. So, does this mean, that the indexing is slower than it can be, because this is set to False?

Jeff Over a year ago

Read this section: pytables.github.io/usersguide/optimization.html; it is rarely necessary to create a CSI, and it is pretty time-consuming to do so (and worse, it can only be done for 1 index). But if you want to experiment (and have lots of time), it might pay off.
