I use a for loop over a list of rasters. From each raster I extract an array of data, and I want to use the basename (date) of the raster as the index for this array. For this I use a Pandas DataFrame MultiIndex. The array, including the newly set index, is then appended to an HDFStore. Then the raster with the next date is selected.
Code snippets:
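import decimal
import os
import numpy.ma as ma
import pandas as pd

# Derive the integer date (e.g. 20060101) from the raster's file name
root, ext = os.path.splitext(raster)
name = int(decimal.Decimal(os.path.basename(root)))
# Keep only the non-masked values as a 1-D array
array = ma.MaskedArray.compressed(raster)
arr2df = pd.DataFrame(pd.Series(data = array), columns=['rastervalue'])
arr2df['timestamp'] = pd.Series(name,index=arr2df.index)
arr2df.set_index('timestamp')
# store is an already opened pd.HDFStore
store.append('rastervalue',arr2df)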
The DataFrame seems to be OK (by the way, how can I retrieve a MultiIndex?).
>>> arr2df
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 123901 entries, (0, 20060101) to (123900, 20060101)
Data columns (total 1 columns):
rastervalue 123901 non-null values
dtypes: int32(1)
But when I check the HDFStore, it seems that my MultiIndex has disappeared and been turned into "values_block_1":
>>> store.root.rastervalue.table.read
<bound method Table.read of /rastervalue/table (Table(12626172,)) ''
description := {
"index": Int64Col(shape=(), dflt=0, pos=0),
"values_block_0": Int32Col(shape=(1,), dflt=0, pos=1),
"values_block_1": Int64Col(shape=(1,), dflt=0, pos=2)}
byteorder := 'little'
chunkshape := (3276,)
autoIndex := True
colindexes := {
"index": Index(6, medium, shuffle, zlib(1)).is_CSI=False}>
>>> store.root.rastervalue.table.read(field="values_block_1")
array([[20060101],
[20060101],
[20060101],
...,
[ 20060914],
[ 20060914],
[ 20060914]], dtype=int64)
From reading the documentation, I can't figure out how to store or change a MultiIndex in an HDFStore correctly. Any suggestions? Eventually I would like to query the table like this:
store.select('rastervalue', [ pd.Term('index', '=', '20060101')])
set_index is not assigned to anything; this is NOT an inplace operation (unless you pass inplace=True).
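To illustrate the point, here is a minimal sketch that leaves out the HDFStore entirely. The three sample values are made up to stand in for one raster, and `append=True` is an assumption on my part, chosen because it reproduces the two-level (row, timestamp) index shown in the question's output:

```python
import pandas as pd

# Hypothetical stand-in for the compressed values of one raster
arr2df = pd.DataFrame({'rastervalue': [10, 20, 30]})
arr2df['timestamp'] = 20060101

# Not assigned: the returned frame is discarded and arr2df is unchanged
arr2df.set_index('timestamp')
unchanged_columns = list(arr2df.columns)

# Option 1: assign the result. append=True (an assumption here) keeps the
# existing row index as the first level, giving a (row, timestamp) MultiIndex
indexed = arr2df.set_index('timestamp', append=True)

# The MultiIndex can be retrieved through .index, one level at a time if needed
timestamps = indexed.index.get_level_values('timestamp')

# Option 2: modify the frame in place instead of assigning
arr2df.set_index('timestamp', append=True, inplace=True)
```

With either option, 'timestamp' is no longer a regular data column when the frame is handed to store.append.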