I am reading data from image files and I want to append this data into a single HDF file. Here is my code:

datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'))
for file in fileList:
    data = {'X Position': pd.Series(xpos, index=index1),
            'Y Position': pd.Series(ypos, index=index1),
            'Major Axis Length': pd.Series(major, index=index1),
            'Minor Axis Length': pd.Series(minor, index=index1),
            'X Velocity': pd.Series(xVelocity, index=index1),
            'Y Velocity': pd.Series(yVelocity, index=index1)}
    df = pd.DataFrame(data)
    datafile['df'] = df
    datafile.close()

This is obviously incorrect as it overwrites each set of data with the new one each time the loop runs.

If instead of datafile['df'] = df, I use

datafile.append('df',df)    

OR

df.to_hdf(os.path.join(path,'imageData.h5'), 'df', append=True, format = 'table')

I get the error:

ValueError: Can only append to Tables

I have referred to the documentation and other SO questions, to no avail.

So, I am hoping someone can explain why this isn't working and how I can successfully append all the data to one file. I am willing to use a different method (perhaps pyTables) if necessary.

Any help would be greatly appreciated.

  • The second way (df.to_hdf(..., format="table", append=True)) is actually the right one. Have you tried using that (without all of the HDFStore stuff) with a fresh file? Commented Feb 26, 2014 at 8:15
  • @filmor You mean remove the line where I create the empty HDF file? Tried that, same error. Maybe the problem is that the data is in a DataFrame and not a table? Commented Feb 26, 2014 at 8:18
  • No, the error message is referring to the internal HDF5 table format that is used. IIRC in older versions of pandas (btw, which one are you using?) HDFStore used the fixed format by default which doesn't allow appending. The table format is the one used by PyTables. Commented Feb 26, 2014 at 8:34
  • Version 0.11.0 - I tried it on a fresh file and it worked, but without the for loop. Commented Feb 26, 2014 at 9:14
  • If you use format="table", to_hdf should allow appending by using PyTables internally; there is no need to do that yourself. You might want to update pandas, though. What does "and it worked" mean? Would you update the question? Commented Feb 26, 2014 at 10:04

1 Answer

This will work in 0.11. The behavior depends on the format used when the group (the label under which the data is stored, 'df' here) is first created. If you store it in the fixed format, later writes overwrite it, and trying to append gives the error message above; if you write it in the table format, you can append. Note that in 0.11, to_hdf does not correctly pass keywords through to the underlying function, so you can use it ONLY to write a fixed format.

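import os
import pandas as pd

# mode='w' creates a new file, discarding any existing 'df' group
datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'), mode='w')
for file in fileList:
    # xpos, ypos, etc. are the values read from the current image file
    data = {'X Position': pd.Series(xpos, index=index1),
            'Y Position': pd.Series(ypos, index=index1),
            'Major Axis Length': pd.Series(major, index=index1),
            'Minor Axis Length': pd.Series(minor, index=index1),
            'X Velocity': pd.Series(xVelocity, index=index1),
            'Y Velocity': pd.Series(yVelocity, index=index1)}
    df = pd.DataFrame(data)
    # append() writes the table format, so each pass adds rows
    datafile.append('df', df)
# close once, after the loop has finished
datafile.close()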
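For anyone who wants to try the pattern without image files, here is a minimal self-contained sketch. It assumes a reasonably recent pandas with PyTables installed (not 0.11), and make_frame, the random values, and the temporary path are hypothetical stand-ins for the real per-image data.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def make_frame(start, n=3):
    """Build a small stand-in DataFrame (made-up data, not real image measurements)."""
    index = np.arange(start, start + n)
    return pd.DataFrame(
        {"X Position": np.random.rand(n), "Y Position": np.random.rand(n)},
        index=index,
    )


# Hypothetical location; the original code writes to os.path.join(path, 'imageData.h5')
h5_path = os.path.join(tempfile.mkdtemp(), "imageData.h5")

# mode="w" starts from an empty file, so no fixed-format 'df' group is left over
with pd.HDFStore(h5_path, mode="w") as store:
    for i in range(4):
        # append() writes the table format, so each call adds rows to 'df'
        store.append("df", make_frame(start=i * 3))

# Read everything back as a single DataFrame
result = pd.read_hdf(h5_path, "df")
print(len(result))
```

Using the store as a context manager closes the file once, after the loop, which avoids having to reopen it on each pass.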

3 Comments

Perfect, thank you. The only change I had to make was to add datafile.open() before appending.
For clarity, the only difference here is the addition of mode='w' in the first line. In that case, what is the default mode?
The default mode is 'a', which appends to the file if it exists; 'w' creates a new file.
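A small sketch of why the mode matters here, assuming a recent pandas with PyTables installed; the file name and the tiny DataFrame are made up for illustration. With the default mode, a 'df' group that was earlier written in the fixed format is still in the file, which may be why the append kept failing even after the code was changed.

```python
import os
import tempfile

import pandas as pd

h5_path = os.path.join(tempfile.mkdtemp(), "modes.h5")  # hypothetical file
df = pd.DataFrame({"a": [1, 2, 3]})

# Default mode "a": item assignment is a put(), which writes the fixed format
with pd.HDFStore(h5_path) as store:
    store["df"] = df

# Reopening in the default mode keeps the fixed-format 'df', so append() fails
append_failed = False
with pd.HDFStore(h5_path) as store:
    try:
        store.append("df", df)
    except ValueError:
        append_failed = True

# mode "w" starts a new file, so 'df' can be created as a table and appended to
with pd.HDFStore(h5_path, mode="w") as store:
    store.append("df", df)
    store.append("df", df)
    n_rows = len(store["df"])

print(append_failed, n_rows)
```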
