I produce some data in C++ that I want to access in a Python program. I have figured out how to serialize/deserialize to/from a binary file with boost in C++, but not how to access the data in Python (without manually parsing the binary file).

Here is my C++ code for serialization:

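/* Save some data to a file. Note: text_oarchive writes a text archive,
   even though the stream is opened in binary mode. */
#include <fstream>
#include <vector>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/vector.hpp>

using namespace std;

template <typename T>
int serializeToBinaryFile( const char* filename, const T& someValue,
                           const vector<T>& someVector )
{
    ofstream file( filename, ios::out | ios::binary | ios::trunc );

    if ( file.is_open() )
    {
        boost::archive::text_oarchive oa(file);

        int sizeOfDataType = sizeof(T);

        oa & sizeOfDataType;
        oa & someValue;
        oa & someVector;

        // No explicit file.close() here: the archive is destroyed before
        // the stream, so it can finish writing before the file is closed.

        return 0;
    } else {
        return 1;
    }
}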

Here is my C++ code for deserialization:

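/* Load some data from a file written with text_oarchive */
#include <fstream>
#include <vector>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/vector.hpp>

using namespace std;

template <typename T>
int deSerializeFromBinaryFile( const char* filename, int& sizeOfDataType,
                               T& someValue, vector<T>& someVector )
{
    ifstream file( filename, ios::in | ios::binary );

    if ( file.is_open() )
    {
        boost::archive::text_iarchive ia(file);

        // Values must be read in the same order they were written.
        ia & sizeOfDataType;
        ia & someValue;
        ia & someVector;

        // The stream closes itself when it goes out of scope.

        return 0;
    } else {
        return 1;
    }
}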

How can I load the value and the vector into objects in a Python program?

2 Comments

  • You can wrap your deserialization code with boost::python and call it. Commented May 29, 2015 at 8:01
  • IMHO, a better way is to use Google protobuf. Commented May 30, 2015 at 7:53

3 Answers

The Boost documentation notes that the binary representation is not portable, and is not even guaranteed to be consistent across different versions of your program.

You may want to use the XML archive that's available in boost::serialization and then use an XML parser in Python to read it.

Instructions (and an example) on how to do this are here: http://www.boost.org/doc/libs/1_58_0/libs/serialization/doc/tutorial.html#archives

Note the use of the BOOST_SERIALIZATION_NVP macro to name the items in the archive.
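To illustrate the Python side, here is a minimal sketch using the standard library's xml.etree.ElementTree. The embedded sample document and its element names (sizeOfDataType, someValue, someVector, count, item_version, item) are assumptions about what an XML archive of the question's data might look like. The exact layout depends on your Boost version and on the names you pass to the NVP macro, so check a real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an XML archive holding the three values from the
# question. The layout (<count>, <item_version>, <item>) is an assumption;
# inspect a file produced by your own Boost version to confirm it.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="12">
<sizeOfDataType>8</sizeOfDataType>
<someValue>2.5</someValue>
<someVector>
    <count>3</count>
    <item_version>0</item_version>
    <item>1.0</item>
    <item>2.0</item>
    <item>3.5</item>
</someVector>
</boost_serialization>
"""


def load_archive(xml_bytes):
    """Return (size_of_data_type, some_value, some_vector) from XML bytes."""
    root = ET.fromstring(xml_bytes)

    size_of_data_type = int(root.findtext("sizeOfDataType"))
    some_value = float(root.findtext("someValue"))

    vector_node = root.find("someVector")
    some_vector = [float(item.text) for item in vector_node.findall("item")]

    # Sanity check: the stored element count should match what we parsed.
    count = int(vector_node.findtext("count"))
    if count != len(some_vector):
        raise ValueError("expected %d items, found %d" % (count, len(some_vector)))

    return size_of_data_type, some_value, some_vector


if __name__ == "__main__":
    print(load_archive(SAMPLE))
```

For a real file you would read the bytes with open(filename, "rb").read() and pass them to load_archive. For vectors with millions of doubles this will be slow and memory-hungry, as discussed in the comments below.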

4 Comments

Is it really true that XML is faster as I go to larger and larger data sizes?
I'm not sure I understand the question. If you're asking whether the relative overhead of decoding the tags decreases as each element increases in size, then I suppose the answer is 'yes, marginally', but in reality I don't think speed is the primary concern when serialising to a text file.
Well, I want to use these methods for scientific computing purposes. In some instances, I will store smaller files with parameters, but typically, I will store a bunch of vectors/arrays with 10^6 or more elements of doubles. My intuition tells me that XML is excellent for the former application, but unbearably slow for the latter. I guess I should rather look into HDF5 or NetCDF for the latter case?
@fromGiants HDF5 is designed for storing larger amounts of numerical data, so yes, you should go for that. It is also available in Python via h5py.

Use Boost.Python to expose your deserialization function to Python. You will need to expose the function separately for each type you need (templates can't be exposed to Python directly).

1 Comment

Using a portable format (such as XML), as suggested by Richard Hodges, is usually a better option.

That's not the way it works. In principle there are two reasons for serializing/deserializing:

  1. Store and retrieve within the same software package. That is what Boost archives were made for. There is no problem with types and the archive format.

  2. Serialize for communication with other entities. That is a completely different story, as you have to deal with machine word size, OS, programming language, localisation and more. Here you usually start by describing the serialized format, starting with primitive types like Int32, String, Float and also composite types like Sequence, List and so on. Then you think about how to represent these types in the different programming languages and how to serialize/deserialize them. You decide to use e.g. struct/namedtuple for Sequence and vector<>/list for List. While Boost was not specifically designed for this, there is a chance to use XML archives if you have that in mind when describing the serialized format.
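As a rough sketch of the second case, here is what an explicitly described format could look like on the Python side, using only the standard struct module and a namedtuple. The layout (little-endian Int32, Float64, UInt32 count, then the Float64 elements) and the names Record, pack_record and unpack_record are made up for illustration. This is not Boost's binary archive format; the C++ side would have to write exactly the same bytes.

```python
import struct
from collections import namedtuple

# Hypothetical record type mirroring the three values from the question.
Record = namedtuple("Record", ["size_of_data_type", "some_value", "some_vector"])

# Assumed layout, little-endian with no padding:
#   Int32    size of the element type in bytes
#   Float64  some_value
#   UInt32   number of elements in the vector
#   Float64  repeated "count" times
HEADER = struct.Struct("<idI")


def pack_record(record):
    """Serialize a Record into bytes using the layout described above."""
    count = len(record.some_vector)
    header = HEADER.pack(record.size_of_data_type, record.some_value, count)
    body = struct.pack("<%dd" % count, *record.some_vector)
    return header + body


def unpack_record(data):
    """Deserialize bytes produced by pack_record back into a Record."""
    size, value, count = HEADER.unpack_from(data, 0)
    vector = struct.unpack_from("<%dd" % count, data, HEADER.size)
    return Record(size, value, list(vector))


if __name__ == "__main__":
    original = Record(8, 2.5, [1.0, 2.0, 3.5])
    print(unpack_record(pack_record(original)))
```

Fixing the byte order and the field widths in the format string is what removes the dependence on machine word size mentioned above.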

There is a special case if you want to communicate on the same machine. Here you could wrap the serialisation (I strongly recommend using the same DLL on the C++ and Python sides). In any case, you must stick to Python's ctypes.
