boost::serialization in C++, deserialization in Python

Question

I produce some data in C++ that I want to access in a Python program. I have figured out how to serialize/deserialize to/from a binary file with boost in C++, but not how to access the data in Python (without manually parsing the binary file).

Here is my C++ code for serialization:

#include <fstream>
#include <vector>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/vector.hpp>

/* Save some data to a binary file */
template <typename T>
int serializeToBinaryFile( const char* filename, const T& someValue,
                           const std::vector<T>& someVector )
{
    std::ofstream file( filename, std::ios::out | std::ios::binary | std::ios::trunc );

    if ( file.is_open() )
    {
        {
            // Let the archive go out of scope before the stream is closed.
            boost::archive::binary_oarchive oa(file);

            int sizeOfDataType = sizeof(T);

            oa & sizeOfDataType;
            oa & someValue;
            oa & someVector;
        }

        file.close();

        return 0;
    } else {
        return 1;
    }
}

Here is my C++ code for deserialization:

/* Load some data from a binary file */
template <typename T>
int deSerializeFromBinaryFile( const char* filename, int& sizeOfDataType,
                               T& someValue, std::vector<T>& someVector )
{
    std::ifstream file( filename, std::ios::in | std::ios::binary );

    if ( file.is_open() )
    {
        {
            // Read the items back in the same order they were written.
            boost::archive::binary_iarchive ia(file);

            ia & sizeOfDataType;
            ia & someValue;
            ia & someVector;
        }

        file.close();

        return 0;
    } else {
        return 1;
    }
}

How can I load the value and vector to objects in a Python program?

You can wrap your deserialization code with boost::python and call it. – m0nhawk, Commented May 29, 2015 at 8:01

Richard Hodges · Accepted Answer · 2015-05-29 08:06:26Z

5

The Boost documentation notes that the binary archive format is not portable, and is not even guaranteed to stay consistent across different versions of your program.

You may want to use the XML archive that's available in boost::serialization instead, and then read the file with an XML parser in Python.

Instructions (and an example) on how to do this are here: http://www.boost.org/doc/libs/1_58_0/libs/serialization/doc/tutorial.html#archives

Note the use of the BOOST_SERIALIZATION_NVP (name-value pair) macro to name the items in the archive.
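As a rough sketch of the Python side, the standard library's xml.etree.ElementTree can read such a file. The element names below (sizeOfDataType, someValue, someVector) assume the three items were written with the NVP macro under the variable names from the question, and the embedded sample is only an assumption about what the XML archive looks like for a vector of doubles. Check it against a file your own program produces before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed sample of a Boost XML archive; the exact layout and the
# version attribute depend on the Boost release that wrote the file.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="12">
<sizeOfDataType>8</sizeOfDataType>
<someValue>2.5</someValue>
<someVector>
    <count>3</count>
    <item_version>0</item_version>
    <item>1.0</item>
    <item>2.0</item>
    <item>4.5</item>
</someVector>
</boost_serialization>
"""


def load_value_and_vector(xml_bytes):
    """Return (size_of_data_type, some_value, some_vector) from the archive."""
    root = ET.fromstring(xml_bytes)

    size_of_data_type = int(root.findtext("sizeOfDataType"))
    some_value = float(root.findtext("someValue"))

    # Each vector element is stored as an <item> child of <someVector>.
    vector_node = root.find("someVector")
    some_vector = [float(item.text) for item in vector_node.findall("item")]

    # Cross-check against the element count stored in the archive.
    expected_count = int(vector_node.findtext("count"))
    if expected_count != len(some_vector):
        raise ValueError("item count does not match <count>")

    return size_of_data_type, some_value, some_vector


size_of_data_type, some_value, some_vector = load_value_and_vector(SAMPLE_XML)
print(size_of_data_type, some_value, some_vector)
```

For a real file, ET.parse(filename).getroot() can replace ET.fromstring. The float conversion assumes T is double; other types need their own conversion.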

answered May 29, 2015 at 8:06


4 Comments

fromGiants Over a year ago

Is it really true that XML is faster as I go to larger and larger data sizes?

Richard Hodges Over a year ago

I'm not sure I understand the question. If you're asking whether the relative overhead of decoding the tags decreases as each element increases in size, then I suppose the answer is 'yes, marginally', but in reality I don't think speed is the primary concern when serialising to a text file.

fromGiants Over a year ago

Well, I want to use these methods for scientific computing purposes. In some instances, I will store smaller files with parameters, but typically I will store a bunch of vectors/arrays of doubles with 10^6 or more elements. My intuition tells me that XML is excellent for the former application, but unbearably slow for the latter. I guess I should rather look into HDF5 or NetCDF for the latter case?

doqtor Over a year ago

@fromGiants HDF5 is designed for storing large amounts of numerical data, so yes, you should go for that. It is also available in Python via h5py.

doqtor · Answer · 2015-05-29 08:03:03Z

2

Use Boost.Python to expose your deserialization function to Python. You will need to expose a separate instantiation of the function for each type you need, because templates themselves cannot be exposed to Python.

answered May 29, 2015 at 8:03


1 Comment

No-Bugs Hare Over a year ago

Using a portable format (such as XML), as suggested by Richard Hodges, is usually a better option.

stefan · Answer · 2015-05-29 09:01:30Z

That's not the way it works. In principle there are two reasons for serializing/deserializing:

1. Store and retrieve within the same software package. That is what boost archive was made for. There is no problem with types or the archive format.
2. Serialize for communication with other entities. That is a completely different story, as you have to deal with machine word size, OS, programming language, localisation and more. Here you usually start by describing the serialized format, beginning with primitive types like Int32, String and Float, and then composite types like Sequence, List and so on. Then you think about how to represent these types in the different programming languages and how to serialize/deserialize them. You might decide to use, e.g., struct/namedtuple for Sequence and vector<>/list for List. While boost was not specifically designed for this, there is a chance to use XML archives if you keep that in mind when describing the serialized format.
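The idea of describing the format first, in terms of primitive types, can be sketched in Python with the standard struct module and a namedtuple. The layout below (a little-endian Int32, a Float64, an Int32 element count, then that many Float64 items) and all names are hypothetical choices for illustration; this is not a Boost archive format, and the C++ side would have to write exactly the same bytes.

```python
import struct
from collections import namedtuple

# Hypothetical wire format (an assumption for illustration only):
#   Int32   size_of_data_type
#   Float64 some_value
#   Int32   count
#   Float64 some_vector[count]
# "<" fixes little-endian byte order and disables alignment padding,
# so the layout does not depend on the machine that reads it.
HEADER = struct.Struct("<idi")

Record = namedtuple("Record", ["size_of_data_type", "some_value", "some_vector"])


def encode(record):
    """Pack a Record into bytes following the layout above."""
    count = len(record.some_vector)
    header = HEADER.pack(record.size_of_data_type, record.some_value, count)
    body = struct.pack("<%dd" % count, *record.some_vector)
    return header + body


def decode(payload):
    """Unpack bytes produced by encode() back into a Record."""
    size_of_data_type, some_value, count = HEADER.unpack_from(payload, 0)
    items = struct.unpack_from("<%dd" % count, payload, HEADER.size)
    return Record(size_of_data_type, some_value, list(items))


original = Record(8, 2.5, [1.0, 2.0, 4.5])
payload = encode(original)
restored = decode(payload)
print(len(payload), restored)
```

Because byte order and field widths are stated explicitly, the same description can be implemented independently in each language, which is the point of defining the format before the code.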

There is a special case if you want to communicate on the same machine. Here you could wrap the serialisation code (I strongly recommend using the same DLL on the C++ and Python sides). In any case, you must stick to Python's ctypes.
