-2
\$\begingroup\$

I want to serialize a C++ class Ramdomclass . Below is the serialization function.

std::vector<uint8_t> Serialize(Ramdomclass &Ramdom)
{
    // First encode the dom content
    std::vector<uint8_t> Buf;

    // First Part
    // Name: vec(byte)
    // vec(byte): size:u32, bytes

    auto EncodedSize = EncodeU32(Ramdom.getName().length());
    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());
    Buf.insert(Buf.end(), Ramdom.getName().cbegin(), Ramdom.getName().cend());

    // Second Part

    // Buf: vec(byte)
    // vec(byte): size:u32, bytes
    EncodedSize = EncodeU32(Ramdom.getContent().size());
    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());
    Buf.insert(Buf.end(), Ramdom.getContent().cbegin(), Ramdom.getContent().cend());

    //  dom ID
    std::vector<uint8_t> Result;
    Result.push_back(0x00U);

    // The dom size.
    EncodedSize = EncodeU32(Buf.size());
    Result.insert(Result.end(), EncodedSize.begin(), EncodedSize.end());

    // add dom content.
    Result.insert(Result.end(), Buf.begin(), Buf.end());

    // This is an bruteforce example. Can you do more fast ?.
    return Result;
}


std::vector<uint8_t> EncodeU32(uint32_t d)
{

    std::vector<uint8_t> encoded;

    // unsigned LEB128 encoding
    do
    {
        auto x = d & 0b01111111;
        d >>= 7;
        if (d)
            x |= 0b10000000;
        encoded.push_back(x);
    } while (d);

    return encoded; // this will be in bytes if we want to see it then unsigned(encoded[i]) will be a number betweern 0 to 255
}

I think I can improve in terms of the way I am appending different parts in std::vector<uint8_t> Buf;. I want to know is there any better way of doing this ? maybe instead of using insert can I used anyone other way

Note:: All the things I am appending to std::vector<uint8_t> Buf are in bytes(binary).

\$\endgroup\$
4
  • 2
    \$\begingroup\$ There's a lot of omission that makes this code hard to review. Without the definition of Randomclass and without the deserialisation code, it's difficult to be sure it's even correct. For a good review, I recommend you include both of those, and the unit-tests, too. \$\endgroup\$ Commented Oct 15, 2022 at 15:51
  • \$\begingroup\$ Personally, I don't like binary serialization. As a lead, I would push back and ask why not use a text based serialization method as a first version. It's easier to do, easier to debug, and easier to fix when it goes wrong. There are reason to use binary serialization (speed/size), but you have to justify the use case because of all the extra headaches (and thus work) it introduces. So my first question is: Why not serialize into JSON or YAML? \$\endgroup\$ Commented Oct 16, 2022 at 18:11
  • \$\begingroup\$ Also why are you serializing to in memory buffer. I would abstract this to a stream. The stream could be an in memory buffer but could also be a file or a socket (your code should not need to know where the data is going). By using a stream you will remove the need to serialize to a buffer than copy to the final destination. \$\endgroup\$ Commented Oct 16, 2022 at 18:17
  • \$\begingroup\$ @MartinYork, thanks. Ramdomclass is only one class there are many such class that need to be serialize.. In my case serialize and deserialize works as an API. Can you show how to do with stream ? considering Serialize(Classes &AllClasses){ Serialize(Ramdomclass &Ramdom) ; Serialize(Ramclass &Ram) ; Serialize(Rodonclass &Rodan); .... } and every class contains same members . \$\endgroup\$ Commented Oct 17, 2022 at 6:14

1 Answer 1

1
\$\begingroup\$

Improving on insert()

Your spider senses are tingling for good reasons. std::vector's insert() function gets two iterators that describe the range to copy from, but if it's written in a generic way it cannot make any assumptions about those iterators, so it might just copy character by character instead of doing a call to memcpy(). Also, every time it adds another character, it has to make sure Buf is large enough, and reallocate its storage if not.

However, it can in principle detect what you are trying to do and implement an optimized version that basically just reallocates and then memcpy()s. Some compilers do this, while others don't. You could "help" the compiler by making this explicit; instead of calling insert(), write:

auto old_size = Buf.size();
Buf.resize(old_size + Random.getName().size());
std::copy_n(Ramdom.getName().cbegin(), Ramdom.getName().size(),
            Buf.begin() + old_size);

However, that might also generate worse output, as resize() might cause value-initialization of the extra storage being added, even though it will be overwritten by the call to std::copy_n(). It all depends on how the compiler optimizes your code.

While the above shows how you could change a single call to insert(), consider that you could also calculate the size of everything you are going to add up-front, then do a single resize() in Serialize(), and then just copy all the data to the right place inside Buf.

If you want to write reasonably performant, maintainable code, then I would keep the calls to insert() as you have now. If you need to get every bit of performance you can and maintainability is of lesser concern, then try the above, create realistic performance benchmarks, and measure this on several platforms with different compilers to see if it's worth it or not.

Call reserve() if possible

Regardless of how you add elements to vectors, if you have some idea of how much you are going to add, you can call reserve() first. This might avoid multiple memory (re)allocations later on.

Avoid code duplication

Your code basically does the same thing three times: add a size to a buffer and then copy the data. Consider writing a function that does that:

template<typename T>
static void append_data(std::vector<uint8_t> &Buf, const T &Data)
{
    auto EncodedSize = EncodeU32(Data.size());
    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());
    Buf.insert(Buf.end(), Data.cbegin(), Data.cend());
}

std::vector<uint8_t> Serialize(Ramdomclass &Ramdom)
{
    std::vector<uint8_t> Buf;
    append_data(Buf, Ramdom.getName());
    append_data(Buf, Ramdom.getContent());

    std::vector<uint8_t> Result;
    Result.push_back(0x0);
    append_data(Result, Buf);

    return Result;
}

Alternative ways to serialize data

There are other approaches to serialize data that you could use, or perhaps look at to get inspiration for your own serialization code.

If you want to have a clean and extensible way to serialize your own data structures in C++, then Boost::Serialization might be interesting for you. The drawback is that it doesn't do any compression of integers as far as I am aware.

Instead of a custom serialization format, you could use a standard format. For example, MessagePack seems like a good fit here, but there are many more.

\$\endgroup\$
6
  • \$\begingroup\$ Thanks, in what case will copy_n gives undefined behavior considering my case? \$\endgroup\$ Commented Oct 18, 2022 at 10:17
  • \$\begingroup\$ I'm not sure what you mean by that? The example I showed is correct and should not have any undefined behavior. \$\endgroup\$ Commented Oct 18, 2022 at 11:15
  • \$\begingroup\$ thanks again.. Is your way ie by using copy_n() is the fastest way to add elements into a vector compare to using a insert()? \$\endgroup\$ Commented Oct 18, 2022 at 12:05
  • \$\begingroup\$ You have to combine it with resize(). And as I wrote in my answer, it depends on the implementation the standard library and your compiler which way will be faster. \$\endgroup\$ Commented Oct 18, 2022 at 12:53
  • \$\begingroup\$ I tried your solution but I am getting the error.. I am not getting why is happening check here onlinegdb.com/8MZTHabr1 . Can you please help me \$\endgroup\$ Commented Oct 18, 2022 at 13:55

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.