
My goal is to serialize information by using protobuf in C++.

proto file:

syntax = "proto3";

package PhoneBookSerialize;

message Date {
  int32 year = 1;
  int32 month = 2;
  int32 day = 3;
}

message Contact {
  string name = 1;
  Date birthday = 2;
  repeated string phone_number = 3;
}


message ContactList{
  repeated Contact contact = 1;
}

Contact related code:

struct Contact {
  std::string name;
  std::optional<Date> birthday;
  std::vector<std::string> phones;

  bool operator<(const Contact& other) const {
    return name < other.name;
  }
};

class PhoneBook {
public:
  
  explicit PhoneBook(std::vector<Contact> contacts);
  void SaveTo(std::ostream& output) const;

private:
  std::vector<Contact> contact_book;

};

PhoneBook::PhoneBook(std::vector<Contact> contacts) : contact_book(contacts) {
    std::sort(contact_book.begin(), contact_book.end());
}

Serialization function:

void PhoneBook::SaveTo(std::ostream& output) const {
    PhoneBookSerialize::ContactList contact_list;
    for(const auto& contact : contact_book){
        PhoneBookSerialize::Contact* pb_contact = contact_list.add_contact();
        pb_contact->set_name(contact.name);
        if(contact.birthday.has_value()){
            PhoneBookSerialize::Date* pb_date = pb_contact->mutable_birthday();
            pb_date->set_year(contact.birthday->year);
            pb_date->set_month(contact.birthday->month);
            pb_date->set_day(contact.birthday->day);
        }
        
        for(const auto& phone : contact.phones){
            pb_contact->add_phone_number(phone);
        }
    }

    contact_list.SerializeToOstream(&output);
}

main.cpp file

#include "phone_book.h"
#include "contact.pb.h"
#include <sstream>

using namespace std;

int main(){
    const PhoneBook ab({
        {"Ab ba", Date{1980, 1, 13}, {"+79850685521"}},
        {"Ac ca", Date{1989, 4, 23}, {"+79998887766", "+71112223344"}},
        {"Ad da", Date{1989, 4, 23}, {}},
        {"No Birthday", std::nullopt, {"+7-4862-77-25-64"}},
      });
    
    ostringstream output(std::ios::binary);
    ab.SaveTo(output);

    istringstream input(output.str(), std::ios::binary);

    PhoneBookSerialize::ContactList list;
    list.ParseFromIstream(&input);
    return 0;
}

CMakeLists.txt file:

cmake_minimum_required(VERSION 3.10)
project(PhoneBookProtobuf LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Find Protocol Buffers package
find_package(Protobuf REQUIRED)

# Find Abseil package
find_package(absl REQUIRED)

include_directories(${Protobuf_INCLUDE_DIRS})
include_directories(${CMAKE_CURRENT_BINARY_DIR})

# Generate protobuf files from proto directory
protobuf_generate_cpp(PROTO_SRCS PROTO_HDRS proto/contact.proto)

# Add all source files
add_executable(main
    src/main.cpp
    src/phone_book.cpp
    src/phone_book.h
    ${PROTO_SRCS}
    ${PROTO_HDRS}
)

# Link necessary libraries
target_link_libraries(main 
    ${Protobuf_LIBRARIES}
    absl::log
    absl::log_internal_message
    absl::log_internal_check_op
)

I am using the following commands to build and run my code:

1. Configure project:
cmake --preset default

2. Build Project:
cmake --build --preset debug

3. Run Project:
./build/main

After I run the project, I get a segmentation fault at list.ParseFromIstream(&input);. I suppose it has something to do with my CMake configuration file. I also added the abseil package to my CMake configuration, because without it the code doesn't compile on my OS (macOS).

Error messages:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==4319==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000020 (pc 0x000000000020 bp 0x7ff7b0d5f350 sp 0x7ff7b0d5f318 T0)
==4319==Hint: pc points to the zero page.
==4319==The signal is caused by a READ memory access.
==4319==Hint: address points to the zero page.
    #0 0x000000000020  (<unknown module>)
    #1 0x00010f5e6a24 in google::protobuf::internal::TcParser::FastMtR1(google::protobuf::MessageLite*, char const*, google::protobuf::internal::ParseContext*, google::protobuf::internal::TcFieldData, google::protobuf::internal::TcParseTableBase const*, unsigned long long)+0x74 (libprotobuf.29.3.0.dylib:x86_64+0xb8a24)
    #2 0x00010f63abdb in bool google::protobuf::internal::MergeFromImpl<false>(google::protobuf::io::ZeroCopyInputStream*, google::protobuf::MessageLite*, google::protobuf::internal::TcParseTableBase const*, google::protobuf::MessageLite::ParseFlags)+0xd1 (libprotobuf.29.3.0.dylib:x86_64+0x10cbdb)
    #3 0x00010f63bda2 in google::protobuf::MessageLite::ParseFromIstream(std::__1::basic_istream<char, std::__1::char_traits<char>>*)+0x32 (libprotobuf.29.3.0.dylib:x86_64+0x10dda2)
    #4 0x00010f1adbf9 in main main.cpp:23
    #5 0x7ff8165ad52f in start+0xbef (dyld:x86_64+0xfffffffffff1f52f)

==4319==Register values:
rax = 0x0000606000001700  rbx = 0x00007ff7b0d5fb20  rcx = 0x0000000000000020  rdx = 0x0000000000000000  
rdi = 0x000000010f220840  rsi = 0x0000606000001700  rbp = 0x00007ff7b0d5f350  rsp = 0x00007ff7b0d5f318  
 r8 = 0xffffffffffffffff   r9 = 0x0000000000000000  r10 = 0x00007fffffffff01  r11 = 0x00007fffffffff01  
r12 = 0x000000010f222480  r13 = 0x00007ff7b0d5f3e8  r14 = 0x0000000000000000  r15 = 0x0000000000000000  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (<unknown module>) 
==4319==ABORTING
zsh: abort      ./build/main
  • Please share the backtrace from the crash. Commented Apr 16 at 7:29
  • Your main.cpp isn't valid code. Should we assume everything under the include stuff should be inside int main() {}? Commented Apr 16 at 7:33
  • Use the debugger to get the backtrace. Also, try first saving the result of output.str() into a std::string and then passing this string to the istringstream constructor. Commented Apr 16 at 7:34
  • @kiner_shah added backtrace Commented Apr 16 at 7:44
  • Serializing a single Contact and then trying to deserialize it as a ContactList looks a bit wrong. Commented Apr 16 at 8:18

1 Answer

The on-wire representation of PhoneBookSerialize::ContactList is not equivalent to a sequence of the on-wire representations of individual PhoneBookSerialize::Contact messages. For a repeated message field, Protobuf writes a field tag and a length prefix in front of every element, so the structure information and the element boundaries are part of the data.

Which is good, because defaulted structures have a length of zero on the wire. So PhoneBookSerialize::Contact empty{}; would write no data at all to the stream. But ContactList would acknowledge that there is an empty element and retain information about its existence.
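To make the difference concrete, here is a minimal hand-rolled sketch of the wire layout for the messages in the question. It does not use the Protobuf library; the helper names (AppendVarint, AppendLengthDelimited, EncodeContactWithName, EncodeContactList) are made up for illustration, and only the name field of Contact is encoded.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append an unsigned integer as a base-128 varint (low 7 bits first).
void AppendVarint(std::string& out, std::uint64_t value) {
    while (value >= 0x80) {
        out.push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<char>(value));
}

// Append one length-delimited field: tag, payload length, payload bytes.
void AppendLengthDelimited(std::string& out, std::uint32_t field_number,
                           const std::string& payload) {
    const std::uint32_t kWireTypeLen = 2;  // strings and nested messages
    AppendVarint(out, (field_number << 3) | kWireTypeLen);
    AppendVarint(out, payload.size());
    out += payload;
}

// Contact with only `name` (field 1) set. In proto3 an empty string is the
// default value and is not written, so an empty Contact encodes to zero bytes.
std::string EncodeContactWithName(const std::string& name) {
    std::string contact;
    if (!name.empty()) {
        AppendLengthDelimited(contact, 1, name);
    }
    return contact;
}

// ContactList: every element of `repeated Contact contact = 1` gets its own
// tag and length prefix, even when the element itself is empty.
std::string EncodeContactList(const std::vector<std::string>& encoded_contacts) {
    std::string list;
    for (const std::string& contact : encoded_contacts) {
        AppendLengthDelimited(list, 1, contact);
    }
    return list;
}
```

With these helpers, a Contact named "Ab" encodes to the four bytes 0A 02 41 62, while a ContactList holding it encodes to 0A 04 0A 02 41 62. A list holding one empty Contact is the two bytes 0A 00, whereas the empty Contact alone is zero bytes.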

To deserialize from a stream you usually need separators that bound each record, because you can never predict the size of the expected data structure, and Protobuf aggressively reduces the amount of written data. With packet-like I/O (where this concept was originally used in the 1990s, e.g. in telemetry or hardware protocols; see ASN and co.) that is not required. repeated fields reduce the need for such separators when transferring multiple similar records.
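If independent records do have to share one stream, a common approach is to frame each one with an explicit length. The following is a minimal sketch under assumptions: WriteFramed and ReadFramed are hypothetical helpers, the 4-byte little-endian header is an arbitrary choice, and the payload is treated as opaque bytes (for example the result of a message's SerializeToString).

```cpp
#include <cstdint>
#include <istream>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// Write one record as a 4-byte little-endian length followed by the payload.
void WriteFramed(std::ostream& out, const std::string& payload) {
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    char header[4];
    for (int i = 0; i < 4; ++i) {
        header[i] = static_cast<char>((size >> (8 * i)) & 0xFF);
    }
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Read one record. Returns std::nullopt at end of stream or on a truncated frame.
std::optional<std::string> ReadFramed(std::istream& in) {
    char header[4];
    if (!in.read(header, 4)) {
        return std::nullopt;
    }
    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i) {
        size |= static_cast<std::uint32_t>(static_cast<unsigned char>(header[i])) << (8 * i);
    }
    std::string payload(size, '\0');
    if (size > 0 && !in.read(payload.data(), static_cast<std::streamsize>(size))) {
        return std::nullopt;
    }
    return payload;
}
```

Because the length is stored explicitly, a zero-length record (such as a default-constructed message) still occupies a frame and is recovered as an empty payload instead of disappearing from the stream.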

EDIT after code was fixed:

Is it supposed to be safe to use list.ParseFromIstream? Yes, provided the protobuf or string storage wasn't corrupted beforehand. In addition, you treat the Date field as optional although you didn't declare it as such and didn't give it a default value in the protobuf definition. In your case, access to the source stream wasn't safe because serialization had already failed. With the given protocol definition, your serialization should have set Date's values in every case.

    if (contact.birthday.has_value()) {
        PhoneBookSerialize::Date* pb_date = pb_contact->mutable_birthday();
        pb_date->set_year(contact.birthday->year);
        pb_date->set_month(contact.birthday->month);
        pb_date->set_day(contact.birthday->day);
    }
    else {
        // All non-optional fields still must be set, or serialization fails
        // and no data may be written (implementation-dependent).
        PhoneBookSerialize::Date* pb_date = pb_contact->mutable_birthday();
        pb_date->set_year(0);
        pb_date->set_month(0);
        pb_date->set_day(0);
    }

You should always check the return value, at least of:

    bool res = contact_list.SerializeToOstream(&output);
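A slightly fuller sketch of that check, for both directions, might look like the following. SaveChecked and LoadChecked are hypothetical wrappers, and FakeMessage is only a stand-in so the sketch compiles without Protobuf; with the real library the template parameter would be a generated type such as PhoneBookSerialize::ContactList, whose SerializeToOstream and ParseFromIstream both return bool.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Serialize and report failure instead of silently continuing.
template <typename Message>
bool SaveChecked(const Message& message, std::ostream& output) {
    if (!message.SerializeToOstream(&output)) {
        std::cerr << "serialization failed\n";
        return false;
    }
    return true;
}

// Parse and report failure instead of using a half-initialized message.
template <typename Message>
bool LoadChecked(Message& message, std::istream& input) {
    if (!message.ParseFromIstream(&input)) {
        std::cerr << "parsing failed\n";
        return false;
    }
    return true;
}

// Hypothetical stand-in for a generated message (NOT part of Protobuf).
// It mimics only the two method signatures used above.
struct FakeMessage {
    std::string text;

    bool SerializeToOstream(std::ostream* out) const {
        *out << text;
        return out->good();
    }

    bool ParseFromIstream(std::istream* in) {
        std::getline(*in, text, '\0');
        return !in->bad();
    }
};
```

In the question's main, this would mean stopping before the parse step when the save step reports failure, and also checking that output.str() is not unexpectedly empty.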

The Protobuf library has debug logging, but either it was off or you had no active console to see it.

This can be avoided by:

message Contact {
  string name = 1;
  optional Date birthday = 2;
  repeated string phone_number = 3;
}

By default, nested structures do not exist in the allocated buffer unless you have set them.

Curiously, protobuf v.2 (the library, not the proto2 protocol) handles this safely and no crash happens. Protobuf v.3 fails in a more dangerous manner: your output string had length 0, which you didn't check. Both the serialization and deserialization functions have return values.

PS. How did you define Contact, in phone_book.h I guess? That header must have had using namespace PhoneBookSerialize; in it. But that normally makes the code ill-formed, no diagnostic required (IFNDR), so it may fail silently under some conditions, because the global namespace and the PhoneBookSerialize namespace both contain Contact, which is referenced later in the code. cl v.15 with strict compliance flags complained about it. The recent v19 didn't for me, which I guess is bad. I had to write:

#include <string>
#include <optional>
#include <vector>
#include <iostream>
#include <algorithm>
#include "contact.pb.h"

struct Date{
  int year;
  int month;
  int day;
};

struct Contact {
    std::string name;
    std::optional<Date> birthday;
    std::vector<std::string> phones;

    bool operator<(const Contact& other) const { 
        return name < other.name;
    }
};
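
As a minimal illustration of that name clash, here is a sketch that does not depend on Protobuf. The PhoneBookSerialize::Contact below is a hypothetical stand-in for the generated class, and GlobalName and GeneratedName are made-up helpers. With the using-directive in place, compilers typically reject an unqualified Contact as ambiguous, while qualified names stay unambiguous.

```cpp
#include <string>

namespace PhoneBookSerialize {
// Hypothetical stand-in for the protoc-generated class of the same name.
struct Contact {
    std::string wire_name;
};
}  // namespace PhoneBookSerialize

// The application's own struct in the global namespace.
struct Contact {
    std::string name;
};

using namespace PhoneBookSerialize;

// Contact c;  // would not compile here: reference to 'Contact' is ambiguous

// Qualified names pick exactly one of the two types.
std::string GlobalName(const ::Contact& contact) {
    return contact.name;
}

std::string GeneratedName(const PhoneBookSerialize::Contact& contact) {
    return contact.wire_name;
}
```

Dropping the using-directive and always writing PhoneBookSerialize::Contact for the generated type, as the SaveTo function in the question already does, avoids the problem entirely.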

5 Comments

Thank you for your answer. I experimented a bit with the code and found out that the problem is caused by Date in the Contact message. It doesn't matter whether it's optional or not, it causes a segmentation fault and I still don't understand why.
So basically I get a segmentation fault if a message contains another message from the proto file. In the provided case the segmentation fault is caused by the Date message for some reason.
@DaniilYefimov I finally got to try to replicate it. One issue is that the code isn't quite complete, and there might be problem sources that were taken out of the code (e.g. omitted includes, using namespace, etc. It's a constant source of grief at my work when someone does stuff like that: it works on Windows where they develop, but causes UB on Linux, because the library implementations aren't the same). E.g. what if you had some other definition of Date, in another proto module, etc.?
Regarding the last question: nope, I don't think so. I actually narrowed down the problem (see another post: stackoverflow.com/questions/79577298/…). The problem is a message field (like Date) in another message (like Contact). It causes a segmentation fault. I have no idea why.
@DaniilYefimov well, the way I wrote it, it doesn't, though it did unless I set all fields. The way it was written in the question, it can't compile (it's not possible to use PhoneBookSerialize::Date with std::sort, etc.)
