We are developing an IoT cloud solution with an event-driven architecture.
The devices produce events and communicate with consumers through an event broker. All event messages are serialized in JSON. The devices also listen for cloud-to-device messages.
We want to define a standard schema for all event messages, e.g. the following format:
    {
        "metaData": {
            "eventId": "a492e40d-6950-4e89-b27c-8b2e27cbf19d",
            "timestamp": "09/19/2024, 11:56:12",
            ...
        },
        "data": {
            ...
        }
    }
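To make the envelope concrete, here is a minimal sketch of how it could be expressed in code, assuming Pydantic v2. `Event`, `MetaData`, and the `TemperatureReading` payload are hypothetical names used only for illustration, and the sketch assumes an ISO 8601 timestamp rather than the format shown above, since that is what Pydantic parses by default.

```python
from datetime import datetime
from typing import Generic, TypeVar
from uuid import UUID

from pydantic import BaseModel

# Type variable for the device-specific "data" portion of an event.
DataT = TypeVar("DataT", bound=BaseModel)


class MetaData(BaseModel):
    eventId: UUID
    timestamp: datetime  # assumption: ISO 8601, not "09/19/2024, 11:56:12"


class Event(BaseModel, Generic[DataT]):
    """Standard envelope shared by all event messages."""

    metaData: MetaData
    data: DataT


class TemperatureReading(BaseModel):
    """Hypothetical payload model for one device/message type."""

    celsius: float


raw = (
    '{"metaData": {"eventId": "a492e40d-6950-4e89-b27c-8b2e27cbf19d",'
    ' "timestamp": "2024-09-19T11:56:12Z"},'
    ' "data": {"celsius": 21.5}}'
)

# Parametrizing the envelope validates both the metadata and the payload.
event = Event[TemperatureReading].model_validate_json(raw)
```

With this shape, the envelope is defined once and each payload model plugs into it as a type parameter.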
The schema of this format should be available in multiple repositories for (de)serialization and validation of messages. We figured it could be stored as a JSON Schema document in a central registry.
However, we also use a set of Pydantic models that describe the "data" portion of the event message. The composition of this portion depends on the device and the type of message, and it may evolve over time. Those models should be used by both producers and consumers. Right now they are declared in a package that is imported by both producers and consumers.
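One option we are considering is to keep the Pydantic models as the source of truth and export JSON Schema from them for the registry. A minimal sketch, assuming Pydantic v2; `TemperatureReading` is a hypothetical payload model, and how the resulting document would be uploaded depends on the registry and is not shown.

```python
import json

from pydantic import BaseModel


class TemperatureReading(BaseModel):
    """Hypothetical payload model for one device/message type."""

    celsius: float


# Pydantic v2 can emit a JSON Schema (as a dict) for any model.
schema = TemperatureReading.model_json_schema()

# Serialize deterministically so the document can be diffed between versions
# before it is published to a registry.
schema_document = json.dumps(schema, indent=2, sort_keys=True)
```

This would let consumers that do not use Python validate against the published schema, while Python services keep importing the shared package.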
The question is: how should these nested schemas be stored and managed? Should all of them be present in a schema registry, or could this also be solved in code? Should the schema version be present in the event message itself?
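To illustrate the last point, this is roughly what we imagine if the version were carried in the message. It is only a sketch, assuming Pydantic v2; the `eventType` and `schemaVersion` metadata fields, the `DATA_MODELS` lookup table, and both payload models are hypothetical and not part of our current format.

```python
import json

from pydantic import BaseModel


class TemperatureReadingV1(BaseModel):
    celsius: float


class TemperatureReadingV2(BaseModel):
    celsius: float
    sensorId: str


# Hypothetical lookup from (event type, schema version) to payload model.
DATA_MODELS = {
    ("temperature", 1): TemperatureReadingV1,
    ("temperature", 2): TemperatureReadingV2,
}


def parse_data(raw: str) -> BaseModel:
    """Pick the payload model based on metadata carried in the message."""
    message = json.loads(raw)
    meta = message["metaData"]
    model = DATA_MODELS[(meta["eventType"], meta["schemaVersion"])]
    return model.model_validate(message["data"])


raw = json.dumps(
    {
        "metaData": {"eventType": "temperature", "schemaVersion": 2},
        "data": {"celsius": 21.5, "sensorId": "s-1"},
    }
)
parsed = parse_data(raw)
```

A consumer could then handle old and new producers side by side, at the cost of keeping every supported version of each model available.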