How to manage nested schemas in an event-driven architecture and where to store them?

Question

We are developing an IoT cloud solution with an event-driven architecture.

The devices produce events and communicate with consumers through an event broker. All event messages are serialized in JSON. The devices also listen for cloud-to-device messages.

We want to define a standard schema for all event messages, e.g. the following format:

{
  "metaData": {
    "eventId": "a492e40d-6950-4e89-b27c-8b2e27cbf19d",
    "timestamp": "09/19/2024, 11:56:12",
    ...
  },
  "data": {
    ...
  }
}

The schema of this format should be available in multiple repositories for (de)serialization and validation of messages. We figured it could be stored as a JSON Schema in a central registry.

However, we also use a set of Pydantic models that describe the "data" portion of the event message. The composition of this portion depends on the device and the type of message, and may evolve over time. Those models should be used by both producers and consumers. Right now they are declared in a package that is imported by both producers and consumers.

The question is: how should these nested schemas be stored and managed? Should all of them be present in a schema registry, or could this also be solved in code? Should the schema version be present in the event message itself?

This is really an opinion-based question and will likely lead to further discussion. I recommend opening a discussion in the JSON Schema community repo.
– gregsdennis, Commented Sep 19, 2024 at 18:42
Since this is new to me, I was expecting a simple fix based on best practices. But after further research on the topic, and given that this is highly context-dependent, you are right. Thanks for your suggestion.
– poklaassen, Commented Sep 26, 2024 at 7:42

Adrian K · Accepted Answer · 2024-09-23 04:07:50Z

how should these nested schemas be stored and managed? Should all of them be present in a schema registry, or could this also be solved in code?

If you put the schemas in code, then every time a schema changes you risk having to force a redeployment of code for what's essentially a config/data-driven change.

I'd have thought it better to store all of them in a central registry, where you can manage them in one place and have a single source of truth. I'm assuming that both event producers and consumers will be able to access the registry when needed.
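
As a rough illustration of the registry approach, here is a minimal sketch of how a producer or consumer might look up schemas by name and version and cache them locally. Everything named here is an assumption made for illustration: `FAKE_REGISTRY` is an in-memory stand-in for the real registry, and `fetch_from_registry`, `SchemaCache`, and the `temperature-reading` schema are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the central registry:
# (schema name, version) -> JSON Schema document.
# A real registry would typically sit behind a network API.
FAKE_REGISTRY: Dict[Tuple[str, int], dict] = {
    ("temperature-reading", 1): {
        "type": "object",
        "required": ["celsius"],
        "properties": {"celsius": {"type": "number"}},
    },
    ("temperature-reading", 2): {
        "type": "object",
        "required": ["celsius", "sensorId"],
        "properties": {
            "celsius": {"type": "number"},
            "sensorId": {"type": "string"},
        },
    },
}


def fetch_from_registry(name: str, version: int) -> dict:
    """Hypothetical fetch function; replace with a call to your registry."""
    return FAKE_REGISTRY[(name, version)]


class SchemaCache:
    """Keeps fetched schemas in memory, one registry call per (name, version)."""

    def __init__(self, fetch: Callable[[str, int], dict]) -> None:
        self._fetch = fetch
        self._cache: Dict[Tuple[str, int], dict] = {}
        self.fetch_count = 0  # counts actual registry calls

    def get(self, name: str, version: int) -> dict:
        key = (name, version)
        if key not in self._cache:
            self._cache[key] = self._fetch(name, version)
            self.fetch_count += 1
        return self._cache[key]


cache = SchemaCache(fetch_from_registry)
schema_v2 = cache.get("temperature-reading", 2)
cache.get("temperature-reading", 2)  # second call is served from the cache
print(schema_v2["required"])
```

Caching like this is only safe if a published schema version is never modified afterwards, so that a change always results in a new version number. That is a design assumption of the sketch, not something the registry enforces by itself.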

Should the schema version be present in the event message itself?

Yes. Have two fields: one for the data, and one that identifies the schema (including its version) that the data conforms to.
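
A minimal sketch of what that could look like on the consumer side, using only the standard library. The `schema` field with `name` and `version` keys, the `TemperatureV1`/`TemperatureV2` classes, and `parse_event` are all hypothetical names chosen for illustration; in the asker's setup the dataclasses would be the existing Pydantic models.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class TemperatureV1:
    celsius: float


@dataclass
class TemperatureV2:
    celsius: float
    sensor_id: str


# Maps the schema reference carried in the message to a parser for "data".
PARSERS: Dict[Tuple[str, int], Callable[[Dict[str, Any]], Any]] = {
    ("temperature-reading", 1): lambda d: TemperatureV1(celsius=d["celsius"]),
    ("temperature-reading", 2): lambda d: TemperatureV2(
        celsius=d["celsius"], sensor_id=d["sensorId"]
    ),
}


def parse_event(raw: str) -> Any:
    """Pick the parser for "data" based on the message's "schema" field."""
    message = json.loads(raw)
    ref = message["schema"]
    key = (ref["name"], ref["version"])
    if key not in PARSERS:
        raise ValueError(f"unsupported schema: {key}")
    return PARSERS[key](message["data"])


raw = json.dumps(
    {
        "metaData": {"eventId": "a492e40d-6950-4e89-b27c-8b2e27cbf19d"},
        "schema": {"name": "temperature-reading", "version": 2},
        "data": {"celsius": 21.5, "sensorId": "probe-7"},
    }
)
event = parse_event(raw)
print(event)
```

Because the message names its own schema, a consumer can keep handling version 1 messages from older devices while newer devices already send version 2, and it can reject versions it does not know instead of misreading them.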
