Handling Multi-Table Updates in Kafka

Question

We have an application that serves as a configuration repository, storing data in a relational database. Whenever a user changes a configuration item, it is persisted to the database, and our goal is to communicate these changes to subscribers via Kafka messages.

In the web UI, users can modify multiple sections of the configuration, often affecting multiple tables. After initial research, I understand that the typical way to model relational DB-like applications is by using one Kafka topic per table. However, this approach results in multiple Kafka messages on different topics when a user modifies multiple tables in a single change.

How can consumers know how long to wait until all modifications have arrived before reacting to a change? Or can you suggest an alternative way to model this?

Guru Stron · Accepted Answer · 2025-04-02 08:03:46Z


"After initial research, I understand that the typical way to model relational DB-like applications is by using one Kafka topic per table"

It depends on the actual use case. It is perfectly fine to have a single topic where all the changes from one operation are grouped into a single message, if that suits your case. For example, a message could look like this:

{
   "user":"...",
   "changes":[
      {
         "entity":"one...",
          ... // changes
      },
      {
         "entity":"two...",
          ... // changes
      }
   ]
}

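Note that if a single operation can produce A LOT of changes, you might run into Kafka's maximum message size (roughly 1 MB by default, controlled by the broker's `message.max.bytes` and the producer's `max.request.size`).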
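A minimal sketch of building such a grouped message and doing a rough size pre-check, assuming JSON serialization. The `field`/`value` keys in the example are hypothetical, and the 1 MiB constant mirrors the producer's default `max.request.size`; your cluster may be configured differently.

```python
import json

# Assumed limit: mirrors the producer's default max.request.size (1 MiB).
MAX_MESSAGE_BYTES = 1_048_576

def build_change_message(user: str, changes: list[dict]) -> bytes:
    """Group all changes from one user operation into a single message payload."""
    payload = {"user": user, "changes": changes}
    return json.dumps(payload).encode("utf-8")

def fits_in_one_message(data: bytes, limit: int = MAX_MESSAGE_BYTES) -> bool:
    """Rough pre-check only: the real limit also counts key, headers and batch overhead."""
    return len(data) <= limit

msg = build_change_message(
    "alice",
    [
        {"entity": "one", "field": "timeout", "value": 30},  # hypothetical change shape
        {"entity": "two", "field": "retries", "value": 5},
    ],
)
print(fits_in_one_message(msg))  # True
```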

Another approach is to still use a single topic, but publish one message per table, with metadata containing a unique "transaction"/operation id and the total number of changes in that "transaction"/operation:

{
   "user":"...",
   "metadata":{
      "transaction":"unique_tran_id",
      "operationNumber":"unique_operation_in_tran_id",
      "changesInTransaction":"total_num_of_changes"
   },
   "entity":"",
   ... // changes
}
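A minimal sketch of the producer side of this second approach, assuming one change dict per table. The `changes` key stands in for the `... // changes` placeholder above and is hypothetical, and the counters are written as integers rather than strings so the consumer can compare them directly.

```python
import json
import uuid

def split_into_table_messages(user: str, changes_by_table: dict[str, dict]) -> list[bytes]:
    """Emit one message per table; all of them share a transaction id and the total count."""
    tran_id = str(uuid.uuid4())  # unique "transaction"/operation id
    total = len(changes_by_table)
    messages = []
    for op_number, (entity, change) in enumerate(changes_by_table.items(), start=1):
        messages.append(json.dumps({
            "user": user,
            "metadata": {
                "transaction": tran_id,
                "operationNumber": op_number,   # unique within the transaction
                "changesInTransaction": total,  # lets the consumer know when it is done
            },
            "entity": entity,
            "changes": change,  # hypothetical key for the per-table change
        }).encode("utf-8"))
    return messages
```

If all messages of a transaction go to the same topic, using the transaction id as the record key keeps them on one partition, so a single consumer instance receives them in order.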

Then you can use a saga-like approach where the processor accumulates and counts the processed operations and decides when processing is complete (using the count of processed items plus deduplication based on `operationNumber`). This approach can also span multiple topics if needed. One note, though: IMO this approach is best coupled with the transactional outbox pattern, so that you minimize the chance that only some of a "transaction"/operation's messages get published (in which case processing of that "transaction"/operation would never complete).
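The saga-like accumulation on the consumer side could be sketched as follows. This is an in-memory illustration only, assuming messages shaped like the example above with integer `operationNumber` and `changesInTransaction` values; `TransactionAggregator` and `on_message` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class PendingTransaction:
    expected: int
    operations: dict[int, dict] = field(default_factory=dict)  # operationNumber -> message

class TransactionAggregator:
    """Buffers per-table messages until every operation of a transaction has arrived."""

    def __init__(self) -> None:
        self._pending: dict[str, PendingTransaction] = {}

    def on_message(self, message: dict) -> list[dict] | None:
        meta = message["metadata"]
        tran_id = meta["transaction"]
        pending = self._pending.setdefault(
            tran_id, PendingTransaction(expected=int(meta["changesInTransaction"]))
        )
        # Deduplicate redeliveries by operationNumber: a repeat overwrites instead of adding.
        pending.operations[meta["operationNumber"]] = message
        if len(pending.operations) < pending.expected:
            return None  # still waiting for the remaining operations
        del self._pending[tran_id]
        # Complete: hand back all operations in order so they can be applied together.
        return [pending.operations[k] for k in sorted(pending.operations)]
```

A real consumer would also need to persist this buffer (or rebuild it after a restart), remember completed transaction ids so late duplicates do not open a new pending entry, and decide what to do with transactions that never complete; the outbox pattern makes those rarer, but a timeout is still a sensible safety net.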

Also, do not forget that you are not limited to a single output topic. Some designs use separate "business" and "technical" output queues: for example, one topic per table as the "technical" ones, and a "business" one using one of the first two approaches.
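A sketch of the "technical" plus "business" fan-out, assuming the hypothetical topic names `config.tech.<table>` and `config.business`. It only builds (topic, payload) pairs; handing them to an actual Kafka producer is left out.

```python
import json

def route_operation(user: str, changes_by_table: dict[str, dict]) -> list[tuple[str, bytes]]:
    """Return (topic, payload) pairs: one "technical" message per table plus one "business" message."""
    records = []
    for table, change in changes_by_table.items():
        # Hypothetical naming convention for the per-table technical topics.
        payload = {"user": user, "entity": table, "changes": change}
        records.append((f"config.tech.{table}", json.dumps(payload).encode("utf-8")))
    # The business message groups the whole operation, as in the first approach.
    business = {
        "user": user,
        "changes": [{"entity": t, "changes": c} for t, c in changes_by_table.items()],
    }
    records.append(("config.business", json.dumps(business).encode("utf-8")))
    return records
```

To keep the two views from diverging, these records could be written in a single Kafka producer transaction or relayed from the same outbox table.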

Answered Apr 1 at 19:58 by Guru Stron, edited Apr 2 at 8:03



2 Comments

Giorgi Gviani Apr 4 at 7:51

Thanks a LOT! I have familiarized myself with the Outbox pattern. One event in our case could be quite large, around 10k lines, mainly because all the relations from the DB are glued together. I see only two options here: 1) Send the 10k lines: the consumer deserializes them and stores them in its relational database. 2) Split the relations into smaller messages: the consumer implements a "saga-like" approach to collect all relations and then stores them in the database. Do you see either of these approaches as superior?

Guru Stron Apr 4 at 8:28

@GiorgiGviani 10k lines sounds like a relatively big message, but it depends. There are multiple ways to handle message size: you can look into compression or a binary format such as Protobuf. If the message is "really big", you can upload the data to S3-like storage and use Kafka to send the "link" to it. Personally, I would go with the single-message approach, since the "saga-like" one would be more cumbersome and brittle, but without seeing the actual data, system, and non-functional requirements it is hard to tell.
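A sketch of the size-handling options from this reply: compress the snapshot, and fall back to a claim-check (store the blob in S3-like storage, send only a link) when it is still too large. `upload` is a hypothetical callable standing in for whatever storage client you use, and the 900 KB threshold is an assumption chosen to stay below the ~1 MB default.

```python
import gzip
import json
from typing import Callable

# Assumed threshold, kept somewhat below Kafka's ~1 MB default message limit.
INLINE_LIMIT_BYTES = 900_000

def prepare_payload(
    snapshot: dict, upload: Callable[[bytes], str], limit: int = INLINE_LIMIT_BYTES
) -> tuple[str, bytes]:
    """Return (kind, value) for the Kafka record: the compressed snapshot itself,
    or, if it is still too large, a small JSON pointer to externally stored data."""
    compressed = gzip.compress(json.dumps(snapshot).encode("utf-8"))
    if len(compressed) <= limit:
        return "inline", compressed
    # Claim-check: store the blob elsewhere and send only its location through Kafka.
    uri = upload(compressed)
    return "reference", json.dumps({"uri": uri}).encode("utf-8")
```

Kafka producers can also compress batches themselves via the `compression.type` setting (for example `gzip`, `lz4` or `zstd`), which may be enough before reaching for external storage.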
