
We have an application that serves as a configuration repository, storing data in a relational database. Whenever a user changes a configuration item, it is persisted to the database and our goal is to communicate these changes via Kafka messages to subscribers.

In the web UI, users can modify multiple sections of the configuration, often affecting multiple tables. After initial research, I understand that the typical way to model relational DB-like applications is by using one Kafka topic per table. However, this approach triggers multiple Kafka messages to different topics when a user modifies multiple tables.

How can consumers know how long to wait until all modifications are completed to react to changes? Or can you suggest an alternative way to model something similar?

1 Answer

"After initial research, I understand that the typical way to model relational DB-like applications is by using one Kafka topic per table"

It depends on the actual use case. If it suits your consumers, it is perfectly fine to have a single topic in which all the changes from one user operation are grouped into a single message, i.e. a message that looks like the following:

{
   "user":"...",
   "changes":[
      {
         "entity":"one...",
          ... // changes
      },
      {
         "entity":"two...",
          ... // changes
      }
   ]
}

Note that if a single operation can contain a lot of changes, you might hit the Kafka message size limit (about 1 MB with default settings, though it is configurable).
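To make the size concern concrete, here is a minimal sketch that builds the grouped message and refuses to hand it to a producer if it is too large. The function name `build_change_event`, the `max_bytes` value and the shape of each change are assumptions for illustration only; the real limit depends on your broker and producer configuration.

```python
import json


def build_change_event(user, changes, max_bytes=1_000_000):
    """Serialize all changes of one user operation into a single message.

    max_bytes is a hypothetical threshold; align it with the limits
    actually configured on your brokers and producers.
    """
    payload = json.dumps({"user": user, "changes": changes}).encode("utf-8")
    if len(payload) > max_bytes:
        # Fail before publishing instead of letting the send be rejected.
        raise ValueError(
            f"event is {len(payload)} bytes, over the {max_bytes} byte limit"
        )
    return payload
```

The returned bytes can be passed as the message value to whichever Kafka client you use.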

Another approach is to keep a single topic but publish one message per table, with metadata that carries a unique "transaction"/operation id and the number of changes in that "transaction"/operation:

{
   "user":"...",
   "metadata":{
      "transaction":"unique_tran_id",
      "operationNumber":"unique_operation_in_tran_id",
      "changesInTransaction":"total_num_of_changes"
   },
   "entity":"",
   ... // changes
}
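On the producing side, the message shape above could be generated roughly as follows. This is only a sketch: `split_transaction` is a hypothetical helper, the `changes` key for the per-entity payload is an assumption (the original elides it), and integers are used for the counters rather than strings.

```python
import uuid


def split_transaction(user, changes):
    """Turn one user operation into one message per changed entity.

    Every message carries the same transaction id, its own sequence
    number and the total count, so a consumer can tell when it has
    seen the whole operation.
    """
    tran_id = str(uuid.uuid4())
    total = len(changes)
    return [
        {
            "user": user,
            "metadata": {
                "transaction": tran_id,
                "operationNumber": index,
                "changesInTransaction": total,
            },
            "entity": change["entity"],
            "changes": change["changes"],  # assumed field name
        }
        for index, change in enumerate(changes, start=1)
    ]
```

Using the transaction id as the Kafka message key would keep all messages of one operation in the same partition, and therefore in order.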

Then you can use a saga-like approach in which the processor accumulates and counts the processed operations and decides when processing is complete (using the count of processed items plus deduplication based on the operationNumber). This approach can also span multiple topics if needed. One note though: IMO this approach is best coupled with the transactional outbox pattern, so that you minimize the chance that only some of the messages of a "transaction"/operation get published (i.e. that processing of the "transaction"/operation never completes).
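The counting and deduplication described above might look something like this on the consumer side. It is an in-memory sketch only: `TransactionAccumulator` is a hypothetical name, and a real consumer would need durable state plus a timeout for transactions that never complete.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionAccumulator:
    """Buffers per-entity messages until a whole transaction has arrived."""

    # transaction id -> {operationNumber -> message}
    pending: dict = field(default_factory=dict)
    # ids of transactions already handed over, to ignore late duplicates
    completed: set = field(default_factory=set)

    def add(self, message):
        """Return the full list of messages once complete, else None."""
        meta = message["metadata"]
        tran_id = meta["transaction"]
        if tran_id in self.completed:
            return None  # redelivery after the transaction was processed
        operations = self.pending.setdefault(tran_id, {})
        # setdefault keeps the first copy, so a duplicate is not counted twice
        operations.setdefault(meta["operationNumber"], message)
        if len(operations) < int(meta["changesInTransaction"]):
            return None  # still waiting for more messages
        del self.pending[tran_id]
        self.completed.add(tran_id)
        return [operations[number] for number in sorted(operations)]
```

The caller applies the returned batch to its own database in one go and commits Kafka offsets only after that succeeds.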

Also, do not forget that you are not limited to a single output topic. There are approaches with separate "business" and "technical" output queues; for example, you can have a queue per table as the "technical" ones and a "business" one that uses one of the first two approaches.


2 Comments

Thanks a LOT! I familiarized myself with the Outbox Pattern. One event in our case could be quite large, like 10k lines, mainly because all the relations from the DB are glued together. I see only two options here: (1) Send 10k lines: the consumer deserializes the message and stores it in its relational database. (2) Split relations into smaller messages: the consumer implements a "saga-like" approach to collect all relations and then stores them in the database. Do you see either of those approaches as superior?
@GiorgiGviani 10k lines sounds like a relatively big message, but it depends. There are multiple ways to handle message size: you can look into compression or use a binary format like protobuf. If the message is "really big", you can upload the data to S3-like storage and use Kafka to send only the "link" to it. Personally I would look into the single-message approach, since the "saga-like" one would be more cumbersome and brittle, but without seeing the actual data, system and non-functional requirements it is a bit hard to tell.
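The compression and "link" ideas from the comment above can be combined, roughly as sketched below. Everything here is an assumption for illustration: `prepare_payload` and `read_payload` are hypothetical helpers, the threshold is arbitrary, and a plain dict stands in for S3-like storage.

```python
import gzip
import json
import uuid


def prepare_payload(event, blob_store, max_inline_bytes=1_000_000):
    """Compress the event; send it inline if small enough, else by reference."""
    compressed = gzip.compress(json.dumps(event).encode("utf-8"))
    if len(compressed) <= max_inline_bytes:
        return {"kind": "inline", "data": compressed}
    # Too large for a Kafka message: store it and send only the location.
    key = f"events/{uuid.uuid4()}"
    blob_store[key] = compressed  # stand-in for an upload to object storage
    return {"kind": "reference", "location": key}


def read_payload(envelope, blob_store):
    """Consumer side: resolve the envelope back into the original event."""
    if envelope["kind"] == "inline":
        data = envelope["data"]
    else:
        data = blob_store[envelope["location"]]
    return json.loads(gzip.decompress(data))
```

Either way the consumer receives one logical event, so no accumulation logic is needed on its side.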
