
I have written an app using the Azure Java SDK that pushes data to Azure Event Hubs. The app uses partitioning, with car VINs (all from a single manufacturer) as the partition key. Because the VINs don't differ much from one another, almost all messages land in a few partitions and the rest stay empty. I can see why this might happen (the hashes computed from the partition keys may map to only a few of the event hub's partitions), but is there a way to make use of all the partitions?

Using a completely different partition key spreads the messages over more partitions, but still not all of them.
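Part of this behavior would be expected even with an ideal hash. Assuming the service's hash behaved like a uniform random function (an idealization; the actual algorithm isn't described here), K distinct keys spread over N partitions leave on average N(1 - 1/N)^K partitions empty, so with as many keys as partitions roughly a third of them (about 1/e) stay unused. The sketch below only evaluates that formula; the partition count of 32 and the key counts are hypothetical.

```java
public class ExpectedEmptyPartitions {

    /**
     * Expected number of partitions that receive no key when {@code keys}
     * distinct partition keys are hashed uniformly at random into
     * {@code partitions} partitions: N * (1 - 1/N)^K.
     */
    static double expectedEmpty(int partitions, int keys) {
        return partitions * Math.pow(1.0 - 1.0 / partitions, keys);
    }

    public static void main(String[] args) {
        int partitions = 32; // hypothetical partition count
        for (int keys : new int[] {8, 32, 100, 1000}) {
            System.out.printf("%4d keys -> %.2f of %d partitions expected empty%n",
                    keys, expectedEmpty(partitions, keys), partitions);
        }
    }
}
```

Empty partitions only become unlikely once there are many more distinct keys than partitions; a hash that mixes similar VINs poorly can do worse than this baseline.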

  • If you do not assign a partition key, events will flow to all partitions in a round-robin manner. Otherwise, you'd need to assign an explicit partition identifier as the destination for each event to ensure distribution. As you noted, partition keys are hashed service-side and you have neither visibility into nor control over the distribution. Commented Jan 9, 2024 at 22:22
  • Thanks. I understand the round-robin mechanism, which of course wouldn't guarantee message ordering. When you say explicit partition identifier, do you mean assigning a partition ID instead of a partition key? If so, the documentation doesn't recommend this approach. Commented Jan 10, 2024 at 2:07
  • It's a valid approach, but one that we consider a more advanced scenario. Assigning a partition identifier directly allows the service operation to bypass the Event Hubs gateway service and interact directly with the partition node. As a result, it has a slightly higher chance of intermittent failure if a node becomes momentarily unavailable due to crash recovery or normal service rebalancing. Commented Jan 10, 2024 at 14:06
  • Makes sense, many thanks. I had considered this approach but was reluctant to go ahead because, as I mentioned earlier, MS doesn't recommend it. I'll look into it more now. However, I still feel that the original problem, where using a partition key doesn't make use of all the partitions, persists, especially when the partition keys are not very different from one another. Shouldn't this problem be addressed? Commented Jan 10, 2024 at 14:32
  • Unfortunately, there are no guarantees when it comes to distribution with hashes. That said, there are known inefficiencies with the customized implementation the service uses today and they're working on introducing an alternate implementation based on a standard algorithm, which would be opt-in when creating the resource. Commented Jan 10, 2024 at 14:58
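One way to follow the explicit-partition-identifier suggestion while still keeping all events for a given VIN in order is to do the key-to-partition mapping in the application: map each VIN deterministically to one of the hub's partition IDs and send to that ID. The sketch below uses only the JDK; the class and method names are hypothetical, CRC32 is an arbitrary stand-in hash (not what the service uses), and the partition IDs are assumed to come from the producer client's `getPartitionIds()`. With the Azure SDK, the chosen ID would then be passed via something like `CreateBatchOptions.setPartitionId(...)` instead of `setPartitionKey(...)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical helper: application-side mapping of VINs to explicit partition IDs. */
public class VinPartitionAssigner {

    private final List<String> partitionIds;

    // partitionIds are assumed to be the hub's IDs, e.g. from producer.getPartitionIds()
    VinPartitionAssigner(List<String> partitionIds) {
        if (partitionIds.isEmpty()) {
            throw new IllegalArgumentException("partitionIds must not be empty");
        }
        this.partitionIds = List.copyOf(partitionIds);
    }

    /** Deterministically maps a VIN to one partition ID, so per-VIN ordering is kept. */
    String partitionFor(String vin) {
        CRC32 crc = new CRC32(); // stand-in hash, chosen only for illustration
        crc.update(vin.getBytes(StandardCharsets.UTF_8));
        long h = crc.getValue(); // unsigned 32-bit value in [0, 2^32), so % is non-negative
        return partitionIds.get((int) (h % partitionIds.size()));
    }

    /** Counts how many of the given VINs land on each partition (empty ones show 0). */
    Map<String, Integer> distribution(List<String> vins) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : partitionIds) {
            counts.put(id, 0);
        }
        for (String vin : vins) {
            counts.merge(partitionFor(vin), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        VinPartitionAssigner assigner = new VinPartitionAssigner(List.of("0", "1", "2", "3"));
        List<String> vins = List.of( // hypothetical, deliberately similar VINs
                "1HGCM82633A000001", "1HGCM82633A000002",
                "1HGCM82633A000003", "1HGCM82633A000004");
        System.out.println(assigner.distribution(vins));
        // Hypothetical wiring (not compiled here):
        // producer.createBatch(new CreateBatchOptions().setPartitionId(assigner.partitionFor(vin)))
    }
}
```

Because the application owns the mapping, it can inspect the distribution before sending and, if hashing still leaves partitions idle, switch to an explicit lookup table. Note that the mapping changes if the partition count changes, and, per the comment above about bypassing the gateway, sends to a specific partition should be prepared for transient failures.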
