
I’m designing a multi-datacenter architecture using Apache Pulsar with geo-replication enabled.

Architecture Overview:

  • Apache Pulsar version: 4.0.2
  • Helm Chart version: pulsar-3.9.0
  • BookKeeper: 5 replicas
  • Broker: 5 replicas
  • Proxy: 3 replicas
  • Zookeeper: 3 replicas
  • Recovery: 1 replica
  • Deployed on Kubernetes (Rancher)
  • Bookie storage: vSphere CSI via Persistent Volume Claims (PVC)

The architecture spans two datacenters, each hosting local consumers (around 200 per DC). My goal is to enable geo-replication between the DCs while strictly preventing duplicate consumption — every message should be processed exactly once, even in failover scenarios.


Requirements:

  • Messages must be durable (no data loss allowed)
  • Active-active or active-passive setup is acceptable
  • Each datacenter has its own Pulsar cluster
  • Consumer duplication must not happen, even during failover or replay
  • Pulsar Deduplication and Failover Subscriptions are enabled
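For context, "enabled" in the last bullet means roughly the following (tenant/namespace names are placeholders):

```shell
# Broker-side deduplication is set per namespace:
pulsar-admin namespaces set-deduplication my-tenant/my-ns --enable

# Failover is a subscription type chosen by the consumer at subscribe time,
# not a broker setting — e.g. with the pulsar-client CLI:
pulsar-client consume persistent://my-tenant/my-ns/my-topic \
  --subscription-name my-sub --subscription-type Failover
```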

Questions:

  1. What is the best practice to ensure geo-replication between clusters without consumers processing the same message multiple times?
  2. Is it possible to achieve synchronous geo-replication via BookKeeper, writing each message to multiple DCs before ack?
  3. Would a combination of deduplication + idempotent consumer logic + failover subscription be enough?
  4. Any gotchas or caveats you’ve experienced with Pulsar multi-cluster deployments in this scenario?

Thanks in advance!

I deployed Apache Pulsar 4.0.2 using Helm chart pulsar-3.9.0 on a Kubernetes (Rancher) cluster across 2 datacenters. I enabled geo-replication, created separate clusters for each DC, and connected them via pulsar-admin CLI using the set-replication-clusters and set-clusters commands.
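For reference, the cross-cluster wiring looked roughly like this (cluster, tenant, namespace, and service URLs below are placeholders for my actual values):

```shell
# On DC1, register the remote DC2 cluster in the local metadata:
pulsar-admin clusters create dc2 \
  --url http://pulsar-dc2-broker:8080 \
  --broker-url pulsar://pulsar-dc2-broker:6650

# Allow the tenant on both clusters, then enable replication on the namespace:
pulsar-admin tenants create my-tenant --allowed-clusters dc1,dc2
pulsar-admin namespaces set-clusters my-tenant/my-ns --clusters dc1,dc2

# Or restrict replication to a single topic:
pulsar-admin topics set-replication-clusters \
  persistent://my-tenant/my-ns/my-topic --clusters dc1,dc2
```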

I also configured:

  • deduplication on the topics
  • failover subscription on the consumers
  • idempotent logic in the consumer code using message_id checking with Redis
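The Redis check in the last bullet follows a claim-before-process pattern keyed on the message ID. Here is a minimal sketch of that logic; `DedupStore` is an in-memory stand-in whose `set()` mimics redis-py's `Redis.set(name, value, nx=True, ex=ttl)` (returns `True` on the first claim, `None` on duplicates), so the idea runs without a broker or Redis:

```python
class DedupStore:
    """In-memory stand-in for a redis.Redis client (same set() contract)."""

    def __init__(self):
        self._seen = {}

    def set(self, name, value, nx=False, ex=None):
        if nx and name in self._seen:
            return None          # key already claimed -> duplicate delivery
        self._seen[name] = value
        return True


def claim(store, message_id, ttl=86400):
    """Return True iff this consumer wins the right to process message_id."""
    return store.set(f"dedup:{message_id}", "1", nx=True, ex=ttl) is True


def handle(store, message_id, payload, process):
    """Process payload exactly once per message_id; skip replayed deliveries."""
    if not claim(store, message_id):
        return False             # e.g. replay after a failover/disconnect
    process(payload)             # the side effect itself should be idempotent
    return True
```

With the real Pulsar client the key would be `str(msg.message_id())` and the message would be acknowledged only after `handle` returns. One caveat this sketch surfaces: if the consumer crashes between claiming the key and finishing the work, that message is claimed but never processed, so either the claim needs a TTL shorter than your redelivery window or the claim and the side effect must be atomic (e.g. in one transaction).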

What I expected:

  • Each message would be written once and consumed exactly once across the whole system
  • No duplicates during failover or network interruptions

What actually happened:

  • During failover testing, I noticed duplicate consumption in some edge cases (likely due to replay after disconnect)
  • I'm looking for a more reliable strategy to ensure exactly-once processing across DCs

