Commit b803b84

committed
Added example of usage
1 parent ceaf4f2 commit b803b84

17 files changed: +451 −50 lines changed


README.md

Lines changed: 4 additions & 6 deletions

````diff
@@ -1,10 +1,8 @@
 # Prioritizing Event Processing with Apache Kafka
 
-Implement message prioritization in [Apache Kafka](https://kafka.apache.org) is often a hard task because Kafka doesn't support broker-level reordering of messages like some messaging technologies do. Though some developers see this as a limitation, the reality is that it isn't because Kafka is not supposed to allow message reordering. Kafka is a distributed [commit log](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying) and therefore messages are immutable and so their ordering is within partitions. This doesn't change the fact the developers may need to implement message prioritization in Kafka.
+Implementing event processing prioritization in [Apache Kafka](https://kafka.apache.org) is often a hard task because Kafka doesn't support broker-level reordering of messages the way some messaging technologies do. Though some developers see this as a limitation, it isn't one: Kafka was never meant to reorder messages. Kafka is a distributed [commit log](https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying), so messages are immutable and their ordering is guaranteed only within partitions. This doesn't change the fact that developers may need to implement event processing prioritization in Kafka.
 
-This project aims to address this problem while still proving a way to keep the implementation code simple. In Kafka, [partitions are a unit-of-parallelism, unit-of-storage, and unit-of-durability](https://www.buildon.aws/posts/in-the-land-of-the-sizing-the-one-partition-kafka-topic-is-king/01-what-are-partitions). However, when developers write code to handle partitions directly they end up writing a rather more complex code, and often need to give up of some facilities that the Kafka architecture provides such as automatic rebalancing of consumers when new partitions are added and/or when a group leader fails. This becomes even more important when developers are interacting with Kafka via frameworks like [Kafka Connect](https://kafka.apache.org/documentation/#connect) and [Kafka Streams](https://kafka.apache.org/documentation/streams/) that, by design, don't expect that partitions are handled directly.
-
-This project addresses message prioritization by grouping partitions into simpler abstractions called buckets that express priority given their size. Bigger buckets mean a higher priority, and smaller buckets mean less priority. The project also addresses code simplicity by providing a way to do all of this with the pluggable architecture of Kafka.
+In Kafka, [partitions are a unit-of-parallelism, unit-of-storage, and unit-of-durability](https://www.buildon.aws/posts/in-the-land-of-the-sizing-the-one-partition-kafka-topic-is-king/01-what-are-partitions). However, when developers write code to handle partitions directly, they end up writing rather complex code, and often have to give up some of the facilities that the Kafka architecture provides, such as automatic rebalancing of consumers when new partitions are added and/or when a group leader fails. This becomes even more important when developers interact with Kafka via frameworks like [Kafka Connect](https://kafka.apache.org/documentation/#connect) and [Kafka Streams](https://kafka.apache.org/documentation/streams/) that, by design, don't expect partitions to be handled directly. This project addresses event processing prioritization via the bucket pattern: it groups partitions into simpler abstractions called buckets, whose size expresses priority. Bigger buckets mean higher priority; smaller buckets mean lower priority. The project also addresses code simplicity by doing all of this through Kafka's pluggable architecture.
 
 Let's understand how this works with an example.
 
@@ -16,7 +14,7 @@ To ensure that each message will end up in their respective bucket, use the `Buc
 
 ![Assignor Overview](images/assignor-overview.png)
 
-With the bucket priority, you can implement message prioritization by having more consumers working on buckets with higher priorities, while buckets with less priority can have fewer consumers. Message prioritization can also be obtained by executing these consumers in an order that gives preference to processing high priority buckets before the less priority ones. While coordinating this execution might involve some extra coding from your part (perhaps using some sort of scheduler) you don't have to implement low-level code to manage partition assignment and keep your consumers simple by leveraging the standard `subscribe()` and `poll()` methods.
+With the bucket priority, you can implement event processing prioritization by having more consumers work on buckets with higher priority, while lower-priority buckets get fewer consumers. Event processing prioritization can also be achieved by executing these consumers in an order that processes high-priority buckets before lower-priority ones. While coordinating this execution may involve some extra coding on your part (perhaps using some sort of scheduler), you don't have to implement low-level code to manage partition assignment, and you keep your consumers simple by leveraging the standard `subscribe()` and `poll()` methods.
 
 ## Building the project
 
@@ -80,7 +78,7 @@ Discarding any message that can't be sent to any of the buckets is also possible
 
 ```bash
 configs.setProperty(BucketPriorityConfig.FALLBACK_PARTITIONER_CONFIG,
-    "blog.buildon.aws.streaming.kafka.DiscardPartitioner");
+    "code.buildon.aws.streaming.kafka.DiscardPartitioner");
 ```
 
 ## Using the assignor
````
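The bucket idea above can be sketched in plain Java with no Kafka dependency. The bucket names and the 70/30 allocation below are illustrative assumptions, not the project's API; the real assignor performs this distribution internally:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketAllocationSketch {

    // Distribute a topic's partitions among named buckets proportionally to
    // their allocation percentages, filling the highest-priority bucket first.
    static Map<String, List<Integer>> allocate(int partitionCount,
            LinkedHashMap<String, Integer> allocation) {
        Map<String, List<Integer>> buckets = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, Integer> e : allocation.entrySet()) {
            int size = (int) Math.round(partitionCount * (e.getValue() / 100.0));
            List<Integer> partitions = new ArrayList<>();
            for (int i = 0; i < size && next < partitionCount; i++) {
                partitions.add(next++);
            }
            buckets.put(e.getKey(), partitions);
        }
        return buckets;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> allocation = new LinkedHashMap<>();
        allocation.put("Platinum", 70); // bigger bucket: higher priority
        allocation.put("Gold", 30);     // smaller bucket: lower priority
        // With 6 partitions: Platinum gets partitions 0-3, Gold gets 4-5
        System.out.println(allocate(6, allocation));
    }
}
```

A bucket is thus just a contiguous block of partitions; more consumers can then be pointed at the bigger block.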

docker-compose.yml

Lines changed: 30 additions & 0 deletions

```diff
@@ -0,0 +1,30 @@
+services:
+
+  kafka:
+    image: confluentinc/cp-kafka:7.3.2
+    hostname: kafka
+    container_name: kafka
+    ports:
+      - "9092:9092"
+    environment:
+      KAFKA_NODE_ID: 1
+      KAFKA_BROKER_ID: 1
+      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
+      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092'
+      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
+      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
+      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
+      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
+      KAFKA_PROCESS_ROLES: 'broker,controller'
+      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@kafka:29093'
+      KAFKA_LISTENERS: 'PLAINTEXT://kafka:29092,CONTROLLER://kafka:29093,PLAINTEXT_HOST://0.0.0.0:9092'
+      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
+      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
+      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
+    volumes:
+      - ./scripts/workaround.sh:/tmp/workaround.sh
+    command: "bash -c '/tmp/workaround.sh && /etc/confluent/docker/run'"
+    healthcheck:
+      test: echo srvr | nc kafka 9092 || exit 1
+      interval: 5s
+      retries: 10
```

pom.xml

Lines changed: 26 additions & 37 deletions

```diff
@@ -1,13 +1,13 @@
-<?xml version="1.0" ?>
+<?xml version="1.0"?>
 
-<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
-    http://maven.apache.org/xsd/maven-4.0.0.xsd"
+<project
+    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
     xmlns="http://maven.apache.org/POM/4.0.0"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 
     <modelVersion>4.0.0</modelVersion>
 
-    <groupId>blog.buildon.aws.streaming.kafka</groupId>
+    <groupId>code.buildon.aws.streaming.kafka</groupId>
     <artifactId>bucket-priority-pattern</artifactId>
     <description>Pattern that groups topic partitions into buckets so these buckets can be processed in a given priority order.</description>
     <version>1.0.0</version>
@@ -25,6 +25,7 @@
 
     <properties>
         <kafka.clients.version>3.4.1</kafka.clients.version>
+        <slf4j.api.version>2.0.7</slf4j.api.version>
         <junit.jupiter.version>5.9.3</junit.jupiter.version>
     </properties>
 
@@ -33,7 +34,11 @@
             <groupId>org.apache.kafka</groupId>
             <artifactId>kafka-clients</artifactId>
             <version>${kafka.clients.version}</version>
-            <scope>provided</scope>
+        </dependency>
+        <dependency>
+            <groupId>org.slf4j</groupId>
+            <artifactId>slf4j-api</artifactId>
+            <version>${slf4j.api.version}</version>
         </dependency>
         <dependency>
             <groupId>org.junit.jupiter</groupId>
@@ -49,45 +54,29 @@
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-compiler-plugin</artifactId>
                 <version>3.11.0</version>
-                <inherited>true</inherited>
                 <configuration>
-                    <release>11</release>
+                    <release>17</release>
                 </configuration>
             </plugin>
             <plugin>
                 <groupId>org.apache.maven.plugins</groupId>
-                <artifactId>maven-source-plugin</artifactId>
-                <version>3.3.0</version>
-                <executions>
-                    <execution>
-                        <id>attach-sources</id>
-                        <goals>
-                            <goal>jar</goal>
-                        </goals>
-                    </execution>
-                </executions>
-            </plugin>
-            <plugin>
-                <groupId>org.apache.maven.plugins</groupId>
-                <artifactId>maven-surefire-plugin</artifactId>
-                <version>3.1.2</version>
+                <artifactId>maven-assembly-plugin</artifactId>
+                <version>3.6.0</version>
                 <configuration>
-                    <argLine>
-                        --illegal-access=permit
-                    </argLine>
-                </configuration>
-            </plugin>
-            <plugin>
-                <groupId>org.apache.maven.plugins</groupId>
-                <artifactId>maven-failsafe-plugin</artifactId>
-                <version>3.1.2</version>
-                <configuration>
-                    <argLine>
-                        --illegal-access=permit
-                    </argLine>
+                    <descriptorRefs>
+                        <descriptorRef>jar-with-dependencies</descriptorRef>
+                    </descriptorRefs>
                 </configuration>
+                <executions>
+                    <execution>
+                        <id>make-assembly</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>single</goal>
+                        </goals>
+                    </execution>
+                </executions>
             </plugin>
         </plugins>
     </build>
-
-</project>
+</project>
```

scripts/workaround.sh

Lines changed: 19 additions & 0 deletions

```diff
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+##########################################################################
+################################ Important ###############################
+##########################################################################
+## This script implements workarounds for the current Docker image of  ##
+## Apache Kafka from Confluent. Eventually, newer images will fix the  ##
+## issues found here, and this script will no longer be required.      ##
+##########################################################################
+
+# Workaround: Remove check for KAFKA_ZOOKEEPER_CONNECT parameter
+sed -i '/KAFKA_ZOOKEEPER_CONNECT/d' /etc/confluent/docker/configure
+
+# Workaround: Ignore cub zk-ready
+sed -i 's/cub zk-ready/echo ignore zk-ready/' /etc/confluent/docker/ensure
+
+# KRaft required: Format the storage directory with a new cluster ID
+export KAFKA_CLUSTER_ID="p8fFEbKGQ22B6M_Da_vCBw"
+echo "kafka-storage format --ignore-formatted -t $KAFKA_CLUSTER_ID -c /etc/kafka/kafka.properties" >> /etc/confluent/docker/ensure
```
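The effect of the first `sed` workaround can be demonstrated on a stand-in file (the `dub ensure` line below is only illustrative of the kind of line the real configure script contains):

```shell
# Apply the same sed edit the workaround script uses, but on a temp file
# standing in for /etc/confluent/docker/configure.
tmp=$(mktemp)
printf '%s\n' \
  'dub ensure KAFKA_ZOOKEEPER_CONNECT' \
  'dub ensure KAFKA_ADVERTISED_LISTENERS' > "$tmp"

# Drop every line mentioning KAFKA_ZOOKEEPER_CONNECT (KRaft mode needs no ZooKeeper)
sed -i '/KAFKA_ZOOKEEPER_CONNECT/d' "$tmp"

cat "$tmp"   # only the KAFKA_ADVERTISED_LISTENERS line remains
rm -f "$tmp"
```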
AllOrdersConsumer.java

Lines changed: 76 additions & 0 deletions

```java
package blog.buildon.aws.streaming.kafka;

import java.time.Duration;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import static blog.buildon.aws.streaming.kafka.utils.KafkaUtils.ALL_ORDERS;
import static blog.buildon.aws.streaming.kafka.utils.KafkaUtils.createTopic;
import static blog.buildon.aws.streaming.kafka.utils.KafkaUtils.getConfigs;

public class AllOrdersConsumer {

    private class ConsumerThread extends Thread {

        private String threadName;
        private KafkaConsumer<String, String> consumer;

        public ConsumerThread(String threadName, Properties configs) {

            this.threadName = threadName;

            configs.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());

            configs.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());

            configs.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
            configs.setProperty(ConsumerConfig.GROUP_ID_CONFIG, ALL_ORDERS + "-group");

            consumer = new KafkaConsumer<>(configs);
            consumer.subscribe(Arrays.asList(ALL_ORDERS));

        }

        @Override
        public void run() {
            for (;;) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofSeconds(Integer.MAX_VALUE));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(String.format("[%s] Key = %s, Partition = %d",
                        threadName, record.key(), record.partition()));
                }
            }
        }

    }

    private final List<ConsumerThread> consumerThreads = new ArrayList<>();

    private void run(int numberOfThreads, Properties configs) {
        for (int i = 0; i < numberOfThreads; i++) {
            String threadName = String.format("Consumer-Thread-%d", i);
            consumerThreads.add(new ConsumerThread(threadName, configs));
        }
        consumerThreads.stream().forEach(ct -> ct.start());
    }

    public static void main(String[] args) {
        createTopic(ALL_ORDERS, 6, (short) 3);
        if (args.length >= 1) {
            int numberOfThreads = Integer.parseInt(args[0]);
            new AllOrdersConsumer().run(numberOfThreads, getConfigs());
        }
    }

}
```
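Each thread above runs its own `KafkaConsumer` in the same group, so the group's assignor spreads the topic's six partitions across the threads. The arithmetic of a roughly even spread can be sketched without Kafka (illustrative only; this is not the consumer's actual assignment logic):

```java
import java.util.Arrays;

public class PartitionSpreadSketch {

    // Roughly-even spread of p partitions over n group members:
    // the first (p % n) members each take one extra partition.
    static int[] spread(int p, int n) {
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            counts[i] = p / n + (i < p % n ? 1 : 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Six partitions (as created by AllOrdersConsumer) over four threads
        System.out.println(Arrays.toString(spread(6, 4))); // prints [2, 2, 1, 1]
    }
}
```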
AllOrdersProducer.java

Lines changed: 62 additions & 0 deletions

```java
package blog.buildon.aws.streaming.kafka;

import java.util.Properties;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.utils.Utils;

import static blog.buildon.aws.streaming.kafka.utils.KafkaUtils.ALL_ORDERS;
import static blog.buildon.aws.streaming.kafka.utils.KafkaUtils.createTopic;
import static blog.buildon.aws.streaming.kafka.utils.KafkaUtils.getConfigs;

public class AllOrdersProducer {

    private void run(Properties configs) {

        configs.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());

        configs.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
            StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(configs)) {

            AtomicInteger counter = new AtomicInteger(0);
            String[] buckets = {"Platinum", "Gold"};

            for (;;) {

                int value = counter.incrementAndGet();
                int index = Utils.toPositive(value) % buckets.length;
                String recordKey = buckets[index] + "-" + value;

                ProducerRecord<String, String> record =
                    new ProducerRecord<>(ALL_ORDERS, recordKey, "Value");

                producer.send(record, (metadata, exception) -> {
                    System.out.println(String.format(
                        "Record with key '%s' was sent to partition %d",
                        recordKey, metadata.partition()));
                });

                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ie) {
                }

            }

        }

    }

    public static void main(String[] args) {
        createTopic(ALL_ORDERS, 6, (short) 3);
        new AllOrdersProducer().run(getConfigs());
    }

}
```
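The producer's key scheme simply alternates between the two bucket names. A dependency-free sketch of that scheme, reimplementing `Utils.toPositive` (which in kafka-clients is the mask `v & 0x7fffffff`) so it runs without Kafka on the classpath:

```java
public class BucketKeySketch {

    // Same masking kafka-clients' Utils.toPositive applies, reproduced here
    // so the sketch has no external dependency.
    static int toPositive(int v) {
        return v & 0x7fffffff;
    }

    public static void main(String[] args) {
        String[] buckets = {"Platinum", "Gold"};
        // Mirrors AllOrdersProducer: an incrementing counter alternates buckets
        for (int value = 1; value <= 4; value++) {
            int index = toPositive(value) % buckets.length;
            System.out.println(buckets[index] + "-" + value);
        }
        // prints Gold-1, Platinum-2, Gold-3, Platinum-4
    }
}
```

Note that the key prefix alone does not route a record to a bucket's partitions: the default partitioner hashes the whole key, so deterministic bucket routing relies on the partitioner described in the README.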
