I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can successfully access MSK bootstrap brokers via telnet and using Python Kafka clients with the same permissions.

However, when running my Spark Structured Streaming job on EMR, it fails with the error:

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeTopics

My spark-submit command includes all the necessary Kafka and AWS MSK IAM authentication jars, specifically:

spark-sql-kafka-0-10_2.12-3.5.1.jar
kafka-clients-3.5.1.jar
spark-token-provider-kafka-0-10_2.12-3.5.6.jar

EMR version: emr-7.2.0
MSK version: 3.6.0
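
For reference, a jar list like the one above can be sanity-checked with a few lines of Python. This is only a sketch: the helper name `check_jars` and the filename pattern are assumptions made for illustration, and the check is based on the fact that the `software.amazon.msk.auth.iam` classes used in the read options are shipped in the `aws-msk-iam-auth` library rather than in the Spark or Kafka jars.

```python
import re

# Jar names copied from the list above.
JARS = [
    "spark-sql-kafka-0-10_2.12-3.5.1.jar",
    "kafka-clients-3.5.1.jar",
    "spark-token-provider-kafka-0-10_2.12-3.5.6.jar",
]

# <artifact>-<x.y.z>[-classifier].jar, e.g. "kafka-clients-3.5.1.jar"
JAR_PATTERN = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d+\.\d+\.\d+)(?:-[A-Za-z]+)?\.jar$"
)


def check_jars(jars):
    """Return a list of human-readable problems found in a jar list (hypothetical helper)."""
    parsed = {}
    for jar in jars:
        match = JAR_PATTERN.match(jar)
        if match:
            parsed[match.group("name")] = match.group("version")

    problems = []

    # The Spark Kafka connector jars are released together with Spark,
    # so they would normally all carry the same version.
    spark_versions = {v for name, v in parsed.items() if name.startswith("spark-")}
    if len(spark_versions) > 1:
        problems.append(
            "Spark connector jars differ in version: " + ", ".join(sorted(spark_versions))
        )

    # IAMLoginModule and IAMClientCallbackHandler live in aws-msk-iam-auth.
    if not any(name.startswith("aws-msk-iam-auth") for name in parsed):
        problems.append("no aws-msk-iam-auth jar in the list")

    return problems


for problem in check_jars(JARS):
    print(problem)
```

The check is purely textual. It does not prove that the jars are actually on the driver and executor classpaths at runtime.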

The Spark streaming read is configured as follows:

python
spark.readStream.format("kafka") \
  .option("kafka.bootstrap.servers", "<broker1:9098,broker2:9098,...>") \
  .option("subscribe", "my_topic") \
  .option("kafka.security.protocol", "SASL_SSL") \
  .option("kafka.sasl.mechanism", "AWS_MSK_IAM") \
  .option("kafka.sasl.jaas.config", "software.amazon.msk.auth.iam.IAMLoginModule required;") \
  .option("kafka.sasl.client.callback.handler.class", "software.amazon.msk.auth.iam.IAMClientCallbackHandler") \
  .load()

I have verified that:

  1. The EMR IAM role has the required MSK permissions (Connect, DescribeCluster, DescribeTopic, etc.)
  2. The MSK brokers are reachable over the network on port 9098 (SASL_SSL)
  3. The Kafka client and IAM auth jars are version-compatible
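
The network check in item 2 can be repeated from any node with a small script, roughly equivalent to the telnet test. This is a minimal sketch using only the Python standard library; the function names are hypothetical. A successful TCP connect only shows that the port is reachable from the node where the script runs. It says nothing about whether the TLS handshake or the SASL/IAM exchange succeeds, so it would need to be run on the nodes that host the Spark driver and executors to be meaningful for the job.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


def check_brokers(bootstrap_servers, timeout=5.0):
    """Map each 'host:port' entry of a bootstrap string to its reachability."""
    results = {}
    for entry in bootstrap_servers.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        results[entry] = can_connect(host, int(port), timeout)
    return results


# Usage (replace the argument with the real bootstrap string):
# print(check_brokers("broker1:9098,broker2:9098"))
```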

I do NOT want to manually manage or distribute custom truststore files, as I expected the EMR JVM to trust MSK's default certificates automatically.

What could be the cause of the TimeoutException waiting for node assignment from Kafka when all connectivity checks pass and IAM permissions are verified?

Are there any best practices or additional configurations needed specifically on EMR or Spark to authenticate successfully with MSK using IAM?

Any guidance or examples of a working Spark + MSK IAM auth setup on EMR would be highly appreciated!

Thank you.

  • You should use version 3.6.0 for all the Kafka jars. (Commented Oct 1 at 21:56)
