
I have a Spark Structured Streaming job in Scala that reads from Kafka and writes to S3 as Hudi tables. I am now trying to move this job to the Spark Operator on EKS.

I set the following option in the YAML file:

spark.jars.packages: org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1
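For reference, this option goes under `sparkConf` in the SparkApplication manifest. A minimal sketch (the application name, image, and jar path are placeholders, not my actual values):

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: kafka-to-hudi            # placeholder name
spec:
  type: Scala
  mode: cluster
  image: "my-registry/spark:3.1.2"   # placeholder image
  mainClass: com.example.StreamingJob          # placeholder
  mainApplicationFile: "local:///opt/app/job.jar"  # placeholder
  sparkVersion: "3.1.2"
  sparkConf:
    "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1"
```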

But I still get this error on both the driver and the executor:

java.lang.ClassNotFoundException: org.apache.spark.sql.kafka010.KafkaBatchInputPartition

How can I add the package org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 so that it works?

Edit: This seems to be an existing issue, fixed only in the not-yet-released Spark 3.4. Based on the suggestions here and here, I had to bake all the JARs (spark-sql-kafka-0-10_2.12-3.1.2 with its dependencies, plus the Hudi bundle jar) into the Spark image. After that it worked.
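The bake-into-the-image workaround can be sketched as a Dockerfile that copies pre-downloaded JARs into Spark's classpath directory. A rough sketch; the base image, jar directory, and the exact dependency versions below are illustrative assumptions, not verified values:

```dockerfile
# Base image is a placeholder; use whatever Spark 3.1.2 image the operator runs.
FROM my-registry/spark-base:3.1.2

# Copy the pre-downloaded Kafka connector, its transitive dependencies,
# and the Hudi bundle into Spark's classpath.
COPY jars/spark-sql-kafka-0-10_2.12-3.1.2.jar \
     jars/spark-token-provider-kafka-0-10_2.12-3.1.2.jar \
     jars/kafka-clients-2.6.0.jar \
     jars/commons-pool2-2.6.2.jar \
     jars/hudi-spark3.1-bundle_2.12-0.11.1.jar \
     /opt/spark/jars/
```

With the JARs already on the image's classpath, the `spark.jars.packages` option can be dropped from the manifest entirely.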

  • What version of Spark are you using? Make sure you use the same version for the spark-sql-kafka-0-10 JAR Commented Feb 9, 2023 at 15:57
  • The version of Spark is also 3.1.2 Commented Feb 9, 2023 at 17:53
  • Rather than in the YAML, can you put spark.jars.packages in the code? Example - github.com/OneCricketeer/docker-stacks/blob/master/hadoop-spark/… Commented Feb 9, 2023 at 21:29
