
I have a Spark Structured Streaming job in Scala that reads from Kafka and writes to S3 as Hudi tables. I am now trying to move this job to the Spark Operator on EKS.

I set the following option in the YAML file:

spark.jars.packages: org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1
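For reference, this option goes under `sparkConf` in the SparkApplication manifest. A minimal sketch (the application name, image, and jar path are placeholders, not my actual values):

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: kafka-to-hudi            # placeholder name
spec:
  type: Scala
  mode: cluster
  image: "my-registry/spark:3.1.2"   # placeholder image
  mainClass: com.example.StreamingJob          # placeholder
  mainApplicationFile: "local:///opt/app/job.jar"  # placeholder
  sparkVersion: "3.1.2"
  sparkConf:
    "spark.jars.packages": "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.1"
```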

But I still get this error on both the driver and the executor:

java.lang.ClassNotFoundException: org.apache.spark.sql.kafka010.KafkaBatchInputPartition

How can I add the package org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 so that it works?

Edit: This seems to be an existing issue, fixed only in the not-yet-released Spark 3.4. Based on the suggestions here and here, I had to bake all the JARs (spark-sql-kafka-0-10_2.12-3.1.2 with its dependencies, plus the Hudi bundle jar) into the Spark image. After that it worked.
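The bake-into-the-image workaround can be sketched as a Dockerfile that copies pre-downloaded JARs into Spark's classpath directory. A rough sketch; the base image, jar directory, and the exact dependency versions below are illustrative assumptions, not verified values:

```dockerfile
# Base image is a placeholder; use whatever Spark 3.1.2 image the operator runs.
FROM my-registry/spark-base:3.1.2

# Copy the pre-downloaded Kafka connector, its transitive dependencies,
# and the Hudi bundle into Spark's classpath.
COPY jars/spark-sql-kafka-0-10_2.12-3.1.2.jar \
     jars/spark-token-provider-kafka-0-10_2.12-3.1.2.jar \
     jars/kafka-clients-2.6.0.jar \
     jars/commons-pool2-2.6.2.jar \
     jars/hudi-spark3.1-bundle_2.12-0.11.1.jar \
     /opt/spark/jars/
```

With the JARs already on the image's classpath, the `spark.jars.packages` option can be dropped from the manifest entirely.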

  • What version of Spark are you using? Make sure you use the same version for the spark-sql-kafka-0-10 JAR Commented Feb 9, 2023 at 15:57
  • The version of Spark is also 3.1.2 Commented Feb 9, 2023 at 17:53
  • Rather than in the YAML, can you put spark.jars.packages in the code? Example - github.com/OneCricketeer/docker-stacks/blob/master/hadoop-spark/… Commented Feb 9, 2023 at 21:29
