
I'm new to Spark, have yet to write my first Spark application, and am still investigating whether it would be a good fit for our purpose. Currently I'm just trying to run the sample examples that come with Spark that access Kafka.

I tried to run the Kafka examples that ship with Spark in two ways, and both failed with the same error:

  1. from spark using helm/kubernetes
  2. from manual local build

I searched existing posts but don't quite understand why the out-of-the-box examples don't seem to work:

Spark fails with NoClassDefFoundError for org.apache.kafka.common.serialization.StringDeserializer

Apache Kafka: ...StringDeserializer is not an instance of ...Deserializer

Why does Spark application fail with "Exception in thread "main" java.lang.NoClassDefFoundError: ...StringDeserializer"?

HELM/Kubernetes

Cloned https://github.com/bitnami/charts.git and installed bitnami/spark using the following image settings (tried both tags):

    registry: docker.io
    repository: bitnami/spark
    tag: 2.4.5-debian-10-r87
    tag: 2.4.5-debian-10-r94
Got success with ./bin/run-example SparkPi 10, but got an error with ./bin/run-example streaming.JavaDirectKafkaWordCount myBroker myConsumerGroup myTopic:

    INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer
        at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:78)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 13 more

MANUAL LOCAL BUILD

Cloned https://github.com/apache/spark.git and built it:

    ./build/mvn -DskipTests clean package
    [INFO] BUILD SUCCESS

RAN EXAMPLE SUCCESSFULLY

    ./bin/run-example SparkPi 10
    Pi is roughly 3.1424111424111425

RAN KAFKA EXAMPLE WITH ClassNotFoundException

    ./bin/run-example streaming.JavaDirectKafkaWordCount myBroker myConsumerGroup myTopic

    INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer
        at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:78)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 13 more
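(A quick sanity check, assuming a standard Spark layout where bundled jars live under $SPARK_HOME/jars, is whether a kafka-clients jar is present on the distribution's classpath at all; on a stock Spark build it usually is not, which would match the NoClassDefFoundError above:)

```shell
# List any Kafka-related jars bundled with the Spark distribution.
# Assumes SPARK_HOME points at the Spark install directory.
ls "$SPARK_HOME/jars" | grep -i kafka || echo "no kafka jars found"
```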

1 Answer

I'm not sure run-example sets up the classpath correctly for external libraries.

You need kafka-clients on the classpath. It is bundled as a transitive dependency of spark-streaming-kafka-0-10 (which this DStreams example uses; Structured Streaming uses spark-sql-kafka-0-10 instead). Neither is provided by Spark by default, so you must download it and add it to the Spark jars directory.
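As a sketch of the "add it to the Spark jars directory" route (the version number here is an assumption; Spark 2.4.x's kafka-0-10 integration builds against kafka-clients 2.0.0, but you should match whatever your Spark build declares):

```shell
# Assumed version; check your Spark build's declared kafka-clients dependency.
KAFKA_CLIENTS_VERSION=2.0.0
# Download kafka-clients from Maven Central into Spark's jars directory.
wget "https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/${KAFKA_CLIENTS_VERSION}/kafka-clients-${KAFKA_CLIENTS_VERSION}.jar" \
  -P "$SPARK_HOME/jars/"
```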

Alternatively, you can use spark-submit --packages, as documented in Spark's guide on submitting applications.
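For example (a sketch, not from the original answer; the Scala version 2.11 and Spark version 2.4.5 are assumptions based on the image tags above, and --packages resolves the artifact plus its transitive kafka-clients dependency from Maven Central at submit time):

```shell
# Run the same example via spark-submit, letting --packages pull in the
# Kafka integration jar and its kafka-clients dependency automatically.
./bin/spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5 \
  --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount \
  examples/jars/spark-examples_2.11-2.4.5.jar \
  myBroker myConsumerGroup myTopic
```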


2 Comments

I'm getting the same error on GCP Dataproc. I'm passing the jar but still getting the error; please refer to stackoverflow.com/questions/70951195/… Any ideas on this?
@Karan What don't you understand? Your post didn't include kafka-clients.jar
