I am in a bind here. I am trying to implement a very basic pipeline that reads data from Kafka and processes it in Spark. The problem I am facing is that Apache Spark shuts down abruptly with the aforesaid error message. My PySpark version is 3.5.1 and my Scala version is 2.12.18.
The code in question is:
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

spark = SparkSession.builder \
    .appName('my_app') \
    .config("spark.jars", "/usr/local/spark/jars/spark-sql-kafka-0-10_2.12-3.5.1.jar") \
    .getOrCreate()

df = spark.readStream \
    .format('kafka') \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "quickstart-events") \
    .option("startingOffsets", "earliest") \
    .load()

query = df.writeStream \
    .trigger(processingTime='5 seconds') \
    .outputMode("update") \
    .format("console") \
    .start()

query.awaitTermination()
I have downloaded all the necessary jar files and placed them in the appropriate directory in Spark. I am able to read messages from the Kafka broker as a consumer, so the possibility of my Kafka installation being broken is ruled out. Any help will be tremendously appreciated.
_2.12 jar but maybe there is some other _2.13 jar on the classpath. You should investigate your classpath. scala-library is on classpath (scala.collection... method is from there). General reasons for NoSuchMethodError are stackoverflow.com/questions/35186/… stackoverflow.com/questions/27938776/…
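One possible way to follow the suggestion above and investigate the classpath is to group the jars in the Spark jars directory by the Scala binary-version suffix in their file names (the `_2.12` or `_2.13` part). The sketch below is only an illustration: the helper names `group_by_scala_version` and `scan_jars_dir` are hypothetical, and it assumes the jars follow the usual `artifact_<scala-version>-<artifact-version>.jar` naming. Jars without such a suffix (for example scala-library itself) are ignored and need to be checked by hand.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches the Scala binary-version suffix in names such as
# spark-sql-kafka-0-10_2.12-3.5.1.jar and captures "2.12".
SCALA_SUFFIX = re.compile(r"_(2\.\d+)-\d")


def group_by_scala_version(jar_names):
    """Group jar file names by the Scala suffix found in each name.

    Names without a recognisable suffix are skipped.
    """
    groups = defaultdict(list)
    for name in jar_names:
        match = SCALA_SUFFIX.search(name)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


def scan_jars_dir(jars_dir):
    """Apply group_by_scala_version to every *.jar file in jars_dir."""
    return group_by_scala_version(p.name for p in Path(jars_dir).glob("*.jar"))
```

Calling `scan_jars_dir("/usr/local/spark/jars")` (the directory used in the question) and finding more than one key in the result, for instance both `2.12` and `2.13`, would point to mixed Scala builds on the classpath. The Scala version that the Spark installation itself was built with is printed by `spark-submit --version`, and the connector suffix should match it.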