I am trying to connect to an Elasticsearch database using Spark, and my code snippet looks like this:

spark = SparkSession.builder.master("local").appName("Spark").getOrCreate()
reader = (
    spark.read.format("org.elasticsearch.spark.sql")
    .option("es.read.metadata", "true")
    .option("es.nodes.wan.only", "true")
    .option("es.port", "9200")
    .option("es.net.ssl", "false")
    .option("es.nodes", "here-ip-address")
)
df = reader.load("my_index")

When calling df = reader.load("my_index") I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o45.load.: java.lang.NoClassDefFoundError: scala/Product$class
    at org.elasticsearch.spark.sql.ElasticsearchRelation.<init>(DefaultSource.scala:191)
    at org.elasticsearch.spark.sql.DefaultSource.createRelation(DefaultSource.scala:93)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:748)

Caused by: java.lang.ClassNotFoundException: scala.Product$class
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 20 more

There are some other solutions here on Stack Overflow, but for some reason none of them have helped. I am using Spark 3.2.0, Scala 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_312), and I run the code with spark-submit --packages org.elasticsearch:elasticsearch-hadoop:6.1.0

Thanks in advance!

  • can you show us your build.sbt if you use sbt? Commented Feb 11, 2022 at 8:31
  • I am not using sbt :( I am actually using Python and importing the elasticsearch-hadoop package from Maven as JARs, as this is the only alternative I found for using elasticsearch-hadoop with Python @DariuszKrynicki Commented Feb 11, 2022 at 8:40

2 Answers


For anyone coming here in 2023,

I had the same error with Elasticsearch 8.6.1, PySpark 3.1.2, and the official Elasticsearch JARs. The official JARs only include elasticsearch-spark-20_2.11-8.6.1, which supports Scala 2.11.

Solution: instead, I downloaded the correct JAR built for Scala 2.12, and everything runs smoothly:

pyspark --jars elasticsearch-spark-30_2.12-8.6.1.jar

>>> reader = spark.read.format('org.elasticsearch.spark.sql').option('es.nodes', 'ES_IP').option('es.port', 9200)
>>> read_df = reader.load('ES_INDEX')
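As a side note, the JAR file name itself encodes the compatibility you need to match. The pattern assumed here is elasticsearch-spark-&lt;sparkLine&gt;_&lt;scalaVersion&gt;-&lt;esVersion&gt;.jar (e.g. "30" for the Spark 3.x line); a small illustrative parser, purely to document the convention:

```python
import re

def parse_es_spark_jar(name):
    """Split an elasticsearch-spark JAR file name into its version parts.

    Assumed convention: elasticsearch-spark-<spark>_<scala>-<es>.jar,
    e.g. elasticsearch-spark-30_2.12-8.6.1.jar -> Spark 3.x, Scala 2.12, ES 8.6.1.
    """
    m = re.match(r"elasticsearch-spark-(\d+)_([\d.]+)-([\d.]+)\.jar$", name)
    if not m:
        raise ValueError(f"unrecognized JAR name: {name}")
    spark, scala, es = m.groups()
    # "30" -> "3.0", "20" -> "2.0": first digit is the Spark major version line
    return {"spark": f"{spark[0]}.{spark[1:]}", "scala": scala, "es": es}

print(parse_es_spark_jar("elasticsearch-spark-30_2.12-8.6.1.jar"))
```

The "_2.12" part is the one that must match the Scala binary version your Spark build uses; mixing 2.11 and 2.12 binaries is exactly what produces NoClassDefFoundError: scala/Product$class.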




It looks like you are mixing binaries built for different Scala versions.

Please check that the connector you imported was compiled against the same Scala version as your Spark installation.

According to its Maven dependencies (https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch-hadoop/6.1.0), elasticsearch-hadoop:6.1.0 depends on Spark compiled with Scala 2.11, so you should choose a compatible Spark version.
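To make the matching concrete, here is a sketch of a Scala-version-to-connector lookup. The exact coordinates below are examples (verify the available versions on mvnrepository.com before using them); the point is that the artifact's "_2.xx" suffix must equal Spark's Scala binary version:

```python
# Illustrative mapping: Scala binary version -> example connector coordinate
# for spark-submit --packages. Coordinates are assumptions; check Maven Central.
CONNECTOR_FOR_SCALA = {
    "2.11": "org.elasticsearch:elasticsearch-spark-20_2.11:7.17.0",
    "2.12": "org.elasticsearch:elasticsearch-spark-30_2.12:8.6.1",
    "2.13": "org.elasticsearch:elasticsearch-spark-30_2.13:8.6.1",
}

def connector_package(scala_binary_version):
    """Return a --packages coordinate matching the given Scala binary version."""
    try:
        return CONNECTOR_FOR_SCALA[scala_binary_version]
    except KeyError:
        raise ValueError(f"no known connector for Scala {scala_binary_version}")

# Spark 3.2.0 from the question ships with Scala 2.12, so:
print(f"spark-submit --packages {connector_package('2.12')} app.py")
```

With Spark 3.2.0 (Scala 2.12), an elasticsearch-spark-30_2.12 artifact avoids the scala/Product$class error without downgrading Spark.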

4 Comments

If I run spark-shell, it says I am using Scala 2.12.15, so it should be fine I guess, or am I missing something?
Might the issue be caused by the elasticsearch-hadoop:6.1.0 package that I imported? I also tried elasticsearch-hadoop:7.17.0
I think you already pointed to it: according to the Maven dependencies, both elasticsearch-hadoop:6.1.0 and elasticsearch-hadoop:7.17.0 depend on Spark compiled with Scala 2.11
Thank you! I downgraded to Spark 2.4.7 and this solved the Scala issue :)
