
I'm getting the error **EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'** when using PySpark to write my DataFrame to an Elasticsearch cluster like this:

```python
df1.write.format("org.elasticsearch.spark.sql")\
    .option("es.nodes", host)\
    .option("es.port", port)\
    .option("es.net.http.auth.user", username)\
    .option("es.net.http.auth.pass", password)\
    .option("es.resource", indexName)\
    .option("es.net.ssl.keystore.location", pathToCAFile)\
    .mode('overwrite')\
    .save()
```

I've tried the WAN-only option, but it makes no difference. I've checked my cluster connectivity using curl and it works fine, and I'm able to connect to the Elasticsearch server from plain Python as well, but PySpark is giving me a hard time here. The ES-Hadoop jar I'm using also matches my Elasticsearch cluster version.

Please help me with this issue.

1 Answer


You have to put the CA certificate in a JKS truststore with a password, then reference it in es.net.ssl.truststore.location (and make sure you put the password in es.net.ssl.truststore.password).

You can use https://keystore-explorer.org/ to create the JKS keystore.
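If you prefer the command line, a JDK's keytool can build the truststore too. A sketch of the invocation, where the alias, certificate path, truststore path, and password are all placeholders to adjust for your setup:

```python
import subprocess  # only needed if you actually run the command

# All names below are placeholders - adjust them to your own setup.
keytool_cmd = [
    "keytool", "-importcert", "-noprompt",
    "-alias", "es-ca",
    "-file", "ca.pem",              # CA certificate downloaded from the cluster
    "-keystore", "truststore.jks",  # JKS truststore this command creates
    "-storepass", "changeit",
]
# subprocess.run(keytool_cmd, check=True)  # requires a JDK on the PATH
```

Running it once produces a truststore.jks you can point the Spark job at.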

More details in this other answer, which helped me get past this problem:

How to connect to Elasticsearch by PySpark without cert verification?

Also make sure to set es.net.ssl to "true", as it defaults to false!

And you might also need to enable es.nodes.wan.only, depending on where your cluster is hosted.
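Putting the above together, here is a sketch of the corrected write using a truststore rather than a keystore. The host, port, credentials, truststore path, and index name are placeholders, and it assumes a truststore.jks built from the cluster's CA certificate:

```python
# Placeholder connection settings - replace every value with your own.
es_options = {
    "es.nodes": "my-es-host",
    "es.port": "9243",
    "es.net.http.auth.user": "elastic",
    "es.net.http.auth.pass": "changeme",
    "es.net.ssl": "true",  # SSL is off by default
    "es.net.ssl.truststore.location": "file:///path/to/truststore.jks",
    "es.net.ssl.truststore.password": "changeit",
    "es.nodes.wan.only": "true",  # for cloud/WAN-hosted clusters
}

def write_to_es(df, index_name):
    """Write a Spark DataFrame to Elasticsearch with the options above."""
    (df.write.format("org.elasticsearch.spark.sql")
        .options(**es_options)
        .option("es.resource", index_name)
        .mode("overwrite")
        .save())
```

With the options gathered in one dict, swapping between a local and a cloud cluster only means editing the dict, not the write call.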
