
I want to share UDFs I created in Scala with other clusters that our data scientists use with PySpark and Jupyter on EMR.

Is this possible? How?
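
For example, ideally the data scientists could call a UDF that already exists in Scala straight from their notebooks, without redefining it in Python. A minimal sketch of the desired usage, with a hypothetical UDF name normalize_name and table users:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # "normalize_name" would be a UDF implemented once in Scala and
    # made available on every cluster, rather than re-created in Python.
    df = spark.sql("SELECT normalize_name(name) AS clean_name FROM users")
    df.show()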

  • Possible duplicate of Using a Scala UDF in PySpark. Commented Jul 3, 2017 at 9:44
  • @zeapo I don't think so, as this is about sharing UDFs in Jupyter across EMR clusters, which would require EMR to offer such a feature. It's not possible in Spark directly unless people use a shared SparkSession in Spark Thrift Server. Commented Jul 3, 2017 at 9:46
  • It's not, because I want to be able to share existing functions and add them to the Spark catalog instead of recreating them every time. Commented Jul 3, 2017 at 9:47
  • Do you want to share the same UDF across different EMR clusters (which I believe means different SparkContexts)? Unless EMR somehow gives you a UDF-sharing feature, it's not possible in Spark SQL. Commented Jul 3, 2017 at 9:50
  • Isn't there something similar to a shared Hive metastore? Or could something be added to the spark-defaults.conf file? Commented Jul 3, 2017 at 9:51

1 Answer


This answer indeed helps:

Create an uber JAR, put it in S3, and in a bootstrap action copy it from S3 to Spark's local jars folder; it should then work.
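
To make the UDF callable from PySpark once the JAR is on the cluster, the Scala class can implement one of Spark's Java UDF interfaces (e.g. org.apache.spark.sql.api.java.UDF1) and be registered from Python. A minimal sketch, assuming a hypothetical class com.example.udfs.NormalizeName packaged in the uber JAR:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.getOrCreate()

    # Register the Scala/Java UDF under a name callable from SQL.
    # spark.udf.registerJavaFunction exists in Spark 2.3+; on older
    # versions use sqlContext.registerJavaFunction instead.
    spark.udf.registerJavaFunction(
        "normalize_name",                  # SQL-visible name
        "com.example.udfs.NormalizeName",  # hypothetical class implementing UDF1
        StringType(),                      # return type of the UDF
    )

    df = spark.sql("SELECT normalize_name(name) AS clean_name FROM users")

This works only once the JAR is on the driver and executor classpaths, which is exactly what copying it into Spark's jars directory in the bootstrap action achieves.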
