Is it possible to set a field on a UDF instance in the driver and have the executors use it when call() is invoked?

public class SomeUDF implements UDF2<String, String, String> {
    private String val = "foo";

    public void init(String st){
        val = st;
    }
    @Override
    public String call(String a, String b) {
        return val;
    }
}

PySpark:

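from pyspark.sql import types as T
from pyspark.sql.functions import expr

# Create a JVM-side instance on the driver and set its field
jvm_udf = spark._jvm.com.example.demo.SomeUDF()
jvm_udf.init("bla")

# Note: only the class name is passed to registerJavaFunction, not the instance
spark.udf.registerJavaFunction("foo", jvm_udf.getClass().getName(), T.StringType())
df_single_row.withColumn("val", expr("foo('a','b')")).show()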

The output is:

"foo"

I want:

"bla"

  • Can you please state what your actual use case is? UDF objects are unique per object; there is no "shared memory". Commented Jan 27 at 12:18
  • Sorry, I meant unique per executor/partition, not per object. There is no UDF connectivity between the driver and the executors after the UDF is published. Commented Jan 27 at 14:36
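
The behaviour in the question can be reproduced without Spark. The sketch below is only an illustration under one assumption: that registering a UDF by class name (as registerJavaFunction is given only `jvm_udf.getClass().getName()`) leads to a fresh instance being built through the no-argument constructor, so the instance that had init("bla") called on it is never used. `RegisterByNameDemo`, `SomeUdf`, and `callViaClassName` are hypothetical names, and `BiFunction` stands in for `UDF2` to keep the example free of Spark dependencies.

```java
import java.util.function.BiFunction;

public class RegisterByNameDemo {

    // Hypothetical stand-in for SomeUDF from the question (BiFunction replaces UDF2).
    public static class SomeUdf implements BiFunction<String, String, String> {
        private String val = "foo";

        public void init(String st) {
            val = st;
        }

        @Override
        public String apply(String a, String b) {
            return val;
        }
    }

    // Assumption: mimics registration by class name. Only the name is known,
    // so a new instance is created reflectively and starts with default field values.
    @SuppressWarnings("unchecked")
    public static String callViaClassName(String className) {
        try {
            BiFunction<String, String, String> fresh =
                (BiFunction<String, String, String>) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            return fresh.apply("a", "b");
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SomeUdf configured = new SomeUdf();
        configured.init("bla");

        // The configured instance returns the value set through init().
        System.out.println(configured.apply("a", "b"));                          // bla

        // Going through the class name creates a new object, so init() is lost.
        System.out.println(callViaClassName(configured.getClass().getName()));   // foo
    }
}
```

If that assumption holds, state set through init() on the driver cannot reach call() on the executors. One possible workaround is to pass the value as an extra argument in the SQL expression instead of storing it on the instance.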
