
From what I have seen, in order to do this you have to

  1. define the UDF as a plain function
  2. register the function with SQLContext for SQL

    spark.sqlContext.udf.register("myUDF", myFunc)
    
  3. wrap it in a UserDefinedFunction for the DataFrame API

    def myUDF = udf(myFunc)
    

Is there no way to combine this into one step and make the UDF available for both? Also, for cases where a function exists for the DataFrame API but not for SQL, how do you go about registering it without duplicating the code?
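For reference, the two-step workflow described above can be sketched end to end like this (myFunc is just an illustrative function, and the SparkSession setup is assumed):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.appName("udf-demo").master("local[*]").getOrCreate()
import spark.implicits._

// 1. a plain Scala function
val myFunc: String => Int = _.length

// 2. register it for SQL
spark.sqlContext.udf.register("myUDF", myFunc)

// 3. wrap it separately for the DataFrame API
val myUDF = udf(myFunc)

spark.sql("SELECT myUDF('hello')").show()          // SQL usage
Seq("hello").toDF("s").select(myUDF($"s")).show()  // DataFrame usage
```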


3 Answers


The UDFRegistration.register variants that take a scala.FunctionN return a UserDefinedFunction, so you can register the SQL function and create a DSL-friendly UDF in a single step:

val timesTwoUDF = spark.udf.register("timesTwo", (x: Int) => x * 2)

spark.sql("SELECT timesTwo(1)").show

+---------------+
|UDF:timesTwo(1)|
+---------------+
|              2|
+---------------+

spark.range(1, 2).toDF("x").select(timesTwoUDF($"x")).show

+------+
|UDF(x)|
+------+
|     2|
+------+
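For the second part of the question (a UDF that already exists for the DataFrame API but not for SQL), newer Spark versions (2.2+, as far as I can tell) also provide a register overload that takes an existing UserDefinedFunction, so no code needs to be duplicated:

```scala
import org.apache.spark.sql.functions.udf

// assumes a SparkSession named spark
val timesTwoUDF = udf((x: Int) => x * 2)      // DataFrame-side UDF
spark.udf.register("timesTwo", timesTwoUDF)   // expose the same UDF to SQL (Spark 2.2+)
spark.sql("SELECT timesTwo(21)").show()
```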

1 Comment

DSL: Domain Specific Language

You can register the UDF once and still apply it to a DataFrame:

spark.sqlContext.udf.register("myUDF", myFunc)

Use selectExpr when calling it in DataFrame transformations:

df.selectExpr("myUDF(col1) as modified_col1")
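If you prefer to stay in the typed DataFrame API instead of an expression string, a UDF registered by name can also be invoked via callUDF from org.apache.spark.sql.functions (the DataFrame here is just a stand-in; the answer's df with a col1 column is assumed):

```scala
import org.apache.spark.sql.functions.callUDF

// assumes a SparkSession named spark and a UDF registered as "myUDF"
val df = spark.range(3).toDF("col1")
df.select(callUDF("myUDF", df("col1")).as("modified_col1")).show()
```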



Update for Spark 2:

spark.udf.register("func_name", func_name)

Argument 1: the name under which the function is registered in Spark (the name you use in SQL).

Argument 2: the function as defined in Python/Scala.

It is best practice to register the function in Spark under the same name as the original function.
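For example, in Scala (squared is just an illustrative function name; a SparkSession named spark is assumed). Since register returns the UserDefinedFunction, the same value serves both APIs:

```scala
import org.apache.spark.sql.functions.col

// the registered name and the local name match, per the advice above
val squared = spark.udf.register("squared", (x: Long) => x * x)

spark.sql("SELECT squared(4)").show()             // SQL usage
spark.range(5).select(squared(col("id"))).show()  // DataFrame usage
```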

