
It's possible to register a UDF in code, on the context, before using the SQL API. Spark also ships a command-line tool, spark-sql, for submitting SQL queries.
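For example, programmatic registration is a one-liner on the session (a minimal sketch; the function name and logic are illustrative):

    import org.apache.spark.sql.SparkSession

    object SqlWithUdf {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sql-with-udf").getOrCreate()

        // Register a UDF on the session before issuing SQL;
        // the name "to_upper" and its logic are illustrative.
        spark.udf.register("to_upper", (s: String) => Option(s).map(_.toUpperCase).orNull)

        spark.sql("SELECT to_upper('hello')").show()
        spark.stop()
      }
    }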

This tool uses spark-submit with --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.

It's not possible to register a UDF in code before launching spark-sql, but it is possible to add jars or py-files on the command line.

What are the ways to use spark-sql with registered functions?

1 Answer

In general, JVM (Java / Scala) classes implementing the following interfaces:

  • org.apache.hadoop.hive.ql.exec.{UDF, UDAF}
  • org.apache.hadoop.hive.ql.udf.generic.{AbstractGenericUDAFResolver, GenericUDF, GenericUDTF}
  • org.apache.spark.sql.expressions.UserDefinedAggregateFunction

can be registered using the CREATE FUNCTION statement. Permanent functions will be available across all sessions.
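For example, a minimal Hive-style UDF in Scala could look like this (the package, class name, and logic are illustrative):

    package com.example.udfs

    import org.apache.hadoop.hive.ql.exec.UDF

    // A minimal Hive-style UDF; Hive resolves the evaluate
    // method by reflection. Names here are illustrative.
    class ToUpper extends UDF {
      def evaluate(s: String): String =
        if (s == null) null else s.toUpperCase
    }

After packaging the class into a jar, it can be registered from the spark-sql prompt (the jar path is a placeholder):

    CREATE FUNCTION to_upper AS 'com.example.udfs.ToUpper' USING JAR '/path/to/my-udfs.jar';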

There is no such capability for other UDF variants, including Python UDFs, at the moment.
