
I need to call a function from my Spark SQL queries. I have tried a UDF, but I don't know how to set it up. Here is the scenario:

# my python function example

def sum(effdate, trandate):
  sum=effdate+trandate
  return sum

and my spark sql query is like:

spark.sql("select sum(cm.effdate, cm.trandate) as totalsum, name from CMLEdG cm ....").show()

These lines are not my actual code; I am only giving them as an example. How can I call my sum function inside spark.sql() to get a result? Could you please suggest a link or an approach that works with PySpark?

Any help would be appreciated.

Thanks

Kalyan

2 Answers


You just need to register your function as a UDF:

from pyspark.sql.types import IntegerType

# my python function example
def sum(effdate, trandate):
  sum = effdate + trandate
  return sum

# register the function so it can be called by name from SQL
spark.udf.register("sum", sum, IntegerType())
spark.sql("select sum(cm.effdate, cm.trandate) as totalsum, name from CMLEdG cm....").show()
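For a self-contained check, here is a minimal sketch of the same idea. The DataFrame contents and the UDF name add_dates are made up for illustration (the real CMLEdG table isn't shown in the question), and the name is changed from sum to avoid confusion with the built-in SUM aggregate:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.getOrCreate()

    # hypothetical data standing in for the CMLEdG table
    df = spark.createDataFrame(
        [(10, 5, "a"), (20, 7, "b")],
        ["effdate", "trandate", "name"],
    )
    df.createOrReplaceTempView("CMLEdG")

    # plain Python function to expose to SQL
    def add_dates(effdate, trandate):
        return effdate + trandate

    # register it under a SQL-visible name with an explicit return type
    spark.udf.register("add_dates", add_dates, IntegerType())

    spark.sql(
        "select add_dates(cm.effdate, cm.trandate) as totalsum, cm.name from CMLEdG cm"
    ).show()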

Comment: spark.udf.register() worked for me (pyspark v3.3.1).

Check this example:

    >>> from pyspark.sql.types import IntegerType
    >>> sqlContext.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    >>> sqlContext.sql("SELECT stringLengthInt('test')").collect()
    [Row(_c0=4)]
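
The sqlContext entry point in this snippet comes from older PySpark versions; with a SparkSession (assumed here to be available as spark, as in the question) the same registration is a short sketch like this:

    from pyspark.sql.types import IntegerType

    # same lambda UDF, registered through the SparkSession instead of sqlContext
    spark.udf.register("stringLengthInt", lambda x: len(x), IntegerType())
    spark.sql("SELECT stringLengthInt('test') AS n").collect()
    # returns [Row(n=4)]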
