
How do I call the UDF below with multiple arguments (currying) on a Spark DataFrame, as shown below?

Read the file and get a List[String]:

val data = sc.textFile("file.csv").flatMap(line => line.split("\n")).collect.toList

Register the UDF:

val getValue = udf(Udfnc.getVal(_: Int, _: String, _: String)(_: List[String]))

Call the UDF on the DataFrame below:

df.withColumn("value",
     getValue(df("id"),
        df("string1"),
        df("string2"))).show()

Here I am missing the List[String] argument, and I am really not sure how I should pass this argument.

  • What is panel_df? Please post sample input, expected output, and the full code of the UDF function Commented Jun 18, 2018 at 5:30
  • @RameshMaharjan, that was a typo. It's df and not panel_df Commented Jun 18, 2018 at 6:26
  • And what about the rest of the comments about samples? Commented Jun 18, 2018 at 6:47

2 Answers


I can make the following assumptions about your requirement based on your question:

a] The UDF should accept a parameter other than a DataFrame column

b] The UDF should take multiple columns as parameters

Let's say you want to concatenate the values from all the columns along with a specified parameter. Here is how you can do it:

import org.apache.spark.sql.functions._

// The outer function takes the non-column parameter; the returned UDF takes the three columns
def uDF(strList: List[String]) = udf[String, Int, String, String] {
  (value1: Int, value2: String, value3: String) =>
    value1.toString + "_" + value2 + "_" + value3 + "_" + strList.mkString("_")
}

val df = spark.sparkContext.parallelize(Seq((1,"r1c1","r1c2"),(2,"r2c1","r2c2"))).toDF("id","str1","str2")

scala> df.show
+---+----+----+
| id|str1|str2|
+---+----+----+
|  1|r1c1|r1c2|
|  2|r2c1|r2c2|
+---+----+----+

val dummyList = List("dummy1","dummy2")
val result = df.withColumn("new_col", uDF(dummyList)(df("id"),df("str1"),df("str2")))



scala> result.show(2, false)
+---+----+----+-------------------------+
|id |str1|str2|new_col                  |
+---+----+----+-------------------------+
|1  |r1c1|r1c2|1_r1c1_r1c2_dummy1_dummy2|
|2  |r2c1|r2c2|2_r2c1_r2c2_dummy1_dummy2|
+---+----+----+-------------------------+
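Applied back to the setup in the question, the same currying idea would look roughly like this. This is only a sketch: it assumes Udfnc.getVal has the curried signature (Int, String, String)(List[String]) => String and returns a String, as implied by the question.

// Build the driver-side list as in the question, then fix it up front so the
// resulting UDF only needs the three column arguments (Udfnc.getVal's
// signature is assumed here, not taken from a real API)
val data = spark.sparkContext.textFile("file.csv").collect.toList

val getValue = udf((id: Int, s1: String, s2: String) => Udfnc.getVal(id, s1, s2)(data))

df.withColumn("value", getValue(df("id"), df("string1"), df("string2"))).show()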

2 Comments

Can you please share the Java equivalent?
Is it possible to register a curried UDF with spark.udf.register?
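Regarding the comment about spark.udf.register: a minimal sketch is to register a function that closes over the list (the SQL function name concat_with_list is made up here); on Spark 2.2+ the UserDefinedFunction returned by uDF(dummyList) can also be passed to register directly.

// Register a plain three-argument function that closes over the list
spark.udf.register("concat_with_list",
  (id: Int, s1: String, s2: String) =>
    id.toString + "_" + s1 + "_" + s2 + "_" + dummyList.mkString("_"))

// On Spark 2.2+ the curried UDF itself can also be registered:
// spark.udf.register("concat_with_list", uDF(dummyList))

// The registered function is then usable from SQL expressions
df.selectExpr("id", "concat_with_list(id, str1, str2) AS new_col").show(false)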

Defining a UDF with multiple parameters:

val enrichUDF: UserDefinedFunction = udf((jsonData: String, id: Long) => {
  // Splice a site_refresh_stats_id field into the JSON string at its last '}'
  val lastOccurence = jsonData.lastIndexOf('}')
  val sid = ",\"site_refresh_stats_id\":" + id + " }]"
  val enrichedJson = jsonData.patch(lastOccurence, sid, sid.length)
  enrichedJson
})

Calling the UDF on an existing DataFrame:

val enrichedDF = EXISTING_DF
  .withColumn("enriched_column",
    enrichUDF(col("jsonData")
      , col("id")))

The following imports are also required:

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}
