I have a schema and the names of the columns to apply a UDF to. The column names are user input, and their number can differ for each input. Is there a way to apply a UDF to N columns in a DataFrame?

This is what I am trying to achieve, for a schema with, say, col1, col2, col3, col4, col5:

  DataFrame newDF = df.withColumn("col2", callUDF("test", df.col("col2")));
  or
  DataFrame newDF = df.withColumn("col2", callUDF("test", df.col("col2")))
                      .withColumn("col3", callUDF("test", df.col("col3")));
  or
  DataFrame newDF = df.withColumn("col2", callUDF("test", df.col("col2")))
                      .withColumn("col3", callUDF("test", df.col("col3")))
                      .withColumn("col5", callUDF("test", df.col("col5")));
  or for N columns.

Any ideas?

1 Answer

I ended up writing code that dynamically generates a Spark SQL query applying the UDF to 1 to N columns. I then register the input DataFrame as a temp table and run the generated query against it.
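
The original code was not posted, so the following is only a minimal sketch of how such a query could be generated, not the actual implementation. The class name `UdfQueryBuilder`, the method `buildQuery`, and the table name `input_table` are hypothetical, and the sketch assumes the column names are plain identifiers that have already been validated, since they come from user input and are concatenated into SQL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: builds a SELECT statement that wraps the chosen
// columns in a UDF call and passes every other column through unchanged.
final class UdfQueryBuilder {

    // allColumns: every column of the schema, in order.
    // udfColumns: the user-supplied subset the UDF should be applied to.
    // Assumes all names are plain, already-validated identifiers.
    static String buildQuery(String table, List<String> allColumns,
                             Set<String> udfColumns, String udfName) {
        List<String> exprs = new ArrayList<>();
        for (String column : allColumns) {
            if (udfColumns.contains(column)) {
                // Keep the original column name via an alias.
                exprs.add(udfName + "(" + column + ") AS " + column);
            } else {
                exprs.add(column);
            }
        }
        return "SELECT " + String.join(", ", exprs) + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("col1", "col2", "col3", "col4", "col5");
        Set<String> selected = new HashSet<>(Arrays.asList("col2", "col3", "col5"));
        System.out.println(buildQuery("input_table", all, selected, "test"));
    }
}
```

With the Spark 1.x Java API used in the question, the generated string would presumably be used along the lines of `df.registerTempTable("input_table")` followed by `sqlContext.sql(query)`, with `df.columns()` supplying the full column list and the UDF `test` already registered on the same `SQLContext`.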

1 Comment

Care to share the code? Or else your answer is of no use.