I have a function that adds 2 columns:

def sum_num (num1: Int, num2: Int): Int = {
    return num1 + num2
}

I have a DataFrame df with the values below:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |5   |
|7   |4   |4   |
+----+----+----+

I want to add a column and pass column names to the function, but the code below does not work. It gives the error: found Column, required Int.

val newdf = df.withColumn("sum_of_cols1", sum_num($col1, $ col2))
              .withColumn("sum_of_cols2", sum_num($col1, $ col3))
  • Does this help? Commented Mar 9, 2021 at 21:03
  • @GuruStron I had seen this, but I am not sure how to create the UDF using multiple columns. Also, I had read that UDFs might have performance implications, and since I am doing the calculation on a billion records, I am trying to find other solutions! Commented Mar 10, 2021 at 6:51

1 Answer

Change your code to:

import org.apache.spark.sql.Column
import spark.implicits._

// Takes and returns Spark SQL Columns, so it can be used inside withColumn
def sum_num(num1: Column, num2: Column): Column = {
  num1 + num2
}

val newdf = df.withColumn("sum_of_cols1", sum_num($"col1", $"col2"))
  .withColumn("sum_of_cols2", sum_num($"col1", $"col3"))

You must operate on Spark SQL columns, which support arithmetic operations directly. Take a look at the operators available on Column.
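To make the point about Column arithmetic concrete, here is a minimal end-to-end sketch. It assumes a local SparkSession and rebuilds the question's sample data by hand; the name sumNum and the extra doubled_sum column are only for illustration and are not part of the original code.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session, for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sum-columns")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val df = Seq((1, 2, 5), (7, 4, 4)).toDF("col1", "col2", "col3")

// Builds a Column expression; no values are computed at this point.
def sumNum(a: Column, b: Column): Column = a + b

val newdf = df
  .withColumn("sum_of_cols1", sumNum($"col1", $"col2"))
  // col("...") is equivalent to $"..." for referring to a column by name.
  .withColumn("sum_of_cols2", sumNum(col("col1"), col("col3")))
  // The result is itself a Column, so it can feed further arithmetic.
  .withColumn("doubled_sum", sumNum($"col1", $"col2") * 2)

newdf.show()
```

Because sumNum returns a Column rather than an Int, its result can be combined with other column expressions before Spark evaluates anything.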

2 Comments

Thanks for this. For the function sum_num, does the output need to be a Column, or can I output an Int there, since I need that value for some other calculation?
You can operate on the Column directly, using the available operators. Don't worry about that; if you need other specific functions, you could add a UDF that takes columns as input too.
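For the UDF route mentioned in the comments, a rough sketch might look like the following. It assumes a local SparkSession and the question's sample data; the names sumNumUdf and withUdf are hypothetical. Built-in Column operators are generally preferable where they exist, since a UDF is opaque to Spark's optimizer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Assumption: a local session, for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sum-udf")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val df = Seq((1, 2, 5), (7, 4, 4)).toDF("col1", "col2", "col3")

// The original Int-based function from the question, unchanged in behavior.
def sum_num(num1: Int, num2: Int): Int = num1 + num2

// Wrapping it with udf yields a function that accepts Columns and returns a Column.
val sumNumUdf = udf(sum_num _)

val withUdf = df.withColumn("sum_of_cols1", sumNumUdf($"col1", $"col2"))
withUdf.show()
```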
