I have data in a DataFrame with the columns below.
- The file format is CSV
- All of the column datatypes are String

employeeid, pexpense, cexpense
Now I need to create a new DataFrame with a new column called expense, which is calculated from the columns pexpense and cexpense.
The tricky part is that the calculation algorithm is not a UDF I created myself, but an external function that has to be imported from a Java library. It takes typed (Double) arguments - in this case pexpense and cexpense - and computes the value required for the new column.
The function signature, from the external Java jar, is:

public class MyJava
{
    public Double calculateExpense(Double pexpense, Double cexpense) {
        // calculation
    }
}
So how can I invoke that external function to create the new calculated column? Can I register that external function as a UDF in my Spark application?
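What I have in mind is something like the sketch below (a guess, not working code): wrap the Java call in a Scala function that converts the String columns to Double, and register that function as a UDF. Here `MyJava` is stubbed locally with a placeholder calculation just so the wrapping and conversion can be shown; in the real application it would come from the external jar on the classpath, and the Spark-specific lines are shown as comments.

```scala
// Local stand-in for the class from the external jar (assumption:
// the real MyJava would be imported from the jar instead).
class MyJava {
  def calculateExpense(pexpense: Double, cexpense: Double): Double =
    pexpense + cexpense // placeholder for the real calculation
}

object ExpenseUdfSketch {
  private val calc = new MyJava

  // Bridge function: String columns -> Double arguments -> Java call.
  def expense(p: String, c: String): Double =
    calc.calculateExpense(p.toDouble, c.toDouble)

  def main(args: Array[String]): Unit = {
    // In the Spark application I imagine registering it roughly like:
    //   import org.apache.spark.sql.functions.udf
    //   val expenseUdf = udf(expense _)
    //   df.withColumn("expense", expenseUdf($"pexpense", $"cexpense"))
    println(expense("100.0", "25.5"))
  }
}
```

Is this wrapping approach the right way to do it, or is there a more direct way to call into the jar?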