
I have written a Spark UDF in Java to encrypt particular columns in a DataFrame. It is a UDF1 and accepts only the string that needs to be encrypted or decrypted. I want to pass the corresponding password as well. I tried the currying approach but was not able to write the function properly. Can anyone suggest a solution?

public class EncryptString implements UDF1<String, String> {

    @Override
    public String call(String s) throws Exception {
        return Aes256.encrypt(s);
        // Aes256.encrypt needs to take the password as another parameter,
        // so that the required password can be passed when calling the UDF.
    }
}

1 Answer


You can pass the password - as well as any other parameters - as a constructor parameter to the EncryptString class:

public static class EncryptString implements UDF1<String, String> {

    private final String password;

    public EncryptString(String password) {
        this.password = password;
    }

    @Override
    public String call(String s) throws Exception {
        return Aes256.encrypt(s, password);
    }
}

When registering the UDF, you pass the actual password to the constructor:

spark.sqlContext().udf().register("EncryptUdf", new EncryptString("secret"), DataTypes.StringType);
[...]
spark.sql("select EncryptUdf(_c2) from df").show();
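The pattern itself has nothing Spark-specific: the constructor captures the extra parameter, and call only sees the column value. A minimal stand-alone sketch, using a hand-rolled interface in place of Spark's UDF1 and string concatenation in place of Aes256.encrypt so it runs without a Spark dependency (both are illustrative stand-ins, not the real API):

```java
// Stand-in for org.apache.spark.sql.api.java.UDF1, so the sketch
// compiles without Spark on the classpath (illustrative only).
interface UDF1<T, R> {
    R call(T t) throws Exception;
}

public class EncryptStringDemo {

    // Same shape as the answer: the password is captured at construction
    // time, so call() only needs the column value.
    static class EncryptString implements UDF1<String, String> {
        private final String password;

        EncryptString(String password) {
            this.password = password;
        }

        @Override
        public String call(String s) throws Exception {
            // Placeholder "encryption" so the sketch is runnable;
            // the real code would return Aes256.encrypt(s, password).
            return s + ":" + password;
        }
    }

    public static void main(String[] args) throws Exception {
        UDF1<String, String> udf = new EncryptString("secret");
        System.out.println(udf.call("hello")); // prints "hello:secret"
    }
}
```

Because the password is a plain field on the instance, you can just as well pass a variable instead of a literal when constructing the UDF.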

3 Comments

And what to do if it is a dynamic parameter, which is not a column? The provided example will work for the constant "secret" only. Should I register a new function for each imaginable parameter value?
@FuadEfendi at the point in your code when you register the UDF, you should know your password. You don't have to pass in a fixed string into the constructor, you can use a variable. However, if the password is not known at this point in time or if the password is not the same for all rows, then this solution will not work for you. Probably you will need to formulate another question.
Thank you for responding. After a lot of digging into Spark SQL UDFs, including examples of Scala currying, I started to realize that there is no way to explicitly pass a dynamic parameter to a UDF except via org.apache.spark.sql.functions.col (which will be super slow if the values of the column are dynamic). And, of course, a UDF1 implementation may call an external service to find a unique secret for each user, but we cannot call that "passing extra variables in Spark UDF" either; the parameters must be columns (such as functions.col("MySecret")).
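As the last comment notes, a truly per-row password can still be handled by widening the UDF to two arguments and passing the password as a second column. A hedged sketch of that shape, again with a hand-rolled stand-in for Spark's UDF2 so it runs without Spark on the classpath (the real registration calls are shown only as comments):

```java
// Stand-in for org.apache.spark.sql.api.java.UDF2 (illustrative only).
interface UDF2<T1, T2, R> {
    R call(T1 t1, T2 t2) throws Exception;
}

public class EncryptWithColumnDemo {

    // The password arrives per-row as a second argument instead of
    // being fixed at construction time.
    static class EncryptString implements UDF2<String, String, String> {
        @Override
        public String call(String s, String password) throws Exception {
            // Placeholder; the real code would return Aes256.encrypt(s, password).
            return s + ":" + password;
        }
    }

    public static void main(String[] args) throws Exception {
        // With real Spark this would be registered as a two-argument UDF:
        //   spark.sqlContext().udf().register("EncryptUdf",
        //       new EncryptString(), DataTypes.StringType);
        //   spark.sql("select EncryptUdf(_c2, password_col) from df").show();
        UDF2<String, String, String> udf = new EncryptString();
        System.out.println(udf.call("hello", "rowSecret")); // prints "hello:rowSecret"
    }
}
```

The trade-off is the one raised in the comments: the password must exist as a column in the DataFrame (e.g. joined in, or added with functions.lit for a constant), so this fits per-row secrets but not parameters you only know on the driver.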
