
I am trying to create a DataFrame to write to a BigQuery table. One column in the output table is a REQUIRED ID that I need to generate in my pipeline. I am doing this with a UDF, but no matter what I try, the column is created as nullable.

How I've created the UDF:

UserDefinedFunction genID = functions.udf(
                (UDF1<String, String>) this::generateEmailID, DataTypes.StringType);

The method the UDF calls:

private String generateEmailID(String srcId) {
    // Deterministic name-based (type 3) UUID derived from the source ID
    return UUID.nameUUIDFromBytes(("1_" + srcId).getBytes()).toString();
}
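Outside Spark, the generator is plain Java. A minimal standalone sketch (class name assumed for illustration) shows that the same source ID always maps to the same type-3 UUID, so for a non-null input the UDF itself never returns null:

```java
import java.util.UUID;

public class EmailIdGenerator {

    // Deterministic name-based (type 3) UUID: the same srcId always
    // produces the same ID, and the result is never null.
    static String generateEmailID(String srcId) {
        return UUID.nameUUIDFromBytes(("1_" + srcId).getBytes()).toString();
    }

    public static void main(String[] args) {
        String first = generateEmailID("abc123");
        String second = generateEmailID("abc123");
        System.out.println(first);
        System.out.println(first.equals(second)); // deterministic, prints true
    }
}
```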

I then apply it to my temp view transformedData like this:

spark.sql("SELECT message_ID AS src_id FROM transformedData")
          .withColumn("email_id", genID.apply(functions.col("src_id")))

This column needs to be REQUIRED to match the output table, and column "src_id" is nullable=false. So why is "email_id" created with nullable=true, and how can I stop that from happening so I can write to the table?

root
 |-- email_id: string (nullable = true)
 |-- src_id: string (nullable = false)

1 Answer
That's how `udf` works: Spark can't know whether the function may return null, so to be safe it marks the result column as nullable. (Since Spark 2.4, `UserDefinedFunction` also has an `asNonNullable()` method that declares the result non-nullable.)

If you are sure there are no nulls in the column, you can wrap it in `coalesce` with a non-null fallback such as `lit("")`. Depending on your imports, you can use either

.withColumn("email_id", coalesce(col("email_id"), lit("")))

or

.withColumn("email_id", functions.coalesce(functions.col("email_id"), functions.lit("")))
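`coalesce` simply returns the first non-null argument, which is why Spark can then prove the column non-null. The plain-Java equivalent of that fallback (class and method names assumed for illustration) is:

```java
public class CoalesceSketch {

    // Plain-Java analogue of SQL coalesce: the first non-null value wins.
    static String coalesce(String value, String fallback) {
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        System.out.println(coalesce("some-uuid", ""));        // prints some-uuid
        System.out.println(coalesce(null, "").isEmpty());     // prints true
    }
}
```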