I am trying to create a DataFrame to write to a BigQuery table. One column in the output table is a REQUIRED ID that I need to generate in my pipeline. I am doing this with a UDF, but no matter what I try, the column is created as nullable.
How I've created the UDF:
UserDefinedFunction genID = functions.udf(
        (UDF1<String, String>) this::generateEmailID, DataTypes.StringType);
The method the UDF calls:
private String generateEmailID(String srcId) {
    return UUID.nameUUIDFromBytes(("1_" + srcId).getBytes()).toString();
}
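For context, the ID generation itself is deterministic and has no Spark dependency; a minimal standalone sketch (plain Java, with a hypothetical `msg-42` input) shows the behavior:

```java
import java.util.UUID;

public class EmailIdDemo {
    // Same logic as the UDF body: a name-based (version-3) UUID
    // derived from the bytes of "1_" + the source ID.
    static String generateEmailID(String srcId) {
        return UUID.nameUUIDFromBytes(("1_" + srcId).getBytes()).toString();
    }

    public static void main(String[] args) {
        String id = generateEmailID("msg-42");
        // nameUUIDFromBytes is deterministic: the same input always
        // yields the same UUID, so re-runs produce stable IDs.
        System.out.println(id.equals(generateEmailID("msg-42"))); // prints "true"
        System.out.println(id.length()); // prints "36" (canonical UUID form)
    }
}
```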
And then I use it on my temp view transformedData like this:
spark.sql("SELECT message_ID AS src_id FROM transformedData")
        .withColumn("email_id", genID.apply(functions.col("src_id")))
This column needs to be REQUIRED to match the output table, and the input column "src_id" is already nullable=false. So why is "email_id" created with nullable=true, and how can I stop that from happening so I can write to the table?
root
|-- email_id: string (nullable = true)
|-- src_id: string (nullable = false)