I have a table with dates and comments.
dob | comment
---------------------------
1960-12-01 | this is useful
And I want a new column with this type:
value_type = T.StructType(
    [
        T.StructField("extra", T.MapType(T.StringType(), T.StringType(), True), True),
        T.StructField("date", T.StringType(), True),
        T.StructField("from_date", T.StringType(), True),
        T.StructField("to_date", T.StringType(), True),
        T.StructField("value", T.StringType(), True),
    ]
)
I need to:
- put df.dob into the date field of the struct, and
- put df.comment into the extra map of the struct
Thanks to blackbishop I figured out how to do the first part here, and I tried to use .withField() to update the map as well, but it throws an error.
I tried:
(df
 .withColumn("new_col",
             F.struct(*[F.lit(None).cast(f.dataType).alias(f.name)
                        for f in value_type.fields]))
 .withColumn("new_col", (F.col("new_col")
                         .withField("date", F.col("dob"))
                         .withField("extra.value", F.col("comment")))))
But I get the following error:
AnalysisException: cannot resolve 'update_fields(update_fields(new_col, WithField(dob), WithField(dob)).extra, WithField(dob))' due to data type mismatch: struct argument should be struct type, got: map<string,string>;
I am confused as to why this does not work with the map inside the struct.
Thanks :)