I have a streaming DataFrame and I'm trying to cast its 'customer_ids' column (a map) to a plain string.
schema = StructType()\
.add("customer_ids", MapType(StringType(), StringType()))\
.add("date", TimestampType())
original_sdf = spark.readStream.option("maxFilesPerTrigger", 800)\
.load(path=source, format="parquet", schema=schema)\
.select('customer_ids', 'date')
The intent of this conversion is to group by this column and aggregate by max(date), like this:
original_sdf.groupBy('customer_ids')\
.agg(max('date')) \
.writeStream \
.trigger(once=True) \
.format("memory") \
.queryName('query') \
.outputMode("complete") \
.start()
but I got this exception:
AnalysisException: u'expression `customer_ids` cannot be used as a grouping expression because its data type map<string,string> is not an orderable data type.
How can I cast this kind of streaming DataFrame column, or is there some other way to groupBy this column?