I have the following schema:
>>> df.printSchema()
root
... SNIP ...
 |-- foo: array (nullable = true)
 |    |-- element: struct (containsNull = true)
... SNIP ...
 |    |    |-- value: double (nullable = true)
 |    |    |-- value2: double (nullable = true)
In this case, the dataframe has only one row, and its foo array has only one element:
>>> df.count()
1
>>> df.select(explode('foo').alias("fooColumn")).count()
1
value is currently null (as is value2):
>>> df.select(explode('foo').alias("fooColumn")).select('fooColumn.value','fooColumn.value2').show()
+-----+------+
|value|value2|
+-----+------+
| null|  null|
+-----+------+
I want to edit value and make a new dataframe. I can explode foo and set value:
>>> fooUpdated = df.select(explode("foo").alias("fooColumn")).select("fooColumn.*").withColumn('value', lit(10))
>>> fooUpdated.select('value').show()
+-----+
|value|
+-----+
|   10|
+-----+
How do I collapse this dataframe so that fooUpdated goes back into foo as an array of structs? Or is there a way to do this without exploding foo at all?
In the end, I want to have the following:
>>> dfUpdated.select(explode('foo').alias("fooColumn")).select('fooColumn.value', 'fooColumn.value2').show()
+-----+------+
|value|value2|
+-----+------+
|   10|  null|
+-----+------+