I am trying to reproduce the following Scala code in PySpark:
```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val maxJsonParts = 3 // whatever that number is...
val jsonElements = (0 until maxJsonParts)
  .map(i => get_json_object($"Payment", s"$$[$i]"))

val newDF = dataframe
  .withColumn("Payment", explode(array(jsonElements: _*)))
  .where(!isnull($"Payment"))
```
For example, I want to take a row whose payment column holds a JSON array, like this:
| id | name | payment |
|---|---|---|
| 1 | James | [ {"@id": 1, "currency":"GBP"},{"@id": 2, "currency": "USD"} ] |
to become:
| id | name | payment |
|---|---|---|
| 1 | James | {"@id": 1, "currency":"GBP"} |
| 1 | James | {"@id":2, "currency":"USD"} |
The table schema is:
```
root
 |-- id: integer (nullable = true)
 |-- Name: string (nullable = true)
 |-- Payment: string (nullable = true)
```
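In case it helps with reproducing this, a minimal stand-in for my data can be built like this (the values are just placeholders; my real `bronzeDF` already exists, but Payment is a JSON-array string exactly as in the schema above):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder row matching the schema above: Payment holds the
# JSON array serialized as a plain string.
bronzeDF = spark.createDataFrame(
    [(1, "James", '[{"@id": 1, "currency": "GBP"}, {"@id": 2, "currency": "USD"}]')],
    "id int, Name string, Payment string",
)
```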
I tried writing this in PySpark, but it just turns the nested column (payment) to null:
```python
from pyspark.sql import functions as F

lst = [range(0, 10)]
jsonElem = [F.get_json_object(F.col("payment"), f"$[{i}]") for i in lst]
bronzeDF = bronzeDF.withColumn("payment2", F.explode(F.array(*jsonElem)))
bronzeDF.show()
```
Any help is highly appreciated.