I have a DataFrame with the following schema:
|-- ID: string (nullable = true)
|-- VALUES: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- _v1: string (nullable = true)
| | |-- _v2: string (nullable = true)
Sample VALUES look like this:
[["ABC","a"],["PQR","c"],["XYZ","b"],["TUV","d"]]
[["PQR","g"]]
[["TUV","f"],["ABC","e"]]
I need to select a single struct from this array based on the value of _v1. The values follow a hierarchy, listed here from lowest to highest priority:
"ABC" -> "XYZ" -> "PQR" -> "TUV"
If "TUV" is present, select the struct with "TUV" in its _v1. Otherwise check for "PQR" and, if it is present, take its struct. Otherwise check for "XYZ", and so on.
The resulting DataFrame should look like this (the column will now be a StructType, not Array[Struct]):
["TUV","d"]
["PQR","g"]
["TUV","f"]
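To make the selection rule concrete, here is a minimal sketch in plain Python. It does not use Spark. The names `HIERARCHY`, `RANK` and `pick_by_priority` are hypothetical, and each struct is stood in for by a `(_v1, _v2)` tuple. It also assumes that null elements and _v1 values outside the hierarchy should be skipped, which the description above does not specify.

```python
from typing import List, Optional, Tuple

# Hierarchy from the question, ordered from lowest to highest priority.
HIERARCHY = ["ABC", "XYZ", "PQR", "TUV"]

# Map each _v1 value to its rank; a higher rank wins.
RANK = {name: i for i, name in enumerate(HIERARCHY)}

# (_v1, _v2) tuple standing in for one struct of the VALUES array.
Struct = Tuple[str, str]


def pick_by_priority(values: Optional[List[Struct]]) -> Optional[Struct]:
    """Return the struct whose _v1 ranks highest in HIERARCHY, or None."""
    if not values:
        return None
    # Assumption: skip null elements and _v1 values not in the hierarchy.
    candidates = [v for v in values if v is not None and v[0] in RANK]
    if not candidates:
        return None
    return max(candidates, key=lambda v: RANK[v[0]])


# The three sample arrays from above.
rows = [
    [("ABC", "a"), ("PQR", "c"), ("XYZ", "b"), ("TUV", "d")],
    [("PQR", "g")],
    [("TUV", "f"), ("ABC", "e")],
]

for row in rows:
    print(pick_by_priority(row))
# ('TUV', 'd')
# ('PQR', 'g')
# ('TUV', 'f')
```

In PySpark, a function along these lines could presumably be wrapped with `pyspark.sql.functions.udf` and a `StructType` return type. That wiring is not shown or tested here.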
Can someone please guide me on how to approach this problem by creating a UDF? Thanks in advance.
"ABC" -> "XYZ" -> "PQR" -> "TUV" in a list?