I have a DataFrame with the following schema:
scala> hotelsDF.printSchema()
root
 |-- id: long (nullable = true)
 |-- version: integer (nullable = true)
 |-- timestamp: long (nullable = true)
 |-- changeset: long (nullable = true)
 |-- uid: integer (nullable = true)
 |-- user_sid: binary (nullable = true)
 |-- tags: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- key: binary (nullable = true)
 |    |    |-- value: binary (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
I need to keep only the records that have a tag key equal to tourism and a tag value equal to hotel. I do that with the following SQL query:
sqlContext.sql("select * from nodes where array_contains(tags.key, binary('tourism')) and array_contains(tags.value, binary('hotel'))").show()
So far, so good.
Now, my question: how can I select the value for a given tag key? A pseudo-query would be something like:
sqlContext.sql("select tags.tourism from nodes where array_contains(tags.key, binary('tourism')) and array_contains(tags.value, binary('hotel'))").show()
and it should return hotel for every row.
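To make the intent concrete, here is a minimal sketch in plain Scala collections (no Spark) of the lookup I am after. The names Tag, valueFor and TagLookupSketch are hypothetical, and keys and values are shown as String rather than binary purely for readability:

```scala
object TagLookupSketch {
  // Hypothetical stand-in for one element of the tags array
  // (String instead of binary, for readability only).
  final case class Tag(key: String, value: String)

  // Return the value of the first tag whose key matches, if there is one.
  def valueFor(tags: Seq[Tag], key: String): Option[String] =
    tags.collectFirst { case Tag(k, v) if k == key => v }

  def main(args: Array[String]): Unit = {
    val tags = Seq(Tag("name", "Grand Hotel"), Tag("tourism", "hotel"))
    println(valueFor(tags, "tourism")) // prints Some(hotel)
  }
}
```

What I am looking for is the Spark SQL equivalent of valueFor applied to the tags column of each row.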