I am very new to Scala and have the following issue.
I have a Spark DataFrame with the following schema:
df.printSchema()
root
|-- word: string (nullable = true)
|-- vector: array (nullable = true)
| |-- element: string (containsNull = true)
I need to convert this to the following schema:
root
|-- word: string (nullable = true)
|-- vector: array (nullable = true)
| |-- element: double (containsNull = true)
I do not want to specify the whole schema beforehand; instead, I want to change the existing one.
I have tried the following:
df.withColumn("vector", col("vector").cast("array<element: double>"))
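For reference, "array<element: double>" is not a valid Spark type string. The "element" in the printSchema output is only a display label, and in DDL syntax an array of doubles is written "array<double>". The cast can also be expressed with ArrayType(DoubleType). Below is a minimal sketch, assuming Spark 2.x or later. The sample rows and the names CastVectorExample and castVector are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, DoubleType}

object CastVectorExample {
  // Casts array<string> to array<double>; Spark converts each element, other columns are untouched.
  def castVector(df: DataFrame): DataFrame =
    df.withColumn("vector", col("vector").cast(ArrayType(DoubleType)))

  def main(args: Array[String]): Unit = {
    // Local session just for this sketch; in a real application reuse the existing `spark`.
    val spark = SparkSession.builder().master("local[*]").appName("cast-vector").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows matching the schema above (word: string, vector: array<string>).
    val df = Seq(
      ("foo", Seq("0.1", "0.2")),
      ("bar", Seq("1.5", "-3.0"))
    ).toDF("word", "vector")

    // The same cast written with a DDL type string instead of a DataType object.
    val viaString = df.withColumn("vector", col("vector").cast("array<double>"))
    viaString.printSchema()

    castVector(df).show()
    spark.stop()
  }
}
```

With Spark's default (non-ANSI) settings, elements that cannot be parsed as numbers become null rather than raising an error. With spark.sql.ansi.enabled set to true, the cast fails instead.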
I have also tried converting the DataFrame to an RDD, so that I could use map to convert the elements and then turn the result back into a DataFrame. However, I end up with the data type Array[WrappedArray] and I am not sure how to handle it.
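If you stay with the RDD route, note that a WrappedArray is an ordinary Scala Seq. It is what Spark returns for array columns on Scala 2.12 and earlier, and other Scala versions may use a different Seq class. Either way, it can be read with getAs[Seq[String]] and mapped over directly. The sketch below assumes the column names from the schema above. RddVectorExample and toDoubleVectors are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RddVectorExample {
  // Converts each row's string array to doubles through the RDD API, then rebuilds a DataFrame.
  def toDoubleVectors(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.rdd
      .map { row =>
        val word = row.getAs[String]("word")
        // The array column arrives as a Seq (e.g. WrappedArray); map over it like any Seq.
        // toDouble throws on null or non-numeric elements.
        val vector = row.getAs[Seq[String]]("vector").map(_.toDouble)
        (word, vector)
      }
      .toDF("word", "vector")
  }
}
```

Because Double is a primitive type, the rebuilt schema will likely report containsNull = false for the elements, unlike the target schema in the question. This is one reason the column cast is usually the simpler option.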
Using PySpark and NumPy, I could do this with df.select("vector").rdd.map(lambda x: numpy.asarray(x)).
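A rough Scala counterpart of mapping a function over each vector, without leaving the DataFrame API, is a user-defined function. This is a minimal sketch. UdfVectorExample, toDoubles and convert are hypothetical names. The built-in cast is generally preferable, because Spark's optimizer cannot look inside a UDF.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object UdfVectorExample {
  // Element-wise String -> Double conversion applied to each row's array.
  // A null array stays null; a non-numeric element throws NumberFormatException.
  private val toDoubles = udf((xs: Seq[String]) => if (xs == null) null else xs.map(_.toDouble))

  // Replaces the `vector` column with its converted version, keeping `word` as-is.
  def convert(df: DataFrame): DataFrame =
    df.withColumn("vector", toDoubles(col("vector")))
}
```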
Any help would be greatly appreciated.