I have a Spark dataframe where one of the columns (called features) is a struct type, specifically:

struct<type:tinyint,size:int,indices:array<int>,values:array<double>>

When I do df.printSchema(), this is what I get:

root
 |-- features: vector (nullable = true)

What I would like to do is extract the values field of the above struct into a separate column.

I have tried:

df.select("features.values").show()

But then I get the error:

AnalysisException: Can't extract value from features#125369: need struct type but got struct<type:tinyint,size:int,indices:array<int>,values:array<double>>;

I don't understand this error, especially the part where it says "need struct type but got struct" (??). Can someone help me with this?

2 Answers

You may need to convert the vector to an array first (vector_to_array is available in PySpark 3.0 and later):

from pyspark.ml.functions import vector_to_array

df2 = df.select(vector_to_array("features").alias("features"))

and then select the elements you need from the resulting array column.

2 Comments

Hi @mck, thanks for your answer. I tried this function, but I seem to be getting a different array than the one in the values array of the features column. Here's an example of 1 row: the values array of features contains this: [1, 0.5, 1, 1, 1, 1, 1, 1, 0.5, 1] but the "new" features (after applying vector_to_array) contains: [1, 0, 0.5, 0, 0, 0, 0, 0, 0, 0]. Do you know why this is not the same array? I can't seem to find any good documentation on this function.
@vdvaxel that’s strange, but I can’t say anything without seeing some code
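
A likely explanation for the mismatch described in the first comment, assuming the features column holds sparse vectors: the values field stores only the explicitly stored (typically non-zero) entries, while vector_to_array returns the full dense array of length size, with zeros at every position not listed in indices. The sketch below reproduces that expansion in plain Python without Spark. The helper name sparse_to_dense and the example indices are made up for illustration; they are not taken from the asker's data.

```python
def sparse_to_dense(size, indices, values):
    """Expand a sparse (size, indices, values) triple into a dense list."""
    dense = [0.0] * size
    # Each stored value goes to the position named by its matching index.
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

# Hypothetical example: 3 stored values in a vector of length 10.
size = 10
indices = [0, 2, 7]
values = [1.0, 0.5, 1.0]

dense = sparse_to_dense(size, indices, values)
print(dense)  # [1.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

Under that assumption, the two arrays in the comment are not expected to match: one is the compact list of stored values, the other is the same vector written out in full.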

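To complete @mck's answer: if you use PySpark < 3, vector_to_array is not available, so you need a UDF that converts the vector to a list and declares the return type (array of doubles, matching the vector's values):

import pyspark.sql.functions as F
import pyspark.sql.types as T

# DoubleType matches the vector's double values; FloatType would narrow them to 32 bits.
to_array_udf = F.udf(lambda vector: vector.toArray().tolist(), T.ArrayType(T.DoubleType()))

# Replace the vector column with a plain array<double> column.
df2 = df.withColumn("features", to_array_udf("features"))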
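
The UDF above returns the full dense array. If only the explicitly stored entries are wanted (the values field from the question), one option is to read the vector's values attribute instead of calling toArray(). The sketch below shows the function body on a stand-in object so that it runs without Spark; stored_values and fake_vector are hypothetical names, and the UDF wiring in the trailing comments is an untested assumption about how it would be hooked up.

```python
from types import SimpleNamespace

def stored_values(vector):
    """Return only the explicitly stored values of an ML vector as Python floats.

    Returns None for null rows so the function is safe to wrap in a UDF.
    """
    if vector is None:
        return None
    return [float(x) for x in vector.values]

# Stand-in for a SparseVector so the sketch runs without Spark (made-up data).
fake_vector = SimpleNamespace(size=10, indices=[0, 2, 7], values=[1.0, 0.5, 1.0])
print(stored_values(fake_vector))  # [1.0, 0.5, 1.0]

# With PySpark available, the wrapping might look like this (not run here):
# import pyspark.sql.functions as F
# import pyspark.sql.types as T
# values_udf = F.udf(stored_values, T.ArrayType(T.DoubleType()))
# df2 = df.withColumn("values", values_udf("features"))
```

For a dense vector, values is the whole array, so this function and the toArray() version would return the same list.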
