I have SparseVectors generated from an IDF transformation that look like this:

user='1234', idf=SparseVector(174, {0: 0.4709, 5: 0.8967, 7: 0.9625, 8: 0.9814,...})

I would like to explode this into something like:

|index|rating|user|
|0    |0.4709|1234|
|5    |0.8967|1234|
|7    |0.9625|1234|
|8    |0.9814|1234|
|...  |...   |... |

My goal is to take these (index, value) tuples and run an ALS step on them.
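Conceptually, the reshaping pairs each stored index with its value and repeats the user id on every row. Here is a minimal pure-Python sketch, assuming the vector's stored entries are available as a plain {index: value} dict; the helper name explode_sparse is hypothetical and not part of PySpark:

```python
from typing import Dict, List, Tuple


def explode_sparse(user: str, entries: Dict[int, float]) -> List[Tuple[int, float, str]]:
    """Turn one user's {index: value} mapping into (index, rating, user) rows."""
    # Sort by index so the output order is deterministic.
    return [(index, value, user) for index, value in sorted(entries.items())]


rows = explode_sparse("1234", {0: 0.4709, 5: 0.8967, 7: 0.9625, 8: 0.9814})
for row in rows:
    print(row)
```

In Spark the same per-row logic has to run inside the DataFrame, which is what the answer's UDF plus explode does.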


Answer


This task requires a UserDefinedFunction (UDF):

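from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, explode
from pyspark.ml.linalg import SparseVector, DenseVector

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    ('1234', SparseVector(174, {0: 0.4709, 5: 0.8967, 7: 0.9625, 8: 0.9814}))
]).toDF("user", "idf")

# Convert an ML vector into an {index: value} map; for sparse vectors
# only the explicitly stored entries are kept.
@udf("map<long, double>")
def vector_as_map(v):
    if isinstance(v, SparseVector):
        return dict(zip(v.indices.tolist(), v.values.tolist()))
    elif isinstance(v, DenseVector):
        return dict(zip(range(len(v)), v.values.tolist()))

# Exploding a map yields one row per entry, as a key column and a value column.
df.select("user", explode(vector_as_map("idf")).alias("index", "rating")).show()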

which gives the expected result (map entries have no guaranteed order, so the rows may come out in any order):

+----+-----+------+                                                             
|user|index|rating|
+----+-----+------+
|1234|    0|0.4709|
|1234|    8|0.9814|
|1234|    5|0.8967|
|1234|    7|0.9625|
+----+-----+------+

Comments

Yes, what seems to work for me is this: @udf(returnType=MapType(LongType(), DoubleType())). Any thoughts on why that works but the string doesn't?
Not a clue. I've tested this with both 2.3 and 2.4 and couldn't reproduce the problem. Furthermore, SPARK-19427 was resolved in 2.2, so the string form should work in every version where the decorator with a type object works. It sounds a bit like there is no SparkContext in scope.
