After running the ALS algorithm in PySpark on a dataset, I ended up with a final DataFrame whose schema is shown below.

The recommendation column is an array type. I want to split this column into separate columns in the final DataFrame.

Which PySpark function can I use to build this DataFrame?

Schema of the dataframe

root
 |-- person: string (nullable = false)
 |-- recommendation: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- ID: string (nullable = true)
 |    |    |-- rating: float (nullable = true)
  • Can you share the schema of your DataFrame with df.printSchema()? (Commented Jul 11, 2021 at 7:46)
  • @Psidom, I added it to the question after your comment; please have a look. (Commented Jul 11, 2021 at 7:53)

1 Answer

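Assuming each ID appears at most once within each array, you can explode the array into one row per element, pull the struct fields out into their own columns, and then pivot on ID: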

import pyspark.sql.functions as f

df.withColumn('recommendation', f.explode('recommendation'))\
    .withColumn('ID', f.col('recommendation').getItem('ID'))\
    .withColumn('rating', f.col('recommendation').getItem('rating'))\
    .groupby('person')\
    .pivot('ID')\
    .agg(f.first('rating')).show()

+------+---+---+---+
|person|  a|  b|  c|
+------+---+---+---+
|   xyz|0.4|0.3|0.3|
|   abc|0.5|0.3|0.2|
|   def|0.3|0.2|0.5|
+------+---+---+---+

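Or transform via the RDD API (the ratings print with more digits here because the float values become Python floats, which are double precision, before the DataFrame is rebuilt):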

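from pyspark.sql import Row

# Build one Row per person, turning each (ID, rating) struct into a keyword field.
df.rdd.map(lambda r: Row(
    person=r.person, **{s.ID: s.rating for s in r.recommendation})
).toDF().show()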

+------+-------------------+-------------------+-------------------+
|person|                  a|                  b|                  c|
+------+-------------------+-------------------+-------------------+
|   abc|                0.5|0.30000001192092896|0.20000000298023224|
|   def|0.30000001192092896|0.20000000298023224|                0.5|
|   xyz| 0.4000000059604645|0.30000001192092896|0.30000001192092896|
+------+-------------------+-------------------+-------------------+
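
The per-row reshaping that the RDD version performs can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: `Rec`, `Person`, and `flatten_row` are hypothetical names that are not part of the original answer, and the namedtuples merely stand in for Spark Row objects. It also makes the "no duplicate ID" assumption explicit by raising an error instead of silently overwriting a value.

```python
from collections import namedtuple

# Hypothetical stand-ins for Spark Row objects (attribute access works the same way).
Rec = namedtuple("Rec", ["ID", "rating"])
Person = namedtuple("Person", ["person", "recommendation"])


def flatten_row(row):
    """Turn one (person, [structs]) record into a flat dict with one key per ID."""
    flat = {"person": row.person}
    for rec in row.recommendation:
        if rec.ID in flat:
            # A repeated ID (or an ID literally named "person") would overwrite
            # an earlier value, so fail loudly instead.
            raise ValueError(f"duplicate column name: {rec.ID!r}")
        flat[rec.ID] = rec.rating
    return flat


# Sample values taken from the output tables above.
rows = [
    Person("abc", [Rec("a", 0.5), Rec("b", 0.3), Rec("c", 0.2)]),
    Person("def", [Rec("a", 0.3), Rec("b", 0.2), Rec("c", 0.5)]),
]
flattened = [flatten_row(r) for r in rows]
print(flattened[0])  # {'person': 'abc', 'a': 0.5, 'b': 0.3, 'c': 0.2}
```

In the Spark version, a helper like this could presumably replace the inline dict comprehension, for example as Row(**flatten_row(r)), though that combination is not tested here.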

1 Comment

Spot on @Psidom, Thanks a lot.
