
I have a Spark DataFrame in the format below, where each unique id can have at most 3 rows, distinguished by the rank column.

id   pred  prob         rank
485  9716  0.19205872   1
729  9767  0.19610429   1
729  9716  0.186840048  2
729  9748  0.173447074  3
818  9731  0.255104463  1
818  9748  0.215499913  2
818  9716  0.207307154  3

I want to convert (cast) this into row-wise data so that each id has exactly one row, with the pred and prob values spread across multiple columns distinguished by a rank suffix.

id   pred_1  prob_1       pred_2  prob_2       pred_3  prob_3
485  9716    0.19205872
729  9767    0.19610429   9716    0.186840048  9748    0.173447074
818  9731    0.255104463  9748    0.215499913  9716    0.207307154

I am not able to figure out how to do it in PySpark.

Sample code for input data creation:

# Load the required packages
from pyspark.sql import SparkSession

# Create (or reuse) a session, then build the sample DataFrame
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(485, 9716, 19, 1), (729, 9767, 19, 1), (729, 9716, 18, 2), (729, 9748, 17, 3),
     (818, 9731, 25, 1), (818, 9748, 21, 2), (818, 9716, 20, 3)],
    ('id', 'pred', 'prob', 'rank'))
df.show()

Possible duplicate of this question, so please have a look. Commented Oct 21, 2021 at 7:47

1 Answer


This is a pivot-on-multiple-columns problem. Try:

import pyspark.sql.functions as F

df_pivot = df.groupBy('id').pivot('rank').agg(F.first('pred').alias('pred'), F.first('prob').alias('prob')).orderBy('id')
df_pivot.show(truncate=False)
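
To make the shape of the result concrete, here is a minimal plain-Python sketch (not Spark) of what grouping by id, pivoting on rank, and taking the first pred and prob per cell amounts to. The helper name pivot_rows is hypothetical and exists only for this illustration; the keys follow the <rank>_<alias> pattern that Spark uses when several aliased aggregations are pivoted.

```python
# Plain-Python illustration only; pivot_rows is a hypothetical helper, not a Spark API.
rows = [
    (485, 9716, 19, 1), (729, 9767, 19, 1), (729, 9716, 18, 2), (729, 9748, 17, 3),
    (818, 9731, 25, 1), (818, 9748, 21, 2), (818, 9716, 20, 3),
]

def pivot_rows(rows, ranks=(1, 2, 3)):
    """Return one dict per id with <rank>_pred / <rank>_prob keys (None when absent)."""
    out = {}
    for id_, pred, prob, rank in rows:
        # Start each id with every pivoted cell empty, as the pivot leaves missing ranks null
        empty = {f"{r}_{c}": None for r in ranks for c in ("pred", "prob")}
        rec = out.setdefault(id_, {"id": id_, **empty})
        # Keep only the first value seen for each (id, rank) cell, similar to F.first
        if rec[f"{rank}_pred"] is None:
            rec[f"{rank}_pred"] = pred
            rec[f"{rank}_prob"] = prob
    # Sort by id, mirroring orderBy('id')
    return [out[k] for k in sorted(out)]

for rec in pivot_rows(rows):
    print(rec)
```

Note that id 485 has only rank 1, so its rank 2 and rank 3 cells stay empty, which matches the blank cells in the desired output.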

4 Comments

@过过招 I am getting an error: "NameError: name 'F' is not defined"
You need to import the functions module with the alias F. Add import pyspark.sql.functions as F at the top of the code above.
Sorry, the import was omitted: import pyspark.sql.functions as F
The pivoted field names are joined with _ (for example 1_pred), and you can reverse the parts to get pred_1: df_col_rename = df_pivot.select([F.col(c).alias('_'.join(c.split('_')[::-1])) for c in df_pivot.columns])
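
As a quick check of that renaming expression, the sketch below applies the same split, reverse, and join logic to plain strings. It assumes the pivoted columns are named 1_pred, 1_prob, and so on; the list pivot_columns is a stand-in for df_pivot.columns, not something taken from a real Spark run.

```python
# Stand-in for df_pivot.columns (assumed names, following the <rank>_<alias> pattern)
pivot_columns = ["id", "1_pred", "1_prob", "2_pred", "2_prob", "3_pred", "3_prob"]

# Split on "_", reverse the parts, and join them again; "id" has no "_" and is unchanged
renamed = ["_".join(c.split("_")[::-1]) for c in pivot_columns]
print(renamed)
# → ['id', 'pred_1', 'prob_1', 'pred_2', 'prob_2', 'pred_3', 'prob_3']
```

This only works cleanly while neither the rank values nor the aliases contain an underscore themselves; otherwise the parts would be reordered in unintended ways.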
