1

i have a requirement as below

dataframe

id   code
R101,GTR001
R201,RTY987
R301,KIT158
R201,PLI564
R101,MJU098
R301,OUY579

There can be many codes per id (not only two).

The expected output should be as below.

id    col1  col2   col3   col4   col5   col6

R101 GTR001 MJU098 null   null   null   null   
R201 null   null   RTY987 PLI564 null   null   
R301 null   null   null   null   KIT158 OUY579

Here columns for particular id depends on the number of codes assigned to that id, i.e. under col1 and col2 code for R101 should be populated, under col3 and col4 code for R201 should be populated and the same goes for remaining ids.

4
  • Possible duplicate of How to pivot DataFrame? Commented Jan 1, 2018 at 9:49
  • Let us continue this discussion in chat. Commented Jan 1, 2018 at 10:14
  • Can the same code, e.g. GTR001, be assigned to two or more ids? Commented Jan 1, 2018 at 14:55
  • No Jacek, code is unique Commented Jan 2, 2018 at 5:52

1 Answer 1

4

You can try ranking the code field based on id and use the rank for pivoting.Hope this helps,

 df = spark.createDataFrame([('R101','GTR001'),('R201','RTY987'),('R301','KIT158'),('R201','PLI564'),('R101','MJU098'),('R301','OUY579')],['id','code'])
 df.show()
   +----+------+
   |  id|  code|
   +----+------+
   |R101|GTR001|
   |R201|RTY987|
   |R301|KIT158|
   |R201|PLI564|
   |R101|MJU098|
   |R301|OUY579|
   +----+------+

 from pyspark.sql import functions as F
 from pyspark.sql import Window

 df = df.withColumn('rank',F.dense_rank().over(Window.orderBy("id","code")))
 df.withColumn('combcol',F.concat(F.lit('col_'),df['rank'])).groupby('id').pivot('combcol').agg(F.first('code')).show()

    +----+------+------+------+------+------+------+
   |  id| col_1| col_2| col_3| col_4| col_5| col_6|
   +----+------+------+------+------+------+------+
   |R101|GTR001|MJU098|  null|  null|  null|  null|
   |R201|  null|  null|PLI564|RTY987|  null|  null|
   |R301|  null|  null|  null|  null|KIT158|OUY579|
   +----+------+------+------+------+------+------+ 
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.