I am working with Spark DataFrames and need to group by the columns employee, designation and company, then collect the remaining column values of the grouped rows into an array of elements in a new column. Example:

Input:

employee | Company Address | designation | company | Home Adress
--------------------------------------------------
Micheal  |  NY     | Head        | xyz     | YN
Micheal  |  NJ     | Head        | xyz     | YM

Output:

employee | designation | company | Address
--------------------------------------------------
Micheal  | Head        | xyz     | [Company Address : NY , Home Adress : YN], [Company Address : NJ , Home Adress : YM]

Any help is highly appreciated!

1 Answer

Below is a solution in Spark that produces an array of structs instead of JSON:

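from pyspark.sql.functions import col, collect_list, struct

# Sample data; sc is the SparkContext provided by the pyspark shell
df1 = sc.parallelize([
    ['Micheal', 'NY', 'head', 'XYZ', 'YN'],
    ['Micheal', 'NJ', 'head', 'XYZ', 'YM'],
]).toDF(("Employee", "Company Address", "designation", "company", "Home Adress"))

# Pack both address columns into a struct and collect one struct per grouped row
df2 = df1.groupBy("Employee", "designation", "company").agg(
    collect_list(struct(col("Company Address"), col("Home Adress"))).alias("Address"))

df2.show(1, False)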

Output:

+--------+-----------+-------+--------------------+
|Employee|designation|company|Address             |
+--------+-----------+-------+--------------------+
|Micheal |head       |XYZ    |[[NY, YN], [NJ, YM]]|
+--------+-----------+-------+--------------------+

8 Comments

@Ajay_SK Will it contain duplicates? And how do I extract the company address value for each row?
Don't go by my df2.show(1, False), which prints only the first row. If you try df2.show(), you will see one row per unique combination of the grouping columns.
To get the company address from the data frame, select the Address field first, which is a list of addresses. From each entry in that list you can extract the company address, which is always in the first position.
@Ajay_SK I need to keep duplicate records. So will the result contain duplicates?