1

I'm learning Pyspark, I used it to read a csv file into a dataframe:

>>> df = spark.read.option("header",True).csv('example.csv')
>>> df.show(n=4)
+-------+------+------+
|main_id|    id| price|
+-------+------+------+
|    100|aaaaa1|190000|
|    101| bbbbb|216000|
|    100|aaaaa2|276000|
|    100|aaaaa3|340000|
+-------+------+------+
only showing top 4 rows

How can I group data based on the first column main_id and group the id and price in an array, then convert the dataframe to a newline delimited json format? Something like:

{"main_id": "100", "items": [{"id": "aaaaa1", "price": 190000},{"id": "aaaaa2", "price": 276000},{"id": "aaaaa3", "price": 340000}]}
{"main_id": "101", "id": "bbbbb", "price": 216000}
...
...

1 Answer 1

1

You can use struct and groupby with collect_list as

from pyspark.sql import functions as f

df1.select(f.col("main_id"), f.struct(f.col("id"), f.struct("price")).alias("items"))\
    .groupby("main_id")\
    .agg(f.collect_list("items").alias("items"))

Or:

from pyspark.sql import functions as f
df1.groupby("main_id") \
    .agg(f.collect_list("id").alias("id"), f.collect_list("price").alias("price"))\
    .select("main_id", f.arrays_zip(f.col("id"), f.col("price")).alias("items"))

Output:

{"main_id":100,"items":[{"id":"aaaaa1","col2":{"price":190000}},{"id":"aaaaa2","col2":{"price":276000}},{"id":"aaaaa3","col2":{"price":340000}}]}
{"main_id":101,"items":[{"id":"bbbbb","col2":{"price":216000}}]}
Sign up to request clarification or add additional context in comments.

5 Comments

Hi thanks, just wondering what does f mean in the code? I didn't see the definition of f..
Ahh do you mean from pyspark.sql import functions as f? But the last line .select("main_id", f.arrays_zip(f.col("id"), f.col("price")).alias("items")) threw an error saying AttributeError: module 'pyspark.sql.functions' has no attribute 'arrays_zip'.
Yes exactly it is from from pyspark.sql import functions as f
Make sure you have Spark Version is 2.4.0+ to be able to use arrays_zip
Aww right I'm using 2.3, makes sense, thanks.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.