I'm working in a Python 3 notebook in Azure Databricks with Spark 3.0.1.

I have the following DataFrame:

+---+---------+
|ID |Name     |
+---+---------+
|1  |John     |
|2  |Michael  |
+---+---------+

It can be created with this code:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Each tuple must have one value per schema field (ID, Name)
data2 = [
    (1, "John"),
    (2, "Michael"),
]

schema = StructType([
    StructField("ID", IntegerType(), True),
    StructField("Name", StringType(), True),
])

df1 = spark.createDataFrame(data=data2, schema=schema)
df1.show(truncate=False)

I am trying to transform it into an object that can be serialized to JSON with a single property called Entities, which is an array of the rows in the DataFrame.

Like this:

{
    "Entities": [
        {
            "ID": 1,
            "Name": "John"
        },
        {
            "ID": 2,
            "Name": "Michael"
        }
    ]
}

I've been trying to figure out how to do it but haven't had any luck so far. Can anyone point me in the right direction please?

1 Answer

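Try this: build a struct from the columns you want for each row, then aggregate all rows into a single array column named Entities with collect_list and write the result as JSON.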

from pyspark.sql.types import StructType,StructField, StringType, IntegerType
from pyspark.sql import functions as F

data2 = [
    (1,"John","Doe"),
    (2,"Michael","Douglas")
]
schema = StructType([ 
    StructField("id",IntegerType(),True), 
    StructField("fname",StringType(),True), 
    StructField("lname",StringType(),True), 
  ])
df1 = spark.createDataFrame(data2, schema)

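df = (
    df1
    # Pack the columns to export into one struct per row
    .withColumn("profile", F.struct("id", "fname"))
    # Group with no keys so every row falls into a single group
    .groupby()
    # Collect the structs into one array column named Entities
    .agg(F.collect_list("profile").alias("Entities"))
)
# Write a single JSON part file into the 'test' directory
df.select("Entities").coalesce(1).write.format('json').save('test', mode="overwrite")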

Output file (pretty-printed here; Spark writes each JSON object on a single line):

{
    "Entities": [{
        "id": 1,
        "fname": "John"
    }, {
        "id": 2,
        "fname": "Michael"
    }]
}
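
If you need the object in the notebook itself rather than in a file, a driver-side alternative might look like the sketch below. This is not part of the approach above: the helper name to_entities_json and its root parameter are made up for illustration, and plain dicts stand in for the Spark rows so the snippet runs without a cluster. It assumes the DataFrame is small enough to collect to the driver.

```python
import json


def to_entities_json(records, root="Entities"):
    """Wrap an iterable of plain dicts under one root property and serialize it.

    Hypothetical helper for illustration; not a Spark API.
    """
    return json.dumps({root: list(records)})


# With Spark, the records could come from the collected rows, e.g.
#   records = [row.asDict() for row in df1.select("ID", "Name").collect()]
# collect() pulls every row to the driver, so this only suits small DataFrames.
records = [{"ID": 1, "Name": "John"}, {"ID": 2, "Name": "Michael"}]

payload = to_entities_json(records)
print(payload)
# {"Entities": [{"ID": 1, "Name": "John"}, {"ID": 2, "Name": "Michael"}]}
```

Because the dicts keep the DataFrame's column names, this variant preserves the ID and Name keys from the question without renaming anything.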