We are using the dataframe below to create a JSON file.
Input dataframe:
import pandas as pd
import numpy as np

# a1: flat source column names, a2: dotted JSON paths,
# a3: leaf datatype (NaN marks an intermediate/nested struct)
a1=["DA_STinf","DA_Stinf_NA","DA_Stinf_city","DA_Stinf_NA_ID","DA_Stinf_NA_ID_GRANT","DA_country"]
a2=["data.studentinfo","data.studentinfo.name","data.studentinfo.city","data.studentinfo.name.id","data.studentinfo.name.id.grant","data.country"]
a3=[np.nan,np.nan,"StringType",np.nan,"BoolType","StringType"]  # np.NaN was removed in NumPy 2.0
d1=pd.DataFrame(list(zip(a1,a2,a3)),columns=['data','action','datatype'])
Using the dataframe above, we have to build the two structures below dynamically, fitting the data into these formats:
For the schema, e.g.:
StructType([StructField(Column_name,Datatype,True)])
For the data, e.g.:
F.struct(F.col(column_name)).alias(json_expected_name)
1) Expected output structure for the schema:
# PySpark's boolean type is BooleanType (there is no BoolType class)
from pyspark.sql.types import StructType, StructField, StringType, BooleanType

StructType([
    StructField("data", StructType([
        StructField("studentinfo", StructType([
            StructField("city", StringType(), True),
            StructField("name", StructType([
                StructField("id", StructType([
                    StructField("grant", BooleanType(), True)
                ]))
            ]))
        ])),
        StructField("country", StringType(), True)
    ]))
])
2) Expected data fetch:
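from pyspark.sql import functions as F

# df: the Spark DataFrame whose columns are the flat DA_* names from a1.
# Leaves are plain columns (not wrapped in F.struct) so that city, grant and
# country come out as string/boolean values, matching the schema above.
df.select(
    F.struct(
        F.struct(
            F.col("DA_Stinf_city").alias("city"),
            F.struct(
                F.struct(F.col("DA_Stinf_NA_ID_GRANT").alias("grant")).alias("id")
            ).alias("name"),
        ).alias("studentinfo"),
        F.col("DA_country").alias("country")
    ).alias("data")
)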
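For the data side, a sketch that reuses the same path tree but, as a swapped-in technique, emits a Spark SQL `named_struct(...)` expression string instead of chaining `F.struct` calls by hand; `named_struct` is a Spark SQL built-in that builds a struct from alternating name/value arguments. `build_tree`, `to_sql` and `select_expressions` are hypothetical helpers, and the rows again assume that a NaN/None datatype marks a nested struct.

```python
# Rows mirroring the question's a1/a2/a3 lists; None marks a nested struct.
ROWS = [
    ("DA_STinf", "data.studentinfo", None),
    ("DA_Stinf_NA", "data.studentinfo.name", None),
    ("DA_Stinf_city", "data.studentinfo.city", "StringType"),
    ("DA_Stinf_NA_ID", "data.studentinfo.name.id", None),
    ("DA_Stinf_NA_ID_GRANT", "data.studentinfo.name.id.grant", "BoolType"),
    ("DA_country", "data.country", "StringType"),
]


def build_tree(rows):
    """Turn dotted paths into nested dicts; leaves hold (column, datatype)."""
    tree = {}
    for column, path, dtype in rows:
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        if isinstance(dtype, str):
            node[leaf] = (column, dtype)   # leaf: source column + type name
        else:
            node.setdefault(leaf, {})      # NaN/None: intermediate struct
    return tree


def to_sql(tree):
    """Render a subtree as nested named_struct('key', value, ...) calls."""
    parts = []
    for name, value in tree.items():
        if isinstance(value, dict):
            parts.append(f"'{name}', {to_sql(value)}")
        else:
            column, _dtype = value
            parts.append(f"'{name}', `{column}`")  # backticks quote the column
    return "named_struct(" + ", ".join(parts) + ")"


def select_expressions(tree):
    """One '<expr> AS <name>' string per top-level key, for DataFrame.selectExpr."""
    exprs = []
    for name, value in tree.items():
        if isinstance(value, dict):
            exprs.append(f"{to_sql(value)} AS {name}")
        else:
            exprs.append(f"`{value[0]}` AS {name}")
    return exprs


for expr in select_expressions(build_tree(ROWS)):
    print(expr)
```

Assuming `df` holds the flat `DA_*` columns, `df.selectExpr(*select_expressions(build_tree(ROWS))).write.json(path)` would write the nested JSON. If you prefer the Column API used in the question, the same recursion can return `F.struct(*children).alias(name)` for branches and `F.col(column).alias(name)` for leaves.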
We have to use a for loop to place entries such as data.studentinfo.name.id (data -> studentinfo -> name -> id) at the matching nesting level, as already shown in the expected output structures above.