1

New to Databricks. Have a SQL database table that I am creating a dataframe from. One of the columns is a JSON string. I need to explode the nested JSON into multiple columns. Have used this post and this post to get me to where I am at now.

Example JSON:

{ "Module": { "PCBA Serial Number": "G7456789", "Manufacturing Designator": "DISNEY", "Firmware Version": "0.0.0", "Hardware Revision": "46858", "Manufacturing Date": "10/17/2018 4:04:25 PM", "Test Result": "Fail", "Test Start Time": "10/22/2018 6:14:14 AM", "Test End Time": "10/22/2018 6:16:11 AM" }

Code so far:

    #define schema 
         schema = StructType(
          [
            StructField('Module',ArrayType(StructType(Seq
              StructField('PCBA Serial Number',StringType,True),
              StructField('Manufacturing Designator',StringType,True),
              StructField('Firmware Version',StringType,True),
              StructField('Hardware Revision',StringType,True),
              StructField('Test Result',StringType,True),
              StructField('Test Start Time',StringType,True),
              StructField('Test End Time',StringType,True))), True) ,True),
            StructField('Test Results',StringType(),True),
            StructField('HVM Code Errors',StringType(),True)
          ]

#use from_json to explode json by applying it to column
        df.withColumn("ActivityName", from_json("ActivityName", schema))\
            .select(col('ActivityName'))\
            .show()

Error:

    SyntaxError: invalid syntax
  File "<command-1632344621139040>", line 10
    StructField('PCBA Serial Number',StringType,True),
              ^
SyntaxError: invalid syntax

1 Answer 1

1

As you are using pyspark then types should be StringType() instead of StringType and remove Seq replace it with []

schema = StructType([StructField('Module',ArrayType(StructType([
    StructField('PCBA Serial Number',StringType(),True),
    StructField('Manufacturing Designator',StringType(),True),
    StructField('Firmware Version',StringType(),True),
    StructField('Hardware Revision',StringType(),True),
    StructField('Test Result',StringType(),True),
    StructField('Test Start Time',StringType(),True),
    StructField('Test End Time',StringType(),True)])), True),
StructField('Test Results',StringType(),True),
StructField('HVM Code Errors',StringType(),True)])
Sign up to request clarification or add additional context in comments.

2 Comments

@N.Fisher, Did the answer helped?
This did help! I confused my syntax switching between Scala and Python

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.