JSON List conversion

Question

My list of (tuples of) JSON values looks as follows:

testJson = [('{"drivernumber":1, "speed" : ["30.5", "40", "50", "25.25"]}',),
            ('{"drivernumber":2, "speed" : ["25.25", "10.11", "11", "50"]}',),
            ('{"drivernumber":3, "speed" : ["40", "50", "80", "42"]}',)
           ]

I created the schema below:

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType, FloatType
readSchema = StructType([
                   StructField("drivernumber", IntegerType(), True),
                   StructField("speed", ArrayType(FloatType(), True), True)])

Then I created a DataFrame:

df = (spark.read.schema(readSchema).json(sc.parallelize(testJson)))
display(df)

Ultimately, I need to get the output below, but at the moment my DataFrame (after the step above) contains only NULLs, and I don't know why. Any leads or tips would be much appreciated. Thank you :)

speed    drivercount
50       3
40       2
25.25    2
11       1
....     ....
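To make the target table concrete, here is a plain-Python sketch (no Spark involved) that computes the same numbers from `testJson` using only the standard `json` and `collections.Counter` modules. It assumes that "drivercount" means the number of distinct drivers whose `speed` list contains a given value; it is only a way to check the expected counts, not a replacement for the Spark job.

```python
import json
from collections import Counter

testJson = [('{"drivernumber":1, "speed" : ["30.5", "40", "50", "25.25"]}',),
            ('{"drivernumber":2, "speed" : ["25.25", "10.11", "11", "50"]}',),
            ('{"drivernumber":3, "speed" : ["40", "50", "80", "42"]}',)]

# For each speed string, count how many distinct drivers report it.
drivercount = Counter()
for (raw,) in testJson:                  # each element is a one-element tuple
    record = json.loads(raw)
    for speed in set(record["speed"]):   # set(): count each driver once per speed
        drivercount[speed] += 1

for speed, count in drivercount.most_common():
    print(speed, count)
```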

Hi @JohnGordon - it's just that the data is passed by the broker in this format.
– Bbee, commented Mar 5, 2020 at 19:55

Nayanish Damania · Accepted Answer · 2020-03-05 19:43:56Z

You don't have to define the schema for it; simply use:

df = spark.read.json(sc.parallelize(testJson))
