My list of (tuples of) JSON values looks as follows:
testJson = [
    ('{"drivernumber":1, "speed" : ["30.5", "40", "50", "25.25"]}',),
    ('{"drivernumber":2, "speed" : ["25.25", "10.11", "11", "50"]}',),
    ('{"drivernumber":3, "speed" : ["40", "50", "80", "42"]}',)
]
I defined the following schema for reading it:
from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, FloatType
# drivernumber is an integer; speed is declared as an array of floats
readSchema = StructType([
    StructField("drivernumber", IntegerType(), True),
    StructField("speed", ArrayType(FloatType(), True), True)])
Then I created a DataFrame:
df = (spark.read.schema(readSchema).json(sc.parallelize(testJson)))
display(df)
Ultimately, I need to get the output below (the number of drivers that recorded each speed), but at the moment my DataFrame (after the step above) contains only nulls, and I don't know why. Any leads or tips would be much appreciated. Thank you :)
speed    drivercount
50       3
40       2
25.25    2
11       1
....     ....
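To make the expected result concrete, here is a minimal plain-Python sketch (no Spark involved) of the aggregation I am after. The helper name `count_drivers_per_speed` is made up for illustration, and I am assuming that a driver should be counted at most once per speed value. It does not explain the nulls; it only shows how the numbers in the table above are meant to be derived from the sample data.

```python
import json
from collections import Counter

# Same sample data as above: a list of 1-tuples, each holding one JSON string.
test_json = [
    ('{"drivernumber":1, "speed" : ["30.5", "40", "50", "25.25"]}',),
    ('{"drivernumber":2, "speed" : ["25.25", "10.11", "11", "50"]}',),
    ('{"drivernumber":3, "speed" : ["40", "50", "80", "42"]}',),
]


def count_drivers_per_speed(rows):
    """Count how many drivers recorded each speed (hypothetical helper)."""
    counts = Counter()
    for (raw,) in rows:  # unpack the 1-tuple to get the JSON string
        record = json.loads(raw)
        # set() so that a driver is counted once per speed (my assumption)
        for speed in set(record["speed"]):
            counts[speed] += 1
    return counts


driver_counts = count_drivers_per_speed(test_json)

# Print speeds with the highest driver counts first.
for speed, drivercount in driver_counts.most_common():
    print(speed, drivercount)
```

The speeds are kept as strings here because that is how they appear in the JSON; in the DataFrame version I would like them as numbers.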