Can someone let me know where I'm going wrong with my attempt to concatenate a nested JSON field?
I'm using the following code:
from pyspark.sql.functions import col, concat, current_timestamp, lit

df = (df
    .withColumn("ingestion_date", current_timestamp())
    .withColumn("name", concat(col("name.forename"),
                               lit(" "), col("name.surname")))
)
Schema:
root
|-- driverRef: string (nullable = true)
|-- number: integer (nullable = true)
|-- code: string (nullable = true)
|-- forename: string (nullable = true)
|-- surname: string (nullable = true)
|-- dob: date (nullable = true)
As you can see, I'm trying to concatenate forename and surname to provide a full name in the name field. At present the data looks like the following:
After concatenation, the 'name' field should hold a single value, e.g. just Lewis Hamilton, and likewise for the other values in the 'name' field.
My code produces the following error:
Can't extract value from name#6976: need struct type but got string

df.printSchema()?

root
|-- driverRef: string (nullable = true)
|-- number: integer (nullable = true)
|-- code: string (nullable = true)
|-- forename: string (nullable = true)
|-- surname: string (nullable = true)
|-- dob: date (nullable = true)

name column. From the error I suspect that the name column is a string, and you cannot extract values from it as if it were a map. Update the schema so that name is a MapType(StringType(), StringType()) and you should be able to access the values.