I have JSON files with a complex schema (see below) that I am reading with Spark. Some field names are duplicated in the source data, so Spark throws an error when reading the files (as expected). The duplicate names are under the storageidlist field. I would like to load storageidlist as an unparsed string into a string-type column and parse it manually afterwards. Is this possible in Spark?
root
|-- errorcode: string (nullable = true)
|-- errormessage: string (nullable = true)
|-- ip: string (nullable = true)
|-- label: string (nullable = true)
|-- status: string (nullable = true)
|-- storageidlist: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- errorcode: string (nullable = true)
| | |-- errormessage: string (nullable = true)
| | |-- fedirectorList: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- directorId: string (nullable = true)
| | | | |-- errorcode: string (nullable = true)
| | | | |-- errordesc: string (nullable = true)
| | | | |-- metrics: string (nullable = true)
| | | | |-- portMetricDataList: array (nullable = true)
| | | | | |-- element: array (containsNull = true)
| | | | | | |-- element: struct (containsNull = true)
| | | | | | | |-- data: array (nullable = true)
| | | | | | | | |-- element: struct (containsNull = true)
| | | | | | | | | |-- ts: string (nullable = true)
| | | | | | | | | |-- value: string (nullable = true)
| | | | | | | |-- errorcode: string (nullable = true)
| | | | | | | |-- errordesc: string (nullable = true)
| | | | | | | |-- metricid: string (nullable = true)
| | | | | | | |-- portid: string (nullable = true)
| | | | | | | |-- status: string (nullable = true)
| | | | |-- status: string (nullable = true)
| | |-- metrics: string (nullable = true)
| | |-- status: string (nullable = true)
| | |-- storageGroupList: string (nullable = true)
| | |-- storageid: string (nullable = true)
|-- sublabel: string (nullable = true)
|-- ts: string (nullable = true)
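To make the "parse it manually afterwards" step concrete, here is a minimal sketch in plain Python of what I have in mind, assuming the raw storageidlist text is already available as a string. The function names and the sample value are hypothetical and only for illustration; they are not taken from my real data. It relies on the standard library's `json.loads` with `object_pairs_hook`, which receives every key/value pair of an object, including repeated keys.

```python
import json
from collections import defaultdict


def merge_duplicate_keys(pairs):
    """object_pairs_hook that keeps all values of a repeated key.

    json.loads normally keeps only the last value for a duplicated key;
    here repeated keys are collected into a list instead.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Keys seen once keep their plain value; repeated keys become a list.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


def parse_storageidlist(raw):
    """Parse the raw JSON text of storageidlist, tolerating duplicate keys."""
    return json.loads(raw, object_pairs_hook=merge_duplicate_keys)


# Hypothetical sample with a duplicated "status" key (not real data).
sample = '[{"storageid": "A1", "status": "ok", "status": "failed"}]'
parsed = parse_storageidlist(sample)
print(parsed)
```

One caveat of this sketch: a key that appears once with a list value is indistinguishable from a key that was duplicated, so a real implementation might prefer to rename the duplicates instead.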