I need to cast the columns of a DataFrame, which currently hold all values as strings, to the data types defined in a schema. While casting, I need to put the corrupt records (records whose values do not match the expected data type) into a separate column.
Example DataFrame:
+---+----------+-----+
|id |name      |class|
+---+----------+-----+
|1  |abc       |21   |
|2  |bca       |32   |
|3  |abab      | 4   |
|4  |baba      |5a   |
|5  |cccca     |     |
+---+----------+-----+
JSON schema of the file:
{"definitions":{},"$schema":"http://json-schema.org/draft-07/schema#","$id":"http://example.com/root.json","type":["object","null"],"required":["id","name","class"],"properties":{"id":{"$id":"#/properties/id","type":["integer","null"]},"name":{"$id":"#/properties/name","type":["string","null"]},"class":{"$id":"#/properties/class","type":["integer","null"]}}}
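Here, row 4 is a corrupt record, because the class column is of type integer and the value 5a cannot be cast to it. Only this record should end up in the corrupt records, not the 5th row, whose class value is empty (the schema allows null for class).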
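To make the expected rule concrete, here is a minimal sketch in plain Python (not any DataFrame API). The helper name `cast_int_or_flag` is hypothetical, and it assumes that empty or whitespace-only strings count as null and that surrounding whitespace (as in row 3) is ignored before casting.

```python
from typing import Optional, Tuple

def cast_int_or_flag(raw: Optional[str]) -> Tuple[Optional[int], bool]:
    """Return (value, is_corrupt) for one string cell of an integer column."""
    # Assumption: missing, empty, or whitespace-only cells are null, not corrupt.
    if raw is None or raw.strip() == "":
        return None, False
    try:
        # Assumption: surrounding whitespace is ignored before casting.
        return int(raw.strip()), False
    except ValueError:
        # Non-empty but not an integer: this is a corrupt value.
        return None, True

# The rows from the example above, all values held as strings.
rows = [
    ("1", "abc", "21"),
    ("2", "bca", "32"),
    ("3", "abab", " 4"),
    ("4", "baba", "5a"),
    ("5", "cccca", ""),
]

# A row is corrupt if any integer-typed column (id, class) fails to cast.
corrupt_ids = [
    row[0]
    for row in rows
    if cast_int_or_flag(row[0])[1] or cast_int_or_flag(row[2])[1]
]
print(corrupt_ids)  # ['4']
```

With this rule only row 4 is flagged; row 5 is cast to null instead of being treated as corrupt.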