
How to replace a map with null if its key is null in a Spark DataFrame?

DF.printSchema

root
 |-- sku_id: string (nullable = true)
 |-- sku_images: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- image_id: string (nullable = true)
 |    |    |-- image_name: string (nullable = true)
 |    |    |-- image_path: string (nullable = true)


11111111|Map(null -> [null,null,null])
22222222|Map(null -> [null,null,null])
33333333|Map(largeImage_1 -> [111,222,test data])

Expected output:

11111111|null
22222222|null
33333333|Map(largeImage_1 -> [111,222,test data])

Thanks,

Apache Spark does not support null keys in MapType objects, so it is just not possible to get there. If anything, you have the literal string "null" as the key. Please post a minimal reproducible example! Commented Apr 20, 2018 at 15:14

1 Answer


You can't have null as a key in a Spark map, so I guess you actually have the string "null" as the key. You can set those maps to null with the following UDF:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Reuse the existing column type so the UDF's declared return type matches
val mapSchema = DF.schema.find(_.name == "sku_images").get.dataType

// Replace the whole map with null whenever it contains the key "null";
// the m != null guard keeps the UDF safe on rows where the column is already null
val nullifyMap = udf(
  (m: Map[String, Row]) => if (m != null && m.keySet.contains("null")) null else m,
  mapSchema)

val newDF = DF.withColumn("sku_images", nullifyMap($"sku_images"))
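If you want to sanity-check the key test outside of Spark, the core of the UDF boils down to a plain-Scala function. A minimal sketch, with a hypothetical tuple `(imageId, imageName, imagePath)` standing in for the `Row` struct:

```scala
// Stand-in for the image struct; the tuple layout is illustrative only.
type Image = (String, String, String)

// Same decision the UDF makes: collapse the whole map to null
// if it is missing or contains the string key "null".
def nullifyMap(m: Map[String, Image]): Map[String, Image] =
  if (m == null || m.contains("null")) null else m

println(nullifyMap(Map("null" -> (null, null, null))))                   // null
println(nullifyMap(Map("largeImage_1" -> ("111", "222", "test data"))))  // map passes through
```

On Spark 2.3 or later you could likely avoid the UDF altogether by combining the built-in `map_keys` and `array_contains` functions inside a `when`/`otherwise` expression, which keeps the logic in Catalyst-optimizable SQL functions.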