
I tried the code below, but it's not working:

df=df.withColumn("cars", typedLit(Map.empty[String, String]))

Gives the error: NameError: name 'typedLit' is not defined

  • Was that imported before using it? Commented Jun 23, 2022 at 13:34
  • @samkart no, I am unable to figure out what to import for this Commented Jun 23, 2022 at 13:37
  • I just saw the tags -- pyspark does not have typedLit, but a similar result can be achieved using array and lit as described here Commented Jun 23, 2022 at 13:38

2 Answers


Create an empty column and cast it to the type you need.

from pyspark.sql import functions as F, types as T

df = df.withColumn("cars", F.lit(None).cast(T.MapType(T.StringType(), T.StringType())))
df.select("cars").printSchema()
root
 |-- cars: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)



Perhaps you can use pyspark.sql.functions.expr:

>>> from pyspark.sql.functions import *
>>> df.withColumn("cars",expr("map()")).printSchema()                                                                                                       
root
 |-- col1: string (nullable = true)
 |-- cars: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)

EDIT:

If you'd like your map to have keys and/or values of a non-trivial type (not map<string,string> as your question's title says), some casting becomes unavoidable, I'm afraid. For example:

>>> df.withColumn("cars",create_map(lit(None).cast(IntegerType()),lit(None).cast(DoubleType()))).printSchema()                                      
root
 |-- col1: string (nullable = true)
 |-- cars: map (nullable = false)
 |    |-- key: integer
 |    |-- value: double (valueContainsNull = true)

...in addition to other options suggested by @blackbishop and @Steven. And just beware of the consequences :) -- maps can't have null keys!

Comments

Thank you! One more question: what if I want to create map<int,int>?
@RahulDiggi use my solution for that!
Or expr("cast(map() as map<int,int>)").
@mazaneicha It should :) Note the difference: cast(map() as map<int,int>) creates an empty map whereas in the other solution it creates a NULL value of type map (it's equivalent to cast(null as map<int,int>)). Also, create_map function can't be used in this particular case as you can't pass null for keys.
@mazaneicha which version of spark are you using? I can execute the same code with spark 3.2, it works just fine.
