
A simplified version of my problem is this:

I have a Spark DataFrame ("my_df") with one column ("col1") and values 'a','b','c','d'

and a dictionary ("my_dict") like this: {'a': 5, 'b': 7, 'c': 2, 'd': 4}

I would like to combine these to create a DataFrame with an additional column containing the corresponding values from my_dict.

At the moment I am using the following method, which works for a small dataset, but it is very inefficient and causes a StackOverflowError on my full dataset:

import pyspark.sql.functions as F

# start with an arbitrary df containing "col1"
# initialise new column with zeros
my_df = my_df.withColumn('dict_data', F.lit(0))

for k,v in my_dict.items():
    my_df = my_df.withColumn('dict_data',
                             F.when((my_df['col1']==k),
                                     v).otherwise(my_df['dict_data'])
                             )

Is there a better way to do this? I've tried using Window functions, but I've had difficulty applying them in this context.

2 Answers


You just need to map your dictionary values into a new column based on the values of your first column. You can refer to:

pyspark create new column with mapping from a dict


You can do it with an intermediate dataframe and a join:

rows = [{'col1': key, 'dict_data': value} for key, value in my_dict.items()]
my_dict_df = spark.createDataFrame(rows)

result_df = my_df.join(my_dict_df, 'col1', 'left')
