Following up on this question and its dataframes, I am trying to convert a dataframe into a dictionary. In pandas I was using this:

dictionary = df_2.unstack().to_dict(orient='index')

However, I need to convert this code to PySpark. Can anyone help me with this? As I understand from previous questions such as this one, I would indeed need to use pandas, but the dataframe is far too big for that. How can I solve this?

EDIT:

I have now tried the following approach:

dictionary_list = map(lambda row: row.asDict(), df_2.collect())
dictionary  = {age['age']: age for age in dictionary_list}

(reference), but it does not yield the expected result.

In pandas, this is the output I was obtaining:

[screenshot of the pandas output; image not included]

  • what's your expected output? Commented Jan 14, 2021 at 11:40
  • @mck I added a printscreen in the question Commented Jan 14, 2021 at 11:50
  • @mck the original code I had in pandas for the whole process was this: dictionary = (value/value.groupby(level=0).sum()).unstack().to_dict(orient='index'), referring to the dataframe in this question: stackoverflow.com/questions/65707148/… Commented Jan 14, 2021 at 11:50

1 Answer

df2 is the dataframe from the previous post. You can pivot it first and then convert the result to a dictionary, as described in your linked post.

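import pyspark.sql.functions as F

# Pivot: one row per age, one column per distinct siblings value
df3 = df2.groupBy('age').pivot('siblings').agg(F.first('count'))
# Collect the rows to the driver and turn each Row into a plain dict
list_persons = [row.asDict() for row in df3.collect()]
# Key the row dicts by age
dict_persons = {person['age']: person for person in list_persons}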

{15: {'age': 15, '0': 1.0, '1': None, '3': None}, 10: {'age': 10, '0': None, '1': None, '3': 1.0}, 14: {'age': 14, '0': None, '1': 1.0, '3': None}}
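Note that each inner dictionary above still carries the age key, whereas pandas' to_dict(orient='index') keeps the index only as the outer key. A minimal sketch of dropping it, assuming the collected rows look like the output above. The rows are simulated here as plain dicts (standing in for row.asDict()) so the snippet runs without a Spark session, and nest_by_key is a hypothetical helper name, not part of PySpark:

```python
def nest_by_key(rows, key):
    """Build {row[key]: row-without-key} from a list of row dicts."""
    return {
        row[key]: {k: v for k, v in row.items() if k != key}
        for row in rows
    }

# Simulated result of [row.asDict() for row in df3.collect()]
list_persons = [
    {'age': 15, '0': 1.0, '1': None, '3': None},
    {'age': 10, '0': None, '1': None, '3': 1.0},
    {'age': 14, '0': None, '1': 1.0, '3': None},
]

dict_persons = nest_by_key(list_persons, 'age')
print(dict_persons[15])  # {'0': 1.0, '1': None, '3': None}
```

With a real dataframe, the same helper could be applied to the list built from df3.collect(), which still requires the pivoted result to fit in driver memory.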

Or another way:

import json

df4 = df3.fillna(float('nan')).groupBy().pivot('age').agg(F.first(F.struct(*df3.columns[1:])))
result_dict = json.loads(df4.select(F.to_json(F.struct(*df4.columns))).head()[0])  # parse the JSON string instead of eval()

{'10': {'0': 'NaN', '1': 'NaN', '3': 1.0}, '14': {'0': 'NaN', '1': 1.0, '3': 'NaN'}, '15': {'0': 1.0, '1': 'NaN', '3': 'NaN'}}
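This second result has string keys for the ages and the string 'NaN' in place of missing values. If numeric keys and real NaN values are wanted, one possible post-processing step is sketched below. It assumes the ages are integers and uses a hard-coded JSON string in place of the value returned by .head()[0]; restore_types is a hypothetical helper name:

```python
import json
import math

# Simulated JSON string, standing in for the value returned by .head()[0]
json_str = (
    '{"10": {"0": "NaN", "1": "NaN", "3": 1.0}, '
    '"14": {"0": "NaN", "1": 1.0, "3": "NaN"}, '
    '"15": {"0": 1.0, "1": "NaN", "3": "NaN"}}'
)

def restore_types(parsed):
    """Convert outer keys to int and the string 'NaN' to float('nan')."""
    return {
        int(age): {k: float('nan') if v == 'NaN' else v for k, v in inner.items()}
        for age, inner in parsed.items()
    }

result_dict = restore_types(json.loads(json_str))
print(result_dict[10]['3'])              # 1.0
print(math.isnan(result_dict[10]['0']))  # True
```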

3 Comments

Unfortunately it's not working :( "TypeError: 'map' object is not callable"
I am using the edited version, but unfortunately the error is still there for me :(
@Johanna I removed that annoying function, could you please try again?
