
I want to split the filteredaddress column of the Spark DataFrame below into two new columns, Flag and Address:

customer_id|pincode|filteredaddress
1000045801 |121005 |[{'flag':'0', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801 |121005 |[{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]
1000045801 |121005 |[{'flag':'1', 'address':'House number 172, Parvatiya Colony Part-2 , N.I.T'}]

Can anyone please tell me how I can do it?

1 Answer


You can get the values from the filteredaddress map column using its keys:

df2 = df.selectExpr(
    'customer_id', 'pincode',
    "filteredaddress['flag'] as flag", "filteredaddress['address'] as address"
)

Other ways to access map values are:

import pyspark.sql.functions as F

df.select(
    'customer_id', 'pincode',
    F.col('filteredaddress')['flag'],
    F.col('filteredaddress')['address']
)

# or, more simply

df.select(
    'customer_id', 'pincode',
    'filteredaddress.flag',
    'filteredaddress.address'
)

3 Comments

the above code is throwing error: cannot resolve 'from_json(filteredaddress)' due to data type mismatch: argument 1 requires string type, however, 'filteredaddress' is of map<string,string> type.
@peeps the dataframe you showed in your question does not have a map type column. Could you do df.show() and copy the output to your question?
df.show() added the output to my question
