
I have a dataframe whose schema contains a map, like below:

root
 |-- events: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

When I use map_values() and then explode the result to obtain those values, I get the dataframe below:

+--------------------+--------------------+
|            map_data|          map_values|
+--------------------+--------------------+
|[[{event_name=walk..|[{event_name=walk...|
|[[{event_name=walk..|          2019-02-17|
|[[{event_name=walk..|            08:00:00|
|[[{event_name=run...|[{event_name=walk...|
|[[{event_name=fly...|          2019-02-17|
|[[{event_name=run...|            09:00:00|
+--------------------+--------------------+

This is my code to get to the dataframe shown above:

import pyspark.sql.functions as F

events = event_data.withColumn(
   "map_data",
   F.map_values(event_data.events)
)
events.printSchema()
events.select("map_data").withColumn(
   "map_values",
   F.explode(events.map_data)
).show(10)

Compared to what I started with, this is progress; however, I would like my dataframe to look like this:

+--------------------+-----------+--------+
|          events    |     date  |   time |
+--------------------+-----------+--------+
|[{event_name=walk...| 2019-02-17|08:00:00|
|[{event_name=walk...| 2019-02-17|09:00:00|
+--------------------+-----------+--------+

I have been researching and have seen that people use UDFs for this; however, I am sure there is a way to accomplish what I want purely with DataFrame operations and SQL functions.

For more insight, here is how my rows look with .show(truncate=False):

+--------------------+--------------------+
|            map_data|          map_values|
+--------------------+--------------------+
|[[{event_name=walk..|[{event_name=walk, duration=0.47, x=0.39, y=0.14, timestamp=08:02:30.574892}, {event_name=walk, duration=0.77, x=0.15, y=0.08, timestamp=08:02:50.330245}, {event_name=run, duration=0.02, x=0.54, y=0.44, timestamp=08:02:22.737803}, {event_name=run, duration=0.01, x=0.43, y=0.56, timestamp=08:02:11.629404}, {event_name=run, duration=0.03, x=0.57, y=0.4, timestamp=08:02:22.660778}, {event_name=run, duration=0.02, x=0.49, y=0.49, timestamp=08:02:56.660186}]|
|[[{event_name=walk..|          2019-02-17|
|[[{event_name=walk..|            08:00:00|

Also, with the dataframe I have now, my remaining issue is figuring out how to turn an array into multiple columns. I mention this because I could either work with that, or use a more efficient process to create the dataframe directly from the map I was given.

  • Could you provide a complete view of the first events column using .show(truncate=False)? Commented Apr 29, 2020 at 23:01

1 Answer

I have found a solution to my problem. I needed to follow this approach (Create a dataframe from a hashmap with keys as column names and values as rows in Spark) and perform this series of computations on event_data, my initial dataframe.

This is how my dataframe looks now:

|25769803776|2019-03-19|[{event_name=walk, duration=0.47, x=0.39, y=0.14, timestamp=08:02:30.574892}, {event_name=walk, duration=0.77, x=0.15, y=0.08, timestamp=08:02:50.330245}, {event_name=run, duration=0.02, x=0.54, y=0.44, timestamp=08:02:22.737803}, {event_name=run, duration=0.01, x=0.43, y=0.56, timestamp=08:02:11.629404}, {event_name=run, duration=0.03, x=0.57, y=0.4, timestamp=08:02:22.660778}, {event_name=run, duration=0.02, x=0.49, y=0.49, timestamp=08:02:56.660186}]|08:02:00|
