
I have a dataframe with 2 columns: uid: string, visits: array<struct<timestamp:bigint,url:string>>

I need to make a new df with 3 columns: uid | timestamp (from visits.timestamp) | url (from visits.url)

I'm kinda new to Scala and Spark, so I don't have an idea how to map it in the right way.

For example, if I have a df like this:

uid | visits

uid1 | [[timestamp1, url1], [timestamp2, url2]]

I need to make it like this:

uid | timestamp | url

uid1 | timestamp1 | url1

uid1 | timestamp2 | url2


1 Answer

Use the explode or explode_outer function to explode array columns. explode drops rows whose array is null or empty, while explode_outer keeps them with null values in the exploded columns.

Check the code below.

scala> df.printSchema
root
 |-- uid: string (nullable = true)
 |-- visits: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- timestamp: long (nullable = true)
 |    |    |-- url: string (nullable = true)

scala> df
.withColumn("visits",explode_outer($"visits"))
.select($"uid",$"visits.timestamp".as("timestamp"),$"visits.url")
.show(false)

+---+---------+---+
|uid|timestamp|url|
+---+---------+---+
|uid|111      |url|
+---+---------+---+
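If it helps to see what explode is doing without a Spark session: conceptually it is a flatMap over the array column, turning each (uid, visits) row into one (uid, timestamp, url) row per array element. A minimal plain-Scala sketch of that same flattening, using hypothetical Visit/Record case classes and sample values for illustration:

```scala
// Plain-Scala analogue of explode on an array<struct> column.
// Visit and Record are illustrative stand-ins for the struct and row types.
case class Visit(timestamp: Long, url: String)
case class Record(uid: String, visits: Seq[Visit])

object ExplodeSketch extends App {
  val rows = Seq(
    Record("uid1", Seq(Visit(1L, "url1"), Visit(2L, "url2")))
  )

  // Each row fans out into one tuple per element of its visits array,
  // which is exactly what explode does to a DataFrame row.
  val exploded: Seq[(String, Long, String)] =
    rows.flatMap(r => r.visits.map(v => (r.uid, v.timestamp, v.url)))

  exploded.foreach(println)
  // (uid1,1,url1)
  // (uid1,2,url2)
}
```

Note that this plain flatMap drops a record with an empty visits list entirely, mirroring explode; explode_outer would instead emit one row with nulls for that uid.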