
I have a dataframe nameDF as below:

scala> val nameDF = Seq(("Johnny","A"), ("Johnny","B"), ("Johnny","C"), ("Johnny","D"), ("Bravo","E"), ("Bravo","F"), ("Bravo","G")).toDF("Name","Init")
nameDF: org.apache.spark.sql.DataFrame = [Name: string, Init: string]

scala> nameDF.show
+------+----+
|Name  |Init|
+------+----+
|Johnny|   A|
|Johnny|   B|
|Johnny|   C|
|Johnny|   D|
|Bravo |   E|
|Bravo |   F|
|Bravo |   G|
+------+----+

Without using SQL, I am trying to group the names and convert the multiple rows of each "Name" into a single row as given below:

+------+-------+
|Name  |Init   |
+------+-------+
|Johnny|A,B,C,D|
|Bravo |E,F,G  |
+------+-------+

I see that the available pivot options are not suitable for String operations.

Is pivot the correct option in this case? If not, could anyone let me know how I can achieve this?


1 Answer


Try this:

import org.apache.spark.sql.functions._

nameDF.groupBy($"Name")
  .agg(concat_ws(",", sort_array(collect_list($"Init"))).as("Init"))
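To see what the aggregation does per group, here is a plain-Scala sketch of the same logic (an illustration of the semantics, not Spark code): `collect_list` gathers the `Init` values of each group into an array, `sort_array` orders them, and `concat_ws` joins them with commas.

```scala
// Sample data matching the question's dataframe.
val rows = Seq(("Johnny", "A"), ("Johnny", "B"), ("Johnny", "C"), ("Johnny", "D"),
               ("Bravo", "E"), ("Bravo", "F"), ("Bravo", "G"))

// Group by name, then per group: collect the Init values,
// sort them, and join with commas.
val grouped: Map[String, String] = rows.groupBy(_._1).map { case (name, pairs) =>
  name -> pairs.map(_._2).sorted.mkString(",")
}
// grouped("Johnny") == "A,B,C,D"
// grouped("Bravo")  == "E,F,G"
```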

1 Comment

Raphael, your solution works, but I have a doubt: agg is an aggregate function. How does it work on a column that has String content?
