
I would like to aggregate a Spark data frame using an array of column names as input, and at the same time retain the original names of the columns.

df.groupBy($"id").sum(colNames:_*)

This works but fails to preserve the column names. Inspired by the answer found here, I unsuccessfully tried this:

df.groupBy($"id").agg(sum(colNames:_*).alias(colNames:_*))
error: no `: _*' annotation allowed here

It does work for a single element, like

df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can I make this work for the entire array?

1 Answer

Just provide a sequence of columns with aliases:

val colNames: Seq[String] = ???
val exprs = colNames.map(c => sum(c).alias(c))
df.groupBy($"id").agg(exprs.head, exprs.tail: _*)
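Putting it together, a minimal self-contained sketch (assuming a local SparkSession and a hypothetical DataFrame with value columns x and y; the column names and data are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("agg-with-aliases")
  .getOrCreate()
import spark.implicits._

// Hypothetical example data: two groups, two value columns.
val df = Seq((1, 10, 100), (1, 20, 200), (2, 30, 300)).toDF("id", "x", "y")

val colNames = Seq("x", "y")
// Build one aliased aggregate expression per column name.
val exprs = colNames.map(c => sum(c).alias(c))

// agg has signature (Column, Column*), so the sequence must be
// split into a head argument and a tail varargs expansion.
val result = df.groupBy($"id").agg(exprs.head, exprs.tail: _*)
result.show()
```

The head/tail split is needed because `agg` requires at least one `Column` argument before the varargs; applying `: _*` to the whole sequence in a single-parameter position is what triggers the "no `: _*' annotation allowed here" error.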

3 Comments

Yes, I guessed it would be something like this. However, still can't get it to work. error: no `: _*' annotation allowed here (such annotations are only allowed in arguments to *-parameters)
Sorry, it should be agg not sum.
Yeah, and I also figured out that colNames should in fact be exprs, right?
