
I would like to aggregate a Spark data frame using an array of column names as input, and at the same time retain the original names of the columns.

df.groupBy($"id").sum(colNames:_*)

This works but fails to preserve the column names. Inspired by the answer found here, I unsuccessfully tried this:

df.groupBy($"id").agg(sum(colNames:_*).alias(colNames:_*))
error: no `: _*' annotation allowed here

It does work for a single element, like

df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can I make this work for the entire array?

1 Answer

Just provide a sequence of columns with aliases:

val colNames: Seq[String] = ???
val exprs = colNames.map(c => sum(c).alias(c))
df.groupBy($"id").agg(exprs.head, exprs.tail: _*)
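Putting it together, a minimal self-contained sketch (assuming a local SparkSession and a hypothetical DataFrame with value columns x and y; the column names and data are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("agg-with-aliases")
  .getOrCreate()
import spark.implicits._

// Hypothetical example data: two groups, two value columns.
val df = Seq((1, 10, 100), (1, 20, 200), (2, 30, 300)).toDF("id", "x", "y")

val colNames = Seq("x", "y")
// Build one aliased aggregate expression per column name.
val exprs = colNames.map(c => sum(c).alias(c))

// agg has signature (Column, Column*), so the sequence must be
// split into a head argument and a tail varargs expansion.
val result = df.groupBy($"id").agg(exprs.head, exprs.tail: _*)
result.show()
```

The head/tail split is needed because `agg` requires at least one `Column` argument before the varargs; applying `: _*` to the whole sequence in a single-parameter position is what triggers the "no `: _*' annotation allowed here" error.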

3 Comments

Yes, I guessed it would be something like this. However, still can't get it to work. error: no `: _*' annotation allowed here (such annotations are only allowed in arguments to *-parameters)
Sorry, it should be agg not sum.
Yeah, and I also figured out that colNames should in fact be exprs, right?
