
How to pass variable arguments to the cube function in Spark SQL, and also to the agg function applied to the cube?

I have a list of columns, and I want to apply the cube function to those columns, together with aggregation functions.

For example:

val columnsInsideCube = List("data", "product","country")
val aggColumns = List("revenue")

I want something like this:

dataFrame.cube(columns:String*).agg(aggcolumns:String*)

This is not simply a matter of passing a Scala array to cube: cube is a predefined method of DataFrame, so the arguments have to be passed in the form it expects.

  • I formatted your text and fixed your grammar, because I love you. Next time do it yourself, thank you. And don't forget: "I" is always capitalized in English! Commented Jun 14, 2016 at 15:12
  • Possible duplicate of How pass scala Array into scala vararg method? Commented Jun 15, 2016 at 1:30

1 Answer


You could use the cube method of Spark DataFrames (new in version 1.4), for example in PySpark:

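# df is an existing pyspark.sql.DataFrame with "name" and "age" columns;
# cube is a DataFrame method, so no separate import of it is needed
df.cube("name", df.age).count().orderBy("name", "age").show()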
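The question itself is in Scala, where the missing piece is vararg expansion: a Seq is passed to a repeated parameter with ": _*". A minimal sketch, assuming Spark's Scala overloads Dataset.cube(col1: String, cols: String*) and RelationalGroupedDataset.agg(aggExpr: (String, String), aggExprs: (String, String)*). So that it runs without Spark, cube and agg below are hypothetical stand-ins with the same parameter shapes, and "sum" as the aggregate is also an assumption.

```scala
// Minimal sketch. cube and agg are HYPOTHETICAL stand-ins that only copy the
// parameter shapes of Spark's Dataset.cube(col1: String, cols: String*) and
// RelationalGroupedDataset.agg(aggExpr: (String, String), aggExprs: (String, String)*),
// so the vararg expansion can be run without Spark on the classpath.
object VarargsDemo {
  def cube(col1: String, cols: String*): List[String] = col1 :: cols.toList

  def agg(aggExpr: (String, String), aggExprs: (String, String)*): List[(String, String)] =
    aggExpr :: aggExprs.toList

  def main(args: Array[String]): Unit = {
    val columnsInsideCube = List("data", "product", "country")
    val aggColumns = List("revenue")

    // head fills the mandatory first parameter; ": _*" expands the tail into the varargs.
    val grouped = cube(columnsInsideCube.head, columnsInsideCube.tail: _*)

    // Pair each aggregation column with a function name ("sum" is an assumption).
    val aggPairs = aggColumns.map(c => c -> "sum")
    val aggs = agg(aggPairs.head, aggPairs.tail: _*)

    println(grouped) // List(data, product, country)
    println(aggs)    // List((revenue,sum))
  }
}
```

With a real DataFrame, the same expansion would read dataFrame.cube(columnsInsideCube.head, columnsInsideCube.tail: _*).agg(aggPairs.head, aggPairs.tail: _*). Both lists must be non-empty, because head throws on an empty list.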

See also: How to use "cube" only for specific fields on Spark dataframe?


Or, in HiveQL:

GROUP BY a, b, c WITH CUBE

which is equivalent to:

GROUP BY a, b, c GROUPING SETS ( (a, b, c), (a, b), (b, c), (a, c), (a), (b), (c), ( ))

https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup#space-menu-link-content
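Because the question starts from a list of column names, the GROUPING SETS form can also be generated from that list. A minimal sketch: groupingSets enumerates every subset of the columns, largest first, which gives the 2^n sets that WITH CUBE stands for. The table name sales and the sum aggregate are hypothetical, and the resulting string is only a query text; running it (for example through spark.sql) assumes such a table is registered.

```scala
// Minimal sketch: builds a GROUPING SETS query equivalent to WITH CUBE from a
// list of column names. The table name "sales" and the sum aggregate are HYPOTHETICAL.
object CubeSql {
  // Every subset of cols, largest first (2^n subsets for n distinct columns).
  def groupingSets(cols: List[String]): List[List[String]] =
    (cols.length to 0 by -1).toList.flatMap(n => cols.combinations(n).toList)

  def cubeQuery(table: String, cols: List[String], aggCols: List[String]): String = {
    val keys = cols.mkString(", ")
    val aggs = aggCols.map(c => s"sum($c) AS sum_$c").mkString(", ")
    // Render each subset as "(a, b)"; the empty subset becomes "()", the grand total.
    val sets = groupingSets(cols).map(_.mkString("(", ", ", ")")).mkString(", ")
    s"SELECT $keys, $aggs FROM $table GROUP BY $keys GROUPING SETS ($sets)"
  }

  def main(args: Array[String]): Unit = {
    val columnsInsideCube = List("data", "product", "country")
    val aggColumns = List("revenue")
    println(groupingSets(columnsInsideCube).size) // 8
    println(cubeQuery("sales", columnsInsideCube, aggColumns))
  }
}
```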


Or you could use other libraries, such as:

import com.activeviam.sparkube._