1

I there an easy way to call sql on multiple column on spark sql.

For example, Let's say I have a query that should be applied to most columns

select
min(c1) as min,
max(c1) as max,
max(c1) - min(c1) range
from table tb1

If there are multiple columns, is there a way to execute the query for all the columns, and get result one time.

Similar to how df.describe does.

1 Answer 1

1

Use the meta data (columns in this case) included in your dataframe (which you can get via spark.table("<table_name>") if you don't have it in scope already to get the column names, then apply the functions you want and pass to df.select (or df.selectExpr).

Build some test data:

scala> var seq = Seq[(Int, Int, Float)]()
seq: Seq[(Int, Int, Float)] = List()

scala> (1 to 1000).foreach(n => { seq = seq :+ (n,r.nextInt,r.nextFloat) })

scala> val df = seq.toDF("id", "some_int", "some_float")

Denote some functions we want to run on all the columns:

scala> val functions_to_apply = Seq("min", "max")
functions_to_apply: Seq[String] = List(min, max)

Setup the final Seq of SQL Columns:

scala> var select_columns = Seq[org.apache.spark.sql.Column]()
select_columns: Seq[org.apache.spark.sql.Column] = List()

Iterate over the columns and functions to apply to populate the select_columns Seq:

scala> val cols = df.columns

scala> cols.foreach(col => { functions_to_apply.foreach(f => {select_columns = select_columns :+ expr(s"$f($col)")})})

Run the actual query:

scala> df.select(select_columns:_*).show

+-------+-------+-------------+-------------+---------------+---------------+
|min(id)|max(id)|min(some_int)|max(some_int)|min(some_float)|max(some_float)|
+-------+-------+-------------+-------------+---------------+---------------+
|      1|   1000|  -2143898568|   2147289642|   1.8781424E-4|     0.99964607|
+-------+-------+-------------+-------------+---------------+---------------+
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.