
I have a data frame, and I want to roll the data up into 7-day windows and perform some aggregation on some of the columns.

I have a PySpark SQL dataframe like the following:

+----------+---+---+---+---+---+---+----------+--------+--------------+
| Sale_Date|P_1|P_2|P_3|G_1|G_2|G_3|Total_Sale|Sale_Amt|Promo_Disc_Amt|
+----------+---+---+---+---+---+---+----------+--------+--------------+
|2013-04-10|  1|  9|  1|  1|  1|  1|         1|   295.0|           0.0|
|2013-04-11|  1|  9|  1|  1|  1|  1|         3|   567.0|           0.0|
|2013-04-12|  1|  9|  1|  1|  1|  1|         2|   500.0|         200.0|
|2013-04-13|  1|  9|  1|  1|  1|  1|         1|   245.0|          20.0|
|2013-04-14|  1|  9|  1|  1|  1|  1|         1|   245.0|           0.0|
|2013-04-15|  1|  9|  1|  1|  1|  1|         2|   500.0|         200.0|
|2013-04-16|  1|  9|  1|  1|  1|  1|         1|   250.0|           0.0|
+----------+---+---+---+---+---+---+----------+--------+--------------+

I have applied a window specification over the data frame as follows:

from pyspark.sql.window import Window

days = lambda i: i * 86400  # convert days to seconds, since rangeBetween works on the long (seconds) value

windowSp = Window.partitionBy(dataframeOfquery3["P_1"], dataframeOfquery3["P_2"], dataframeOfquery3["P_3"],
                              dataframeOfquery3["G_1"], dataframeOfquery3["G_2"], dataframeOfquery3["G_3"]) \
    .orderBy(dataframeOfquery3["Sale_Date"].cast("timestamp").cast("long").desc()) \
    .rangeBetween(-days(7), 0)

Now I want to perform some aggregation, i.e. apply some window functions like the following:

from pyspark.sql import functions as F

df = dataframeOfquery3.select(F.min(dataframeOfquery3["Sale_Date"]).over(windowSp).alias("Sale_Date"))
df.show()

But it is giving the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o138.select.
: org.apache.spark.sql.AnalysisException: Could not resolve window function 'min'. Note that, using window functions currently requires a HiveContext;

I am using Apache Spark 1.6.0, pre-built for Hadoop.

1 Answer


The error kind of says everything:

py4j.protocol.Py4JJavaError: An error occurred while calling o138.select.
: org.apache.spark.sql.AnalysisException: Could not resolve window function 'min'. Note that, using window functions currently requires a HiveContext;

You'll need a version of Spark that supports Hive (i.e. built with Hive support); then you can declare a HiveContext:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

and then use that context to perform your window function.

In Python:

# sc is an existing SparkContext.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
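
For completeness, here's a minimal end-to-end sketch in PySpark 1.6 (names follow the question; the table name "sales" is hypothetical, any DataFrame created through the HiveContext works):

# sc is an existing SparkContext.
from pyspark.sql import HiveContext
from pyspark.sql import functions as F
from pyspark.sql.window import Window

sqlContext = HiveContext(sc)
dataframeOfquery3 = sqlContext.table("sales")  # hypothetical source table

days = lambda i: i * 86400  # days expressed in seconds for rangeBetween

windowSp = Window.partitionBy("P_1", "P_2", "P_3", "G_1", "G_2", "G_3") \
    .orderBy(F.col("Sale_Date").cast("timestamp").cast("long").desc()) \
    .rangeBetween(-days(7), 0)

# Aggregate first, then .over(window), then .alias(...)
dataframeOfquery3.select(
    F.min(dataframeOfquery3["Sale_Date"]).over(windowSp).alias("Sale_Date")
).show()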

You can read further about the difference between SQLContext and HiveContext here.

SparkSQL has a SQLContext and a HiveContext. HiveContext is a superset of the SQLContext. The Spark community suggests using the HiveContext. You can see that when you run spark-shell, which is your interactive driver application, it automatically creates a SparkContext defined as sc and a HiveContext defined as sqlContext. The HiveContext allows you to execute SQL queries as well as Hive commands. The same behavior occurs for pyspark.
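
As a quick illustration of that last point (a sketch; the temp table name "sales_tmp" is made up), the same HiveContext can run both plain SQL and windowed SQL:

dataframeOfquery3.registerTempTable("sales_tmp")  # hypothetical temp table name
sqlContext.sql("SELECT Sale_Date, SUM(Sale_Amt) OVER (PARTITION BY P_1) AS amt FROM sales_tmp").show()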


Comments

Yes, I have seen the error. But I have followed these threads: thread 1, thread 2, and the Databricks thread. In all of the above, the window function works properly with the PySpark SQLContext. @eliasah
It's kind of tricky in some environments. I know all of those threads. They don't present the HiveContext, but it's actually needed, and they don't even talk about cluster configuration. I have shown you the way I do it.
Is there any way to use pyspark.sql.window with the PySpark SQLContext, without a HiveContext? Or how can I handle this kind of situation with pyspark.sql.SQLContext? Please suggest. @eliasah
Default pre-built binaries from downloads work as well.
Thanks a lot. The HiveContext is now working properly for me. @eliasah
