I am running PySpark with Spark 2.0 to aggregate data. Below is the raw DataFrame (df) as received in Spark.

DeviceID    TimeStamp           IL1    IL2    IL3    VL1    VL2    VL3
1001        2019-07-14 00:45    2.1    3.1   2.25    235    258    122
1002        2019-07-14 01:15    3.2    2.4   4.25    240    250    192
1003        2019-07-14 01:30    3.2    2.0   3.85    245    215    192
1003        2019-07-14 01:30    3.9    2.8   4.25    240    250    192

Now I want to group the data by DeviceID and aggregate it. There are several posts about this on Stack Overflow; this one and this one are of particular interest. With the help of those posts I wrote the following script:

from pyspark.sql import functions as F
groupby = ["DeviceID"]
agg_cv = ["IL1","IL2","IL3","VL1","VL2","VL3"]
func = [min,max]
expr_cv = [F.f(F.col(c)) for f in func for c in agg_cv]
df_final = df_cv_filt.groupby(*groupby).agg(*expr_cv)

The above code fails with the following error:

Column is not iterable

I cannot understand why this error occurs. When I use the following code instead:

from pyspark.sql.functions import min, max, col
expr_cv = [f(col(c)) for f in func for c in agg_cv]

then the code runs fine.

My question is: how can I fix the error mentioned above?

  • Use func = [F.min,F.max] instead of func = [min,max]. Commented Jul 16, 2019 at 9:57
  • Thanks @giser_yugang. Let me check and get back to you. Commented Jul 16, 2019 at 10:01
  • Exactly. By default, min and max refer to Python's built-in min and max. To use PySpark's min and max, call F.min and F.max. Commented Jul 17, 2019 at 5:35
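
To illustrate the point made in the comments, here is a minimal sketch in plain Python, with no Spark required. `FakeColumn` is a hypothetical stand-in, not the real `pyspark.sql.Column`; it only assumes that a Column refuses iteration by raising a `TypeError`. Python's built-in `min` tries to iterate over a single argument, which is where the error comes from.

```python
import builtins


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        # Assumption: the real Column rejects iteration in a similar way.
        raise TypeError("Column is not iterable")


def try_builtin_min(col):
    """Call Python's built-in min on a column-like object and capture the error text."""
    try:
        builtins.min(col)  # with one argument, min() tries to iterate over it
    except TypeError as exc:
        return str(exc)
    return None


message = try_builtin_min(FakeColumn("IL1"))
print(message)
```

With `from pyspark.sql.functions import min, max`, the names `min` and `max` are rebound to Spark's aggregate functions, which is why the second version in the question works.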

1 Answer

Try this:

func = [F.min,F.max]
agg_cv = ["IL1","IL2","IL3","VL1","VL2","VL3"]
expr_cv = [f(F.col(c)) for f in func for c in agg_cv]
df_final = df1.groupby(*groupby).agg(*expr_cv)

This should work.

+--------+---------+--------+--------+--------+--------+--------+---------+--------+--------+--------+--------+--------+
|DeviceID|min( IL1)|min(IL2)|min(IL3)|min(VL1)|min(VL2)|min(VL3)|max( IL1)|max(IL2)|max(IL3)|max(VL1)|max(VL2)|max(VL3)|
+--------+---------+--------+--------+--------+--------+--------+---------+--------+--------+--------+--------+--------+
|    1003|      3.2|     2.0|    3.85|     240|     215|     192|      3.9|     2.8|    4.25|     245|     250|     192|
|    1002|      3.2|     2.4|    4.25|     240|     250|     192|      3.2|     2.4|    4.25|     240|     250|     192|
|    1001|      2.1|     3.1|    2.25|     235|     258|     122|      2.1|     3.1|    2.25|     235|     258|     122|
+--------+---------+--------+--------+--------+--------+--------+---------+--------+--------+--------+--------+--------+