I have a big table for which I'm trying to calculate conditional sums of some columns, grouped by a location.
My code looks like this, and the number of columns keeps growing:
df.groupBy(location_column).agg(
F.sum(F.when(F.col(col1) == True, F.col(value))).alias("SUM " + col1),
F.sum(F.when(F.col(col2) == True, F.col(value))).alias("SUM " + col2),
F.sum(F.when(F.col(col3) == True, F.col(value))).alias("SUM " + col3),
....
# Additional lines for additional columns (around 20)
)
I want to refactor my code so it looks less dumb, basically by doing something like:
cols = [col1, col2, col3, ... , coln]
df.groupBy(location_column).agg([F.sum(F.when(F.col(x) == True, F.col(value))).alias("SUM " + x)] for x in cols)
It doesn't work because the agg() function does not accept lists:
assert all(isinstance(c, Column) for c in exprs), "all exprs should be Column"
Is there a way to refactor it? Thanks