I currently have code in which I repeatedly apply the same procedure to multiple DataFrame Columns via multiple chains of .withColumn, and am wanting to create a function to streamline the procedure. In my case, I am finding cumulative sums over columns aggregated by keys:
val newDF = oldDF
.withColumn("cumA", sum("A").over(Window.partitionBy("ID").orderBy("time")))
.withColumn("cumB", sum("B").over(Window.partitionBy("ID").orderBy("time")))
.withColumn("cumC", sum("C").over(Window.partitionBy("ID").orderBy("time")))
//.withColumn(...)
What I would like is either something like:
def createCumulativeColums(cols: Array[String], df: DataFrame): DataFrame = {
// Implement the above cumulative sums, partitioning, and ordering
}
or better yet:
def withColumns(cols: Array[String], df: DataFrame, f: function): DataFrame = {
// Implement a udf/arbitrary function on all the specified columns
}