Suppose I have a dataframe with multiple columns. I want to iterate over each column, do some calculation, and update that column. Is there a good way to do that?
2 Answers
@rogue-one has already answered your query; you just need to modify that answer to meet your requirements.
Following is a solution that does not use a Window function.
import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

val df = List(
  (2, 28),
  (1, 21),
  (7, 42)
).toDF("col1", "col2")
Your input dataframe should look like this:
+----+----+
|col1|col2|
+----+----+
|2 |28 |
|1 |21 |
|7 |42 |
+----+----+
Now, to apply columnValue/sumOfColumnValues to every column, do the following:
import org.apache.spark.sql.functions.{col, sum}

val columnsModify = df.columns.map { colName =>
  // collect this column's sum to the driver, then divide the whole column by it
  val total = df.select(sum(col(colName))).first().get(0)
  (col(colName) / total).as(colName)
}
df.select(columnsModify: _*).show(false)
You should get the output as:
+----+-------------------+
|col1|col2 |
+----+-------------------+
|0.2 |0.3076923076923077 |
|0.1 |0.23076923076923078|
|0.7 |0.46153846153846156|
+----+-------------------+
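With many columns (as the comment thread below mentions), the approach above triggers one Spark job per column because of the separate first() calls. Not part of the original answer, but a minimal sketch that collects all the sums in a single agg (assuming integer input columns, so each sum comes back as a Long; the names sumExprs, totals, and normalized are just illustrative):

import org.apache.spark.sql.functions.{col, sum}

// compute every column's sum in a single pass instead of one job per column
val sumExprs = df.columns.map(c => sum(col(c)).as(c))
val totals = df.agg(sumExprs.head, sumExprs.tail: _*).first()

// divide each column by its own total, keeping the original column names
val normalized = df.select(df.columns.map(c => (col(c) / totals.getAs[Long](c)).as(c)): _*)
normalized.show(false)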
Update: In the example below I have a dataframe with two integer columns, c1 and c2. For each column, the sum of the column is divided by the original value (invert the division if you want value/sum instead).
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, sum}
val df = Seq((1, 15), (2, 20), (3, 30)).toDF("c1", "c2")
val result = df.columns.foldLeft(df) { (acc, colname) =>  // sum of column / each value
  acc.withColumn(colname, sum(acc(colname)).over(Window.orderBy(lit(1))) / acc(colname))
}
Output:
scala> result.show()
+---+------------------+
| c1| c2|
+---+------------------+
|6.0| 4.333333333333333|
|3.0| 3.25|
|2.0|2.1666666666666665|
+---+------------------+
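Note that the code above computes sum/value for each cell. To get value/sum instead (which is what the comments below ask for), a minimal sketch, not part of the original answer, simply inverts the division inside the same foldLeft, reusing the imports and df from above (the name valueOverSum is just illustrative):

val valueOverSum = df.columns.foldLeft(df) { (acc, colname) =>
  // each value divided by its column's total
  acc.withColumn(colname, acc(colname) / sum(acc(colname)).over(Window.orderBy(lit(1))))
}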
3 Comments
Mr.cysl
If I have a df with 1000 columns, I cannot hand-write all the match functions... Is there a better way for this situation? Thanks!
rogue-one
It depends on what you are doing with each column. If you have to do the same operation on all columns, then it's simple. If you have to do something unique for each column, then you will have to handle each column separately.
Mr.cysl
What I need to do is calculate the sum of each column and replace each data point in the column with (original number / sum). Basically, the operation is the same for each column.