I am trying to normalize some columns of a Spark DataFrame using Python (PySpark).
My dataset:
+------+----+-------+----------+
|userID|Name|Revenue|No.of.Days|
+------+----+-------+----------+
|     1|   A|  12560|        45|
|     2|   B|2312890|        90|
|   ...| ...|    ...|       ...|
+------+----+-------+----------+
In this dataset, I need to normalize every column except userID and Name, i.e. Revenue and No.of.Days.
The output should look like this:
+------+----+-------+----------+
|userID|Name|Revenue|No.of.Days|
+------+----+-------+----------+
|     1|   A|    0.5|       0.5|
|     2|   B|    0.9|       1.0|
|   ...| ...|    1.0|       0.4|
|   ...| ...|    0.6|       ...|
+------+----+-------+----------+
The formula used to normalize the values in each column is:

    val_i = (e_i - min) / (max - min)

where e_i is the column value at the i-th position, min is the minimum value in that column, and max is the maximum value in that column.
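To make the formula concrete, this is what it does in plain Python for the sample Revenue values (the helper name min_max_normalize is just for illustration; with only two rows, the minimum maps to 0 and the maximum to 1):

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] via (e_i - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Revenue values from the two sample rows above:
print(min_max_normalize([12560, 2312890]))  # -> [0.0, 1.0]
```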
How can I do this in easy steps using PySpark?
