Assuming that I have the following Spark DataFrame df:
+-----+------+------+------+
| id  | col1 | col2 | col3 |
+-----+------+------+------+
| "a" |  10  |   5  |  75  |
| "b" |  20  |   3  |   3  |
| "c" |  30  |   2  |  65  |
+-----+------+------+------+
I want to create a new dataframe new_df that contains:
1) the id of each row
2) the value of the division between col1 / col2 and
3) the value of the division between col3 / col1
The desired output for new_df is:
+-----+--------+--------+
| id  | col1_2 | col3_1 |
+-----+--------+--------+
| "a" |  2     |  7.5   |
| "b" |  6.67  |  0.15  |
| "c" | 15     |  2.17  |
+-----+--------+--------+
I have already tried
new_df = df.select("id").withColumn("col1_2", df["col1"] / df["col2"])
without any luck.
The `select` you are doing returns a DataFrame containing only the `id` column, so the subsequent `withColumn` operation fails because `col1` and `col2` are no longer available. You can switch the order: add the derived column first, then select the columns you want to keep: `df.withColumn("col1_2", df["col1"] / df["col2"]).select("id", "col1_2")`