PySpark: Compare column value with another value

Question

I have the following data frame:

+----+----+----+----+
|col0|col1|col2|col3|
+----+----+----+----+
|   1|  21|   3|null|
|   4|   5|  23|null|
|null|   4|   5|   6|
|null|   9|  22|  42|
+----+----+----+----+

For rows where 'col2' is greater than 10, I tried computing the minimum of 'col1' * 0.2 and 1.5, falling back to 100 for all other rows:

import pyspark.sql.functions as F

cond = df['col2'] > 10
df = df.withColumn('new_col', F.when(cond, F.least(F.col('col1')*0.2, 1.5)).otherwise(F.lit(100)))
df.show()

But I got the following exception:

TypeError: Invalid argument, not a string or column: 1.5 of type <class 'float'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.

mck · Accepted Answer · 2021-04-06 07:14:42Z

Wrap the float in F.lit. F.least accepts only columns (or column names), not plain Python values such as a float:

df2 = df.withColumn('new_col', F.when(cond, F.least(F.col('col1')*0.2, F.lit(1.5))).otherwise(F.lit(100)))
