
Can someone help me understand the error below? I'm new to PySpark and have just started learning it.

When I googled it, I found that this error occurs when values of different data types are compared. But my salary column is an integer column, so why am I still getting this error?

>>> df.printSchema()
root
 |-- Firstname: string (nullable = true)
 |-- middlename: string (nullable = true)
 |-- lastname: string (nullable = true)
 |-- dob: string (nullable = true)
 |-- sex: string (nullable = true)
 |-- salary: integer (nullable = true)
 |-- CopiedColumn: integer (nullable = true)
 |-- Country: string (nullable = false)
 |-- anotherColumn: string (nullable = false)

>>> df.show()
+---------+----------+--------+----------+---+------+------------+-------+-------------+
|Firstname|middlename|lastname|       dob|sex|salary|CopiedColumn|Country|anotherColumn|
+---------+----------+--------+----------+---+------+------------+-------+-------------+
|    James|          |   Smith|1991-04-01|  M|300000|     -300000|  India|Another value|
|  Michael|      Rose|        |2000-05-19|  M|400000|     -400000|  India|Another value|
|   Robert|          |Williams|1978-09-05|  M|400000|     -400000|  India|Another value|
|    Maria|      Anne|   Jones|1967-12-01|  F|400000|     -400000|  India|Another value|
|      Jen|      Mary|   Brown|1980-02-17|  F|  -100|         100|  India|Another value|
+---------+----------+--------+----------+---+------+------------+-------+-------------+


>>> df.withColumn("lit_value2", when(col("salary") >=400000 & col("salary") <= 500000,lit("100")).otherwise(lit("200"))).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/mapr/spark/spark/python/pyspark/sql/column.py", line 115, in _
    njc = getattr(self._jc, name)(jc)
  File "/opt/mapr/spark/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/opt/mapr/spark/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/opt/mapr/spark/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o138.and. Trace:
py4j.Py4JException: Method and([class java.lang.Integer]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
  • That's really a tricky one. The error message is rather cryptic. Kudos to you. Commented Jan 8, 2022 at 20:26

2 Answers


You need to wrap the conditions in parentheses:

when((col("salary") >= 400000) & (col("salary") <= 500000), lit("100"))

Otherwise, because & binds more tightly than >= in Python, your condition is interpreted as:

col("salary") >= (400000 & col("salary")) <= 500000

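Python therefore evaluates 400000 & col("salary") first. That applies the column's "and" operation to a plain integer, which does not make sense and gives the error you got.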
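
To see why the message mentions "and" and java.lang.Integer, here is a minimal sketch in plain Python with no Spark involved. FakeColumn is a hypothetical stand-in, not PySpark's Column; it only records which operator method Python calls and with what operand. Since int cannot handle `400000 & column`, Python falls back to the right operand's reflected `__rand__` and passes it the bare integer, which appears consistent with the `o138.and` call in the traceback.

```python
class FakeColumn:
    """Hypothetical stand-in for a column object; records operator calls."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def __rand__(self, other):
        # Reached for `other & self` when type(other) cannot handle `&`.
        self.calls.append(("__rand__", type(other).__name__))
        raise TypeError(f"cannot apply 'and' to a bare {type(other).__name__}")

    def __ge__(self, other):
        # Would be reached for `self >= other`; recorded to show it never runs.
        self.calls.append(("__ge__", type(other).__name__))
        return self


salary = FakeColumn("salary")
error = None
try:
    # Same shape as the failing condition: `&` is evaluated before `>=`.
    salary >= 400000 & salary <= 500000
except TypeError as exc:
    error = str(exc)

print(salary.calls)  # [('__rand__', 'int')]
print(error)         # cannot apply 'and' to a bare int
```

The comparison method is never reached: the failure happens while evaluating `400000 & salary`, before any `>=` is applied.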


1 Comment

To expand on @mck's answer, the problem here is operator precedence. PySpark column expressions use the bitwise operators | (or), & (and), and ~ (not) as logical operators, and in Python these bind more tightly than the comparison operators (==, !=, >, <, etc.). The solution is simple: always put parentheses around each comparison. cumsum.wordpress.com/2020/01/10/…
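
The grouping described in this comment can be checked without Spark by asking Python's own parser. The sketch below uses the standard-library ast module; `describe` is a hypothetical helper written for this illustration, and `salary` is just a name inside the parsed strings, so nothing is evaluated. The outputs shown assume Python 3.8 or later.

```python
import ast


def describe(expr):
    """Return the top-level node type and the types of its direct operands."""
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.Compare):
        operands = [node.left, *node.comparators]
    elif isinstance(node, ast.BinOp):
        operands = [node.left, node.right]
    else:
        operands = []
    return type(node).__name__, [type(o).__name__ for o in operands]


# Without parentheses, `&` and `|` are nested inside a chained comparison.
bare_and = describe("salary >= 400000 & salary <= 500000")
bare_or = describe("salary < 0 | salary > 500000")

# With parentheses, `&` is the top-level operation joining two comparisons.
wrapped = describe("(salary >= 400000) & (salary <= 500000)")

print(bare_and)  # ('Compare', ['Name', 'BinOp', 'Constant'])
print(bare_or)   # ('Compare', ['Name', 'BinOp', 'Constant'])
print(wrapped)   # ('BinOp', ['Compare', 'Compare'])
```

Only the parenthesized form parses as "comparison & comparison", which is the shape the condition was meant to have.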

For posterity, you can also get a similar error if you pass non-Column objects in the expression. For example:

column = 'A'
df.select(df[column] == 0)  # this is fine

column = ['A']              # whoops, df[['A']] results in a DataFrame, not a Column
df.select(df[column] == 0)  # py4j.Py4JException: Method col([class java.lang.Boolean]) does not exist

1 Comment

In my case, I forgot to wrap my "column_name" in col().
