
I am trying to add a column to a DataFrame whose value depends on the value of another column, as follows:

df = df.withColumn('new_column', when(df['color']=='blue' | df['color']=='green', 'A').otherwise('WD'))

After running the code, I get the following error:

Py4JError: An error occurred while calling o59.or. Trace:
py4j.Py4JException: Method or([class java.lang.String]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:274)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

What should I do to overcome this issue? I am using PySpark 2.3.0.

2 Answers


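When combining multiple conditions, wrap each condition in its own parentheses. In Python, | and & bind more tightly than == and !=, so without parentheses the expression is grouped as df['color'] == ('blue' | df['color']) == 'green', and the | is applied to a plain string and a column.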

from pyspark.sql.functions import when
df = df.withColumn('new_column', when((df['color'] == 'blue') | (df['color'] == 'green'), 'A').otherwise('WD'))
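To see the grouping without a Spark session, here is a minimal sketch that uses Python's standard ast module to parse both forms. The bare name color is only a placeholder for df['color'], and describe is a helper introduced for this illustration.

```python
import ast

# `color` is a placeholder for df['color']; nothing is evaluated, only parsed.
unparenthesized = "color == 'blue' | color == 'green'"
parenthesized = "(color == 'blue') | (color == 'green')"


def describe(expr):
    """Return the type name of the top-level node of a parsed expression."""
    node = ast.parse(expr, mode="eval").body
    return type(node).__name__


def first_comparator(expr):
    """For a chained comparison, return the type name of its first right-hand operand."""
    node = ast.parse(expr, mode="eval").body
    return type(node.comparators[0]).__name__


# Without parentheses the whole expression is one chained comparison,
# and the `|` sits inside it as 'blue' | color.
print(describe(unparenthesized))          # Compare
print(first_comparator(unparenthesized))  # BinOp

# With parentheses the top-level operation is the `|` between two comparisons.
print(describe(parenthesized))            # BinOp
```

The first form asks Python to compute 'blue' | color before any comparison happens, which is consistent with the error message complaining about an or call that received a String.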

I ran into a similar problem, "Py4JError: An error occurred while calling o166.and", and the logs pointed to this line of my code:

.when(df.line_item_line_item_type == 'Fee' & df.reservation_reservation_a_r_n != '', 0)

Adding parentheses around the expressions on each side of the "&" solved the problem, like so:

.when((df.line_item_line_item_type == 'Fee') & (df.reservation_reservation_a_r_n != ''), 0)
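The sketch below shows why the error names the "and" method with a String argument. FakeColumn is a hypothetical stand-in written for this illustration, not PySpark's Column class; it only records which operator is called and with what type of argument. The assumption, consistent with the error message, is that a real column forwards & to a JVM-side "and" call.

```python
calls = []  # records (operator, argument type) in the order Python invokes them


class FakeColumn:
    """Hypothetical stand-in for a Spark column; it only logs operator calls."""

    def __init__(self, name):
        self.name = name

    def __and__(self, other):
        # column & something
        calls.append(("and", type(other).__name__))
        return self

    def __rand__(self, other):
        # something & column, used when `something` (e.g. a str) has no `&`
        calls.append(("and", type(other).__name__))
        return self

    def __eq__(self, other):
        calls.append(("eq", type(other).__name__))
        return self

    def __ne__(self, other):
        calls.append(("ne", type(other).__name__))
        return self


item_type = FakeColumn("line_item_line_item_type")
arn = FakeColumn("reservation_reservation_a_r_n")

# Without parentheses, 'Fee' & arn is evaluated first: "and" receives a str.
_ = item_type == 'Fee' & arn != ''
unparenthesized_calls = list(calls)
print(unparenthesized_calls[0])  # ('and', 'str')

# With parentheses, both comparisons run first and "and" receives a column.
calls.clear()
_ = (item_type == 'Fee') & (arn != '')
parenthesized_calls = list(calls)
print(parenthesized_calls)  # [('eq', 'str'), ('ne', 'str'), ('and', 'FakeColumn')]
```

In the unparenthesized form the very first operation is "and" with a string, which matches the shape of the reported Py4J error; in the parenthesized form "and" only ever sees two column expressions.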

1 Comment

A duplicate of Suresh's answer
