I am trying to modify a column value in a PySpark DataFrame as follows:

df_cleaned = df_cleaned.withColumn('brand_c', when(df_cleaned['brand'] == "samsung" |\
                                                   df_cleaned['brand'] == "oppo", df_cleaned.brand)\
                                   .otherwise('others'))

This generates the following exception:

An error occurred while calling o435.or. Trace:
py4j.Py4JException: Method or([class java.lang.String]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
	at py4j.Gateway.invoke(Gateway.java:274)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/column.py", line 115, in _
    njc = getattr(self._jc, name)(jc)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 332, in get_return_value
    format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o435.or. Trace:
py4j.Py4JException: Method or([class java.lang.String]) does not exist
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
	at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
	at py4j.Gateway.invoke(Gateway.java:274)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

1 Answer

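You are missing parentheses around each comparison. In Python, | binds more tightly than ==, so the original expression evaluates "samsung" | df_cleaned['brand'] first, which is why the error says that or([class java.lang.String]) does not exist. Try: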

from pyspark.sql.functions import when

df_cleaned = df_cleaned.withColumn('brand_c', when((df_cleaned['brand'] == "samsung") |
                (df_cleaned['brand'] == "oppo"), df_cleaned.brand).otherwise('others'))

Always wrap each comparison in parentheses when combining conditions with |, & or ~ in PySpark.
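
To see why the parentheses matter, here is a minimal pure-Python sketch that runs without Spark. The Col class is a hypothetical stand-in written for this illustration, not PySpark's Column; it only mimics the way == and | build expressions instead of returning booleans.

```python
class Col:
    """Hypothetical stand-in for a Spark Column; it only records the expression text."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Build a new expression instead of returning a bool.
        return Col(f"({self.text} == {other!r})")

    def __or__(self, other):
        if not isinstance(other, Col):
            raise TypeError(f"cannot OR a column with {type(other).__name__}")
        return Col(f"({self.text} | {other.text})")

    def __ror__(self, other):
        # Reached for "samsung" | column: str has no |, so Python falls back here.
        raise TypeError(f"cannot OR {type(other).__name__} with a column")


brand = Col("brand")

# Without parentheses, Python evaluates "samsung" | brand before any ==.
try:
    brand == "samsung" | brand == "oppo"
    error = None
except TypeError as exc:
    error = str(exc)

# With parentheses, each comparison is built first, then the two are OR-ed.
fixed = (brand == "samsung") | (brand == "oppo")

print(error)
print(fixed.text)  # ((brand == 'samsung') | (brand == 'oppo'))
```

In the real traceback the same fallback happens inside PySpark: the string "samsung" is passed to the column's or method on the JVM side, which has no overload taking a String. For a plain membership test like this one, df_cleaned['brand'].isin("samsung", "oppo") sidesteps the operator altogether.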
