1

I'm new to Pyspark and I have a problem to solve.

I have a dataframe with 4 columns, being customers, person, is_online_store and count:

customer PersonId is_online_store count
afabd2d2 4 true 1
afabd2d2 8 true 2
afabd2d2 3 true 1
afabd2d2 2 false 1
afabd2d2 4 false 1

I need to create according to the following rules:

  • If PersonId count(column) has 1 or + Trues and None False then Online
  • If PersonId count(column) has 1 or + False and None True then offline
  • If the PersonId count(column) has at least 1 False AND 1 True then Hybrid

How it should look:

customer PersonId New_Column
afabd2d2 4 Hybrid
afabd2d2 8 Online
afabd2d2 3 Online
afabd2d2 2 Offline

1 Answer 1

1

Group by customer + personId and use when expression to check if all values in is_online_store column are true / false or a mix of the 2, using for example bool_and function:

from pyspark.sql import functions as F

df1 = df.groupBy("customer", "PersonId").agg(
    F.when(F.expr("bool_and(is_online_store)"), "Online")
     .when(F.expr("bool_and(!is_online_store)"), "Offline")
     .otherwise("Hybrid").alias("New_Column")
)

df1.show()
#+--------+--------+----------+
#|customer|PersonId|New_Column|
#+--------+--------+----------+
#|afabd2d2|       2|   Offline|
#|afabd2d2|       8|    Online|
#|afabd2d2|       4|    Hybrid|
#|afabd2d2|       3|    Online|
#+--------+--------+----------+
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for the help! It worked, I just had to convert the column to Boolean before.
"When I went up to production, it started giving this error: Undefined function: 'bool_and'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 0"

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.