I'm new to Pyspark and I have a problem to solve.
I have a dataframe with 4 columns, being customers, person, is_online_store and count:
| customer | PersonId | is_online_store | count |
|---|---|---|---|
| afabd2d2 | 4 | true | 1 |
| afabd2d2 | 8 | true | 2 |
| afabd2d2 | 3 | true | 1 |
| afabd2d2 | 2 | false | 1 |
| afabd2d2 | 4 | false | 1 |
I need to create according to the following rules:
- If
PersonIdcount(column) has 1 or + Trues and None False thenOnline - If
PersonIdcount(column) has 1 or + False and None True thenoffline - If the
PersonIdcount(column) has at least 1 False AND 1 True thenHybrid
How it should look:
| customer | PersonId | New_Column |
|---|---|---|
| afabd2d2 | 4 | Hybrid |
| afabd2d2 | 8 | Online |
| afabd2d2 | 3 | Online |
| afabd2d2 | 2 | Offline |