0

This is my first post so let me know if I need to give more details.

I am trying to create a boolean column, "immediate", that shows true when at least on of the columns has some data in it. If all are null then the column should be false. I am using the when() .otherwise function in spark but I'm not getting the result I would expect.

Below is the code I'm using:

val evaluation = evaluation_raw
.withColumn("immediate", 
    when(col("intended_outcome_review").isNull 
    && col("outcome").isNull 
    && col("impact").isNull 
    && col("impact_self").isNull 
    && col("next_step").isNull, 
    lit(false))
    .otherwise(lit(true)))
.select(
    col("id"),
    col("intended_outcome_review"),
    col("outcome"),
    col("impact"),
    col("impact_self"),
    col("next_step"),
    col("immediate"))

Desired outcome:

+--------+------------------------+-------------+-------+------------+----------+----------+
|id      |intended_outcome_review |outcome      |impact |impact_self |next_step |immediate |
+--------+------------------------+-------------+-------+------------+----------+----------+
|1568    |null                    |null         |4      |3           |null      |true      |
|1569    |null                    |null         |null   |null        |null      |false     |
|1570    |null                    |null         |null   |null        |null      |false     |
|1571    |1                       |improved coms|3      |3           |email prof|true      |
+--------+------------------------+-------------+-------+------------+----------+----------+

Actual outcome:

+--------+------------------------+-------------+-------+------------+----------+----------+
|id      |intended_outcome_review |outcome      |impact |impact_self |next_step |immediate |
+--------+------------------------+-------------+-------+------------+----------+----------+
|1568    |null                    |null         |4      |3           |null      |true      |
|1569    |null                    |null         |null   |null        |null      |true      |
|1570    |null                    |null         |null   |null        |null      |false     |
|1571    |1                       |improved coms|3      |3           |email prof|true      |
+--------+------------------------+-------------+-------+------------+----------+----------+

If anyone knows what I may be doing wrong please let me know. Thanks!

2
  • 1
    The logic itself looks correct, are you sure that it is really null, and not "null" in some columns? Commented Oct 18, 2022 at 9:21
  • @RobertKossendey Thanks for pointing that out. I looked into it and it turns out some of the columns are converted from Null to "" when other parts of the form are filled out. Commented Oct 18, 2022 at 10:27

2 Answers 2

1

You can use a trick and cast column.isNull() to in int and calculate sum of them. if the sum is above 0 then it's true.

.withColumn(
    'immediate',
    (
        F.col('intended_outcome_review').isNull().cast('int') +
        F.col('outcome').isNull().cast('int') +
        F.col('impact').isNull().cast('int') +
        F.col('next_step').isNull().cast('int')
    ) != 0
)
Sign up to request clarification or add additional context in comments.

Comments

0

Turns out some of the columns are converted from Null to "" when other parts of the form are filled out.

Answer below considers empty strings and Null values:

.withColumn("immediate", 
    when((col("intended_outcome_review").isNull || col("intended_outcome_review") ==="")
    && (col("outcome").isNull || col("outcome") === "")
    && (col("impact").isNull || col("outcome") === "")
    && (col("impact_self").isNull || col("impact_self") === "")
    && (col("next_step").isNull || col("next_step") === ""),
    lit(false))
    .otherwise(lit(true)))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.