
I have a DataFrame for which I would like to get the total count of null values, and I have the following code that does this generically on all the columns:

First, my DataFrame, which contains just one column (for simplicity):

val recVacDate = dfRaw.select("STATE")

When I count the nulls with a simple filter, I get the following:

val filtered = recVacDate.filter("STATE is null")
println(filtered.count()) // Prints 94051

But when I use the code below, I get just 1 as a result, and I do not understand why:

val nullCount = recVacDate.select(recVacDate.columns.map(c => count(col(c).isNull || col(c) === "" || col(c).isNaN).alias(c)): _*) 
println(nullCount.count()) // Prints 1

Any ideas what is wrong with nullCount? The column's data type is String.
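A minimal local reproduction helps show what is going on. This is a sketch under assumptions: Spark is on the classpath, the sample data is hypothetical, the names `WhyCountIsOne` and `questionAgg` are made up for illustration, and `isNaN` is dropped because NaN is not meaningful for a String column. The aggregate always produces a single row, so calling `count()` on it returns 1. The value inside that row is the number of all rows, because `count(expr)` counts rows where `expr` is non-null, and the boolean expression here is never null (for a null `STATE`, `true || null` evaluates to `true`).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count}

object WhyCountIsOne {
  // Same shape as the question's nullCount, minus isNaN (assumption: String columns only).
  def questionAgg(df: DataFrame): DataFrame =
    df.select(df.columns.map(c => count(col(c).isNull || col(c) === "").alias(c)): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("why-count-is-one").getOrCreate()
    import spark.implicits._

    // Hypothetical sample: one String column with a null, an empty string and two real values.
    val df = Seq[Option[String]](Some("NY"), None, Some(""), Some("CA")).toDF("STATE")
    val agg = questionAgg(df)

    println(agg.count())            // 1: the aggregate has exactly one row
    println(agg.first().getLong(0)) // 4: every row is counted, not just the null/empty ones

    spark.stop()
  }
}
```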

1 Answer


This fixed it:

import org.apache.spark.sql.functions.{col, count, when}

df.select(df.columns.map(c => count(when(col(c).isNull || col(c) === "" || col(c).isNaN, c)).alias(c)): _*)

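Notice that the when clause goes inside count: when(cond, c) returns null for every row where cond is false, and count skips nulls, so only the matching rows are counted. The result is still a one-row DataFrame, so calling count() on it will keep printing 1; read the per-column values with show() or first() instead.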
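Here is a self-contained sketch of the fixed approach that also pulls the counts out of the single result row. Assumptions: Spark is on the classpath, all columns are Strings (as in the question), the sample data is hypothetical, and `NullishCounts` / `nullOrEmptyCounts` are made-up names. `isNaN` is left out because NaN only applies to float/double columns; it could be added back for those columns only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, when}

object NullishCounts {
  // Per column, counts rows that are null or an empty string (assumes String columns).
  // when(cond, c) is null when cond is false, and count() skips nulls.
  def nullOrEmptyCounts(df: DataFrame): Map[String, Long] = {
    val agg = df.select(df.columns.map { c =>
      count(when(col(c).isNull || col(c) === "", c)).alias(c)
    }: _*)
    // The aggregate has a single row: read the values from it rather than calling agg.count().
    val row = agg.first()
    df.columns.zipWithIndex.map { case (c, i) => c -> row.getLong(i) }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[1]").appName("nullish-counts").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with two String columns.
    val df = Seq[(Option[String], Option[String])](
      (Some("NY"), Some("Albany")),
      (None, Some("")),
      (Some(""), Some("Austin")),
      (Some("CA"), Some("Sacramento"))
    ).toDF("STATE", "CITY")

    println(nullOrEmptyCounts(df)) // STATE -> 2, CITY -> 1

    spark.stop()
  }
}
```

An equivalent formulation is `sum(when(cond, 1).otherwise(0))`, which sums a 0/1 indicator instead of counting non-null values; both run as a single aggregation job, unlike one `filter(...).count()` per column.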
