
I have a table named "data" with 5 columns, and each column contains some null values. I want to get the null count of each column. Getting the count for one column is easy, but how can I write code that counts the nulls in every column of the table?

Sample:

+------+------+------+------+------+
| 2    | 3    | 4    | 5    | 6    |
+------+------+------+------+------+
| null | 1    | null | null | null |
| null | null | null | null | asdc |
| null | 23   | 23   | null | null |
| null | null | null | 23   | 41   |
| 24   | 3    | 35   | null | null |
| null | null | null | 1    | wef  |
| null | 32   | 54   | null | 45   |
| null | null | null | 123  | null |
| w411 | 31   | 12   | null | null |
| null | null | null | 11   | null |
+------+------+------+------+------+

How do I get the null count of each column?

I actually have 40 tables, each with 5, 6, or 10 columns, and each column contains some null values. I just want the null count of each column of every table. What is the best way to get these counts?

Thanks in advance!

  • Can you provide an output example? What did you try? Commented Sep 5, 2018 at 10:50
  • I have tried it for one column only: df.col("column name").isNull.count() Commented Sep 5, 2018 at 10:55
  • You can use foldLeft to iterate over the columns. Commented Sep 5, 2018 at 11:09
  • @BeyhanGül could you give me an example of the foldLeft function, please? Commented Sep 5, 2018 at 11:23
  • It would be good if you described your task clearly. There are a few methods for doing this, which can be applied in different situations. Commented Sep 5, 2018 at 11:30
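One comment suggests foldLeft. As a plain-Scala sketch of that idea (no Spark; the column names and rows below are made up for illustration), you can fold over the column names and accumulate a null count per column:

```scala
// Hypothetical data: each row is a Seq[Option[Int]], with None standing in for null.
val columns = Seq("c1", "c2", "c3")
val rows: Seq[Seq[Option[Int]]] = Seq(
  Seq(None, Some(1), None),
  Seq(None, None, Some(23)),
  Seq(Some(24), Some(3), None)
)

// Fold over the columns, accumulating how many rows are None in each position.
val nullCounts = columns.zipWithIndex.foldLeft(Map.empty[String, Int]) {
  case (acc, (name, i)) => acc + (name -> rows.count(_(i).isEmpty))
}
// nullCounts: Map(c1 -> 2, c2 -> 1, c3 -> 2)
```

In Spark the same foldLeft pattern is usually written over df.columns, but the select-based answers below avoid the fold entirely.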

2 Answers


If you don't want to drop empty rows/columns and you don't need to do any additional calculations in your job, this should work for you:

import org.apache.spark.sql.functions.{col, count, when}

df.select(df.columns.map { colName =>
    count(when(col(colName).isNull, true)).as(s"${colName}_nulls_count")
  }: _*)
  .show(10) // or save the result somewhere
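The trick here is that count ignores nulls: when(col(colName).isNull, true) yields true for null rows and null otherwise, so counting the result counts exactly the null rows. A plain-Scala sketch of that behavior (made-up values, no Spark):

```scala
// One column's values; None stands in for SQL null.
val column: Seq[Option[Int]] = Seq(None, Some(1), None, Some(24))

// Mimic when(cond, true): Some(true) where the value is null, None otherwise.
val flags: Seq[Option[Boolean]] = column.map(v => if (v.isEmpty) Some(true) else None)

// Mimic count(...): count only the defined (non-null) results.
val nullCount = flags.count(_.isDefined)
// nullCount: 2
```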

1 Comment

Okay, but what if I have 100 columns? How can I get the null count then?

Okay, but what if I have 100 columns? How can I get the null count then? – Shree Batale

import spark.implicits._                // needed for toDF
import org.apache.spark.sql.functions._ // count, when, col, lit, etc.

val myTableDF = Seq(
  (1, 100, 0, 0, 0, 0, 0),
  (2, 0, 50, 0, 0, 20, 0),
  (3, 0, 0, 0, 0, 0, 0),
  (4, 0, 0, 0, 0, 0, 0)
).toDF("column1", "column2", "column3", "column4", "column5", "column6", "column7")




val inputDF = myTableDF

println("Total " + inputDF.count() + " rows in the input DataFrame\n")

val countsDF = inputDF.select(inputDF.columns.map(c => count(when(col(c).isNull or col(c) === 0, c)).alias(c)): _*)
            .withColumn("pivot", lit("Nulls and 0s count"))
            .cache()





// Build an exploded array of (k, v) structs -- one per data column -- so each
// count becomes its own row; dropRight(1) skips the extra "pivot" column.
val kv = explode(array(countsDF.columns.dropRight(1).map {
  c => struct(lit(c).alias("k"), col(c).alias("v"))
}: _*))

val countsTransposedDF = countsDF
  .withColumn("kv", kv)
  .select($"pivot", $"kv.k", $"kv.v")
  .groupBy($"k")
  .pivot("pivot")
  .agg(first($"v"))
  .withColumnRenamed("k", "Column Name")

countsTransposedDF.show(100, false)
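The explode(array(struct(...))) step can be hard to follow. In plain-Scala terms (hypothetical names and values, no Spark), it takes the single wide row of counts and turns it into one (columnName, count) pair per column, which is what the groupBy/pivot above then reshapes:

```scala
val columns = Seq("column1", "column2", "column3")
val rows: Seq[Seq[Int]] = Seq(
  Seq(1, 100, 0),
  Seq(2, 0, 50),
  Seq(3, 0, 0)
)

// Step 1: one "nulls and 0s" count per column (0 stands in for null-or-zero here).
val counts: Seq[Int] = columns.indices.map(i => rows.count(_(i) == 0))

// Step 2: pair each column name with its count -- the plain-Scala analogue of
// exploding an array of (k, v) structs built from the wide counts row.
val transposed: Seq[(String, Int)] = columns.zip(counts)
// transposed: Seq((column1,0), (column2,2), (column3,2))
```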


