
I have a table named "data" with 5 columns, and each column contains some null values. I want to get the null count of each column. Getting the count for one column is easy, but how can I write code that counts the nulls in every column of the table?

Sample:

+------+------+------+------+------+
| 2    | 3    | 4    | 5    | 6    |
+------+------+------+------+------+
| null | 1    | null | null | null |
| null | null | null | null | asdc |
| null | 23   | 23   | null | null |
| null | null | null | 23   | 41   |
| 24   | 3    | 35   | null | null |
| null | null | null | 1    | wef  |
| null | 32   | 54   | null | 45   |
| null | null | null | 123  | null |
| w411 | 31   | 12   | null | null |
| null | null | null | 11   | null |
+------+------+------+------+------+

How do I get the null count of each column?

I actually have 40 tables, each with 5, 6, or 10 columns, and each column contains some null values. I just want the null count of each column of every table. What is the best way to get these counts?

Thanks in advance!

  • Can you provide an output example? What did you try? Commented Sep 5, 2018 at 10:50
  • I have tried it for one column only: df.col("column name").isNull.count() Commented Sep 5, 2018 at 10:55
  • You can use foldLeft to iterate over the columns. Commented Sep 5, 2018 at 11:09
  • @BeyhanGül could you give me an example of the foldLeft function, please? Commented Sep 5, 2018 at 11:23
  • It would be good if you described your task clearly. There are a few methods for doing this, which can be applied in different situations. Commented Sep 5, 2018 at 11:30
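One comment suggests foldLeft. As a plain-Scala sketch of that idea (no Spark; the column names and rows below are made up for illustration), you can fold over the column names and accumulate a null count per column:

```scala
// Hypothetical data: each row is a Seq[Option[Int]], with None standing in for null.
val columns = Seq("c1", "c2", "c3")
val rows: Seq[Seq[Option[Int]]] = Seq(
  Seq(None, Some(1), None),
  Seq(None, None, Some(23)),
  Seq(Some(24), Some(3), None)
)

// Fold over the columns, accumulating how many rows are None in each position.
val nullCounts = columns.zipWithIndex.foldLeft(Map.empty[String, Int]) {
  case (acc, (name, i)) => acc + (name -> rows.count(_(i).isEmpty))
}
// nullCounts: Map(c1 -> 2, c2 -> 1, c3 -> 2)
```

In Spark the same foldLeft pattern is usually written over df.columns, but the select-based answers below avoid the fold entirely.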

2 Answers


If you don't want to drop empty rows/columns and you don't need to do any additional calculations in your job, this should work for you:

import org.apache.spark.sql.functions.{col, count, when}

df.select(df.columns.map { colName =>
    count(when(col(colName).isNull, true)).as(s"${colName}_nulls_count")
  }: _*)
  .show(10) // or save the result somewhere
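The trick here is that count ignores nulls: when(col(colName).isNull, true) yields true for null rows and null otherwise, so counting the result counts exactly the null rows. A plain-Scala sketch of that behavior (made-up values, no Spark):

```scala
// One column's values; None stands in for SQL null.
val column: Seq[Option[Int]] = Seq(None, Some(1), None, Some(24))

// Mimic when(cond, true): Some(true) where the value is null, None otherwise.
val flags: Seq[Option[Boolean]] = column.map(v => if (v.isEmpty) Some(true) else None)

// Mimic count(...): count only the defined (non-null) results.
val nullCount = flags.count(_.isDefined)
// nullCount: 2
```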

1 Comment

Okay, but what if I have 100 columns? How can I get the null count then?

Okay, but what if I have 100 columns? How can I get the null count then? – Shree Batale

import spark.implicits._                // needed for toDF
import org.apache.spark.sql.functions._ // count, when, col, lit, etc.

val myTableDF = Seq(
  (1, 100, 0, 0, 0, 0, 0),
  (2, 0, 50, 0, 0, 20, 0),
  (3, 0, 0, 0, 0, 0, 0),
  (4, 0, 0, 0, 0, 0, 0)
).toDF("column1", "column2", "column3", "column4", "column5", "column6", "column7")




val inputDF = myTableDF

println("Total " + inputDF.count() + " rows in the input DataFrame\n")

val countsDF = inputDF.select(inputDF.columns.map(c => count(when(col(c).isNull or col(c) === 0, c)).alias(c)): _*)
            .withColumn("pivot", lit("Nulls and 0s count"))
            .cache()





// Build an exploded array of (k, v) structs -- one per data column -- so each
// count becomes its own row; dropRight(1) skips the extra "pivot" column.
val kv = explode(array(countsDF.columns.dropRight(1).map {
  c => struct(lit(c).alias("k"), col(c).alias("v"))
}: _*))

val countsTransposedDF = countsDF
  .withColumn("kv", kv)
  .select($"pivot", $"kv.k", $"kv.v")
  .groupBy($"k")
  .pivot("pivot")
  .agg(first($"v"))
  .withColumnRenamed("k", "Column Name")

countsTransposedDF.show(100, false)
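The explode(array(struct(...))) step can be hard to follow. In plain-Scala terms (hypothetical names and values, no Spark), it takes the single wide row of counts and turns it into one (columnName, count) pair per column, which is what the groupBy/pivot above then reshapes:

```scala
val columns = Seq("column1", "column2", "column3")
val rows: Seq[Seq[Int]] = Seq(
  Seq(1, 100, 0),
  Seq(2, 0, 50),
  Seq(3, 0, 0)
)

// Step 1: one "nulls and 0s" count per column (0 stands in for null-or-zero here).
val counts: Seq[Int] = columns.indices.map(i => rows.count(_(i) == 0))

// Step 2: pair each column name with its count -- the plain-Scala analogue of
// exploding an array of (k, v) structs built from the wide counts row.
val transposed: Seq[(String, Int)] = columns.zip(counts)
// transposed: Seq((column1,0), (column2,2), (column3,2))
```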


