Try this. First, load the test data provided:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
import spark.implicits._ // needed for .toDS() below
val data =
"""
|Col1 Col2 Col3
|True False False
|True True True
|False False True
|False False False
""".stripMargin
val stringDS2 = data.split(System.lineSeparator())
.map(_.split("\\s+").map(_.replaceAll("""^[ \t]+|[ \t]+$""", "")).mkString("|"))
.toSeq.toDS()
val df2 = spark.read
.option("sep", "|")
.option("inferSchema", "true")
.option("header", "true")
.option("nullValue", "null")
.csv(stringDS2)
df2.show(false)
df2.printSchema()
/**
* +-----+-----+-----+
* |Col1 |Col2 |Col3 |
* +-----+-----+-----+
* |true |false|false|
* |true |true |true |
* |false|false|true |
* |false|false|false|
* +-----+-----+-----+
*
* root
* |-- Col1: boolean (nullable = true)
* |-- Col2: boolean (nullable = true)
* |-- Col3: boolean (nullable = true)
*/
A simple way to compute the count of each distinct value in every column:
// for every column, pair the value with its count over a window partitioned by that column
val findCounts = df2.columns.flatMap(c => Seq(col(c), count(c).over(Window.partitionBy(c)).as(s"count_$c")))
// distinct() collapses the duplicate rows the row-level window output produces
df2.select(findCounts: _*).distinct()
  .show(false)
/**
* +-----+----------+-----+----------+-----+----------+
* |Col1 |count_Col1|Col2 |count_Col2|Col3 |count_Col3|
* +-----+----------+-----+----------+-----+----------+
* |false|2 |false|3 |false|2 |
* |false|2 |false|3 |true |2 |
* |true |2 |false|3 |false|2 |
* |true |2 |true |1 |true |2 |
* +-----+----------+-----+----------+-----+----------+
*/
If you need the output in the same format as mentioned, try this:
// Assumes all the columns in the dataframe have the same set of distinct values
val columns = df2.columns
val head = columns.head
val zeroDF = df2.groupBy(head).agg(count(head).as(s"${head}_count"))
columns.tail.foldLeft(zeroDF) { (df, c) =>
  df.join(df2.groupBy(c).agg(count(c).as(s"${c}_count")), col(head) === col(c))
}.show(false)
/**
* +-----+----------+-----+----------+-----+----------+
* |Col1 |Col1_count|Col2 |Col2_count|Col3 |Col3_count|
* +-----+----------+-----+----------+-----+----------+
* |false|2 |false|3 |false|2 |
* |true |2 |true |1 |true |2 |
* +-----+----------+-----+----------+-----+----------+
*/
Alternatively, you can compute the counts for each column separately and union the result DataFrames (check df.schema so the per-column results line up; see the Dataset#union docs).
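A minimal sketch of that union-based alternative, assuming a long format of (column, value, count) is acceptable; the names "column", "value" and "count" are just illustrative:
import org.apache.spark.sql.functions._
// count the distinct values of each column independently, then union the per-column results
val longCounts = df2.columns.map { c =>
    df2.groupBy(col(c).cast("string").as("value")) // cast so every column unions cleanly
      .agg(count(lit(1)).as("count"))
      .withColumn("column", lit(c))
      .select("column", "value", "count")
  }
  .reduce(_ union _)
longCounts.show(false)
/**
 * (row order may vary)
 * +------+-----+-----+
 * |column|value|count|
 * +------+-----+-----+
 * |Col1  |true |2    |
 * |Col1  |false|2    |
 * |Col2  |false|3    |
 * |Col2  |true |1    |
 * |Col3  |false|2    |
 * |Col3  |true |2    |
 * +------+-----+-----+
 */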