I'm not sure what you are trying to do with the while loop. In any case, you can check in the REPL that the expression you use as a condition is a Column, not a Boolean, hence the exception:
> size(col("categoriesRaw")) !== 0
res1: org.apache.spark.sql.Column = (NOT (size(categoriesRaw) = 0))
Basically, this is an expression that needs to be evaluated by Spark SQL within a where, a select, or any other function that operates on Columns.
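For instance, assuming your intent was to keep only the rows whose map is non-empty, a minimal sketch (using the df built just below) would push that very condition into a where, where Spark can evaluate it:
import org.apache.spark.sql.functions._

// the Column expression is evaluated by Spark for each row;
// =!= is the Column inequality operator (!== is its deprecated alias)
val nonEmpty = df.where(size(col("categoriesRaw")) =!= 0)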
Nevertheless, your Spark code is almost there: you just need to add a groupBy to get where you want. Let's start by creating your data.
import org.apache.spark.sql.functions._
import spark.implicits._

val users = Seq(
  "user 1" -> Map("home & personal items > interior" -> 1,
                  "vehicles > cars" -> 1),
  "user 2" -> Map("vehicles > cars" -> 3))
val df = users.toDF("user", "categoriesRaw")
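As a quick sanity check, printSchema shows the map type Spark infers from the Seq above (output reproduced as comments; nullability details may vary slightly across Spark versions):
df.printSchema()
// root
//  |-- user: string (nullable = true)
//  |-- categoriesRaw: map (nullable = true)
//  |    |-- key: string
//  |    |-- value: integer (valueContainsNull = false)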
Then, you don't need a while loop to iterate over all the entries of the maps: explode does exactly that for you.
// one output row per (key, value) entry of each map
val explodedDf = df.select(explode('categoriesRaw))
explodedDf.show(false)
+--------------------------------+-----+
|key |value|
+--------------------------------+-----+
|home & personal items > interior|1 |
|vehicles > cars |1 |
|vehicles > cars |3 |
+--------------------------------+-----+
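By default, exploding a map yields the two columns key and value. If you also want to keep track of which user produced each entry, or to name the columns right away, a variant of the same select works (the names categ and number_of_events below are my own choice, not required by the API):
// keep the user next to each exploded entry
val explodedWithUser = df.select('user, explode('categoriesRaw))

// or alias the generator output directly
val explodedNamed =
  df.select(explode('categoriesRaw).as(Seq("categ", "number_of_events")))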
Finally, you can use groupBy and aggregate to get what you want.
explodedDf
.select('key as "categ", 'value as "number_of_events")
.groupBy("categ")
.agg(count('*), sum('number_of_events))
.show(false)
+--------------------------------+--------+---------------------+
|categ |count(1)|sum(number_of_events)|
+--------------------------------+--------+---------------------+
|home & personal items > interior|1 |1 |
|vehicles > cars |2 |4 |
+--------------------------------+--------+---------------------+
NB: I was not sure whether you wanted to count the sessions (1st column) or the events (2nd column), so I computed both.
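If it turns out you only need the events, a minimal variant of the query above keeps a single aggregation with a readable alias (total_events is an arbitrary name):
// same pipeline, keeping only the sum of events per category
explodedDf
  .select('key as "categ", 'value as "number_of_events")
  .groupBy("categ")
  .agg(sum('number_of_events) as "total_events")
  .show(false)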