How can I count the occurrences of a string in a DataFrame column, partitioned by id, using Spark?
For example, count the value "test" in the column "name" of a DataFrame.
In SQL this would be:
SELECT
SUM(CASE WHEN name = 'test' THEN 1 else 0 END) over window AS cnt_test
FROM
mytable
WINDOW window AS (PARTITION BY id)
I've tried using map, e.g. df.map(v => v match { case "test" => 1; case _ => 0 }),
and things like:
def getCount(df: DataFrame): DataFrame = {
  // count() only counts non-null values, so count the rows where
  // the when() condition matches, over a window partitioned by id
  df.withColumn("cnt_test",
    count(when(col("name") === lit("test"), true))
      .over(Window.partitionBy("id")))
}
Is this an expensive operation? What would be the best approach to check for occurrences of a specific string and then apply an aggregation (sum, max, min, etc.)?
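For reference, here is a minimal, self-contained sketch of what I'm after, translating the SQL SUM(CASE WHEN ...) OVER (PARTITION BY id) into the DataFrame API with a window function (the sample data and the CountTest object name are just for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, when}
import org.apache.spark.sql.expressions.Window

object CountTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cnt_test")
      .getOrCreate()
    import spark.implicits._

    // Sample data: two ids, with "test" appearing once for id 1
    // and twice for id 2
    val df = Seq(
      (1, "test"), (1, "other"),
      (2, "test"), (2, "test")
    ).toDF("id", "name")

    // Sum a 1/0 flag over a window partitioned by id -- the direct
    // equivalent of SUM(CASE WHEN name = 'test' THEN 1 ELSE 0 END)
    // OVER (PARTITION BY id)
    val result = df.withColumn(
      "cnt_test",
      sum(when(col("name") === "test", 1).otherwise(0))
        .over(Window.partitionBy("id"))
    )

    result.show()
    spark.stop()
  }
}
```

Every row keeps its original columns and gains a cnt_test column holding the per-id count, just as the SQL window query would produce.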
Thanks