I have this function:
def countNullValueColumn(df: DataFrame): Array[(String, Long)] =
df.columns
.map(x => (x, df.filter(df(x).isNull || df(x) === "" || df(x).isNaN).count))
I'm trying to use a val counter = sc.longAccumulator instead of the DataFrame count function, without success.
The attempts I've made have been:
df.columns.foreach(x => {df.filter(df(x).isNull || df(x) === "" || df(x).isNaN) {counter.add(1)} (x, counter.value)})
df.columns.foreach(x => {df.filter(df(x).isNull || df(x) === "" || df(x).isNaN) {counter.add(1); (x, counter.value)} })
Unfortunately, neither of these works: foreach returns Unit, so the whole expression never produces the required type (Array[(String, Long)]).
Does anyone have any ideas or suggestions? Thanks in advance.
P.S. I don't know whether using an accumulator is more efficient than count, but I would just like to try.
Edit: Should I use foreach instead of map to avoid getting a wrong value in the accumulator, given that map is a transformation while foreach is an action?
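For example, this is my understanding (a minimal, untested sketch; session is the SparkSession from my code below, and acc is just a throwaway accumulator for illustration):

val acc = session.sparkContext.longAccumulator
df.rdd.map { r => acc.add(1); r }   // transformation: lazy, nothing runs, acc.value stays 0
df.rdd.foreach(_ => acc.add(1))     // action: runs immediately, acc.value ends up equal to df.count

The Spark docs also warn that accumulator updates made inside transformations may be applied more than once if a task or stage is re-executed, which is another reason to update them only inside actions.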
Edit2: As suggested by @DNA, I changed the map to a foreach inside my code.
Edit3: OK, now the problem has become building the Array[(String, Long)]. I tried this, but the :+ operator doesn't seem to do anything (res stays empty).
val counter = session.sparkContext.longAccumulator
val res: Array[(String, Long)] = Array()
df.columns
.foreach(x => res :+ (x, df.filter{ df(x).isNull || df(x) === "" || df(x).isNaN {counter.add(1); counter.value}}))
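I suspect the problem is that :+ on an immutable Array returns a new array instead of appending in place, so the result is thrown away. Here is a sketch of what I think might work instead (untested; it reuses one accumulator with reset() between columns, counts via foreach since that is an action, and collects into a mutable ArrayBuffer):

import scala.collection.mutable.ArrayBuffer

val counter = session.sparkContext.longAccumulator
val buf = ArrayBuffer[(String, Long)]()
df.columns.foreach { x =>
  counter.reset()                                  // start from 0 for each column
  df.filter(df(x).isNull || df(x) === "" || df(x).isNaN)
    .rdd
    .foreach(_ => counter.add(1))                  // action: actually updates the accumulator
  buf += ((x, counter.value.toLong))               // value returns a java.lang.Long
}
val res: Array[(String, Long)] = buf.toArray

As far as I can tell this still launches one Spark job per column, just like the original count version, so I'm not sure it is any faster.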
Does anyone have any ideas or suggestions?