I have an array column that I search for a set of texts, and I build a DataFrame from the matching rows. Which of the two options below is the better approach?
Option 1
val texts = Seq("text1", "text2", "text3")
val df = mainDf.select(col("*"))
.withColumn("temptext", explode($"textCol"))
.where($"temptext".isin(texts: _*))
Because this adds an extra column, "temptext", and exploding the array produces duplicate rows, I then do:
val tempDf = df.drop("temptext").dropDuplicates("Root.Id") // dropDuplicates does not work since I have passed nested field
vs
Option 2
val df = mainDf.select(col("*"))
.where(array_contains($"textCol", "text1") ||
array_contains($"textCol", "text2") ||
array_contains($"textCol", "text3"))
What I actually want is a generic API. With option 2, every new text means adding another array_contains($"textCol", "text4") clause and creating a new API each time.
With option 1, exploding the array creates duplicate rows, and I also need to drop the temporary column.
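One way the generic API described above might look is to build the array_contains predicate from the list of texts instead of writing each clause by hand. The following is only a minimal sketch under assumptions: the helper name filterByAnyText, the object name ArrayFilterSketch, the "id" column, and the sample rows are all hypothetical and not taken from the original code.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col}

object ArrayFilterSketch {
  // Hypothetical helper: keeps rows whose array column contains
  // at least one of the given texts. No explode, so no duplicate rows
  // and no temporary column to drop.
  def filterByAnyText(df: DataFrame, arrayCol: String, texts: Seq[String]): DataFrame = {
    require(texts.nonEmpty, "texts must not be empty")
    // One array_contains predicate per text, combined with OR.
    val predicate: Column = texts
      .map(t => array_contains(col(arrayCol), t))
      .reduce(_ || _)
    df.where(predicate)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for mainDf.
    val mainDf = Seq(
      (1, Seq("text1", "other")),
      (2, Seq("other")),
      (3, Seq("text2", "text3"))
    ).toDF("id", "textCol")

    val texts = Seq("text1", "text2", "text3")
    filterByAnyText(mainDf, "textCol", texts).show(false)

    spark.stop()
  }
}
```

Adding a new text then only means extending the texts sequence. On Spark 2.4 or later, arrays_overlap(col("textCol"), array(texts.map(lit): _*)) should express a similar filter in a single call, though its handling of null elements differs and is worth checking against your data.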
For array_contains, check here; you can use any of the methods mentioned in the answers.