Currently I am doing:
val DF = sqlSession.sql("select itemIdDig as itemId, "
+ "title, "
+ "timestamp as time "
+ "from itemTable ")
val tempDF = sqlSession.sql("select itemIdDig as itemId "
+ "from itemTable "
+ "group by itemIdDig HAVING count(*) >= 10 ").rdd.map(r => r(0)).collect()
// keep only the rows whose itemId is not in tempDF
DF.filter(!col("itemId").isin(tempDF : _*))
But this is very slow. Can anyone suggest a better way to achieve this? Basically, I am looking for the rows whose itemId is not in tempDF. (I tried group by with having, which gives me unique itemIds, but I want to preserve the duplicates.)
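For reference, one direction that avoids collecting the ids to the driver is a left anti join. This is only a minimal sketch, not a measured fix: the helper name `dropFrequentItems`, the object `AntiJoinSketch`, the local SparkSession and the sample rows are all made up for illustration, and it assumes Spark 2.x or later, where the "left_anti" join type is available.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AntiJoinSketch {
  // Hypothetical helper: keeps every row of `df` (duplicates included)
  // whose itemId occurs fewer than `minCount` times in `df`.
  def dropFrequentItems(df: DataFrame, minCount: Long): DataFrame = {
    // itemIds that occur at least `minCount` times; this stays distributed,
    // nothing is collected to the driver
    val frequent = df.groupBy("itemId")
      .count()
      .filter(col("count") >= minCount)
      .select("itemId")

    // left_anti keeps only the left-side rows that have no match in `frequent`
    df.join(frequent, Seq("itemId"), "left_anti")
  }

  def main(args: Array[String]): Unit = {
    // Local session and sample data are assumptions made for this sketch
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("anti-join-sketch")
      .getOrCreate()
    import spark.implicits._

    val DF = Seq(
      (1, "a", 100L),
      (1, "a", 101L),
      (1, "a", 102L),
      (2, "b", 103L)
    ).toDF("itemId", "title", "time")

    // With the threshold from the question this would be 10; 3 keeps the sample small
    dropFrequentItems(DF, 3L).show()

    spark.stop()
  }
}
```

One difference from the isin version that I am aware of: rows with a null itemId are dropped by the `!isin` filter but kept by the anti join, since null keys never match in an equi-join.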