I have these two case classes:
case class Post(postId: Int, createdTime: Long)
case class Doc(posts: Seq[Post], test: String)
I create a sample DataFrame:
import spark.implicits._ // needed for toDF() outside spark-shell

val df = spark.sparkContext.parallelize(Seq(
  Doc(Seq(
    Post(1, 1),
    Post(2, 3),
    Post(3, 8),
    Post(4, 15)
  ), null),
  Doc(Seq(
    Post(5, 6),
    Post(6, 9),
    Post(7, 12),
    Post(8, 20)
  ), "hello")
)).toDF()
What I want is to return each Doc with only the posts whose createdTime is between x and y (inclusive). For example, for x = 2 and y = 9, I want this result, with the same schema as the original df:
+--------------+
| posts|
+--------------+
|[[2,3], [3,8]]|
|[[5,6], [6,9]]|
+--------------+
I tried many combinations of where, but it doesn't work.
I also tried map(_.filter(...)), but the problem is that I don't want to go through toDF().as[Doc].
Any help? Thank you.
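
For reference, here is a minimal sketch of one untyped way to express the filtering described above. It assumes Spark 2.4 or later, where the SQL higher-order function `filter` is available through `expr`. The helper name `postsBetween` is hypothetical and not part of Spark. The bounds are treated as inclusive, to match the expected output for x = 2 and y = 9.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

// Hypothetical helper (not a Spark API): keeps, in each row, only the posts
// with x <= createdTime <= y. Assumes Spark 2.4+ for the SQL `filter` function.
def postsBetween(df: DataFrame, x: Long, y: Long): DataFrame =
  df.withColumn(
    "posts", // replaces the existing column in place; other columns are untouched
    expr(s"filter(posts, p -> p.createdTime >= $x AND p.createdTime <= $y)")
  )

// Usage with the sample df defined above
postsBetween(df, 2L, 9L).select("posts").show(false)
```

This stays in the DataFrame API, so no conversion with as[Doc] is involved. On older Spark versions the same idea would probably need a UDF, or explode followed by a groupBy.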