1

How can I loop through a Spark data frame? I have a data frame that consists of:

time, id, direction
10, 4, True  //here 4 enters --> (4,)
20, 5, True  //here 5 enters --> (4,5)
34, 5, False //here 5 leaves --> (4,)
67, 6, True  //here 6 enters --> (4,6)
78, 6, False //here 6 leaves --> (4,)
99, 4, False //here 4 leaves --> ()

it is sorted by time and now I would like to step through and accumulate the valid ids. The ids enter on direction==True and exit on direction==False

so the resulting RDD should look like this

time, valid_ids
(10, (4,))
(20, (4,5))
(34, (4,))
(67, (4,6))
(78, (4,)
(99, ())

I know that this will not parallelize, but the df is not that big. So how could this be done in Spark/Scala?

0

3 Answers 3

4

If data is small ("but the df is not that big") I'd just collect and process using Scala collections. If types are as shown below:

df.printSchema
root
 |-- time: integer (nullable = false)
 |-- id: integer (nullable = false)
 |-- direction: boolean (nullable = false)

you can collect:

val data = df.as[(Int, Int, Boolean)].collect.toSeq

and scanLeft:

val result = data.scanLeft((-1, Set[Int]())){ 
  case ((_, acc), (time, value, true)) => (time, acc + value)
  case ((_, acc), (time, value, false))  => (time, acc - value)
}.tail
Sign up to request clarification or add additional context in comments.

Comments

1

Use of var is not recommended for scala developers but still I am posting answer using var

var collectArray = Array.empty[Int]
df.rdd.collect().map(row => {
  if(row(2).toString.equalsIgnoreCase("true")) collectArray = collectArray :+ row(1).asInstanceOf[Int]
  else collectArray = collectArray.drop(1)
  (row(0), collectArray.toList)
})

this should give you result as

(10,List(4))
(20,List(4, 5))
(34,List(5))
(67,List(5, 6))
(78,List(6))
(99,List())

Comments

0

Suppose the name of the respective data frame is someDF, then do:

val df1 = someDF.rdd.collect.iterator;
   while(df1.hasNext) 
   {
       println(df1.next);
   }

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.