
I don't know if it is possible, but in my mapPartitions I'd like to split the variable "a" into two lists: a list, say l, that stores all the numbers, and another list, say b, that stores all the words, with something like a.mapPartitions((p, v) => { val l = p.toList; val b = v.toList; ... }).

So that, for example, in my for loop I would have l(i) = 1 and b(i) = "score".

import scala.io.Source
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ListBuffer

val a = sc.parallelize(List(("score",1),("chicken",2),("magnacarta",2)) )

a.mapPartitions(p => {
  val l = p.toList                  // l is a List[(String, Int)]
  val ret = new ListBuffer[Int]
  val words = new ListBuffer[String]
  for (i <- 0 until l.length) {
    words += b(i)                   // b is the list of words I would like to have -- this is what I don't know how to build
    ret += l(i)._2                  // the numbers
  }
  ret.toList.iterator
})
  • a.map(_._1); a.map(_._2)? Commented Mar 17, 2016 at 16:38
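
The comment's suggestion would look roughly like this (a minimal sketch assuming the spark-shell sc used in the question; the collect() calls are my addition, only to print the small example data):

val a = sc.parallelize(List(("score", 1), ("chicken", 2), ("magnacarta", 2)))

val words   = a.map(_._1)   // RDD[String]
val numbers = a.map(_._2)   // RDD[Int]

// collect() brings the data back to the driver; fine here because the example is tiny
println(words.collect().toList)    // List(score, chicken, magnacarta)
println(numbers.collect().toList)  // List(1, 2, 2)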

1 Answer


Spark is a distributed computing engine: operations run on partitioned data spread across the nodes of the cluster. To bring the per-partition results back together, you then need a reduce() step that performs a summary operation.

Please see this code that should do what you want:

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object SimpleApp {

  // Accumulator that keeps the numbers and the words in two separate lists
  class MyResponseObj(var numbers: List[Int] = List[Int](), var words: List[String] = List[String]()) extends java.io.Serializable{
    def +=(str: String, int: Int) = {
      numbers = numbers :+ int
      words = words :+ str
      this
    }

    // Merge another accumulator into this one (used by reduce)
    def +=(other: MyResponseObj) = {
      numbers = numbers ++ other.numbers
      words = words ++ other.words
      this
    }

  }


  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val a = sc.parallelize(List(("score", 1), ("chicken", 2), ("magnacarta", 2)))

    val myResponseObj = a.mapPartitions[MyResponseObj](it => {
      // Build one accumulator per partition...
      val myResponseObj = new MyResponseObj()
      it.foreach {
        case (str: String, int: Int) => myResponseObj += (str, int)
        case _                       => println("unexpected data")
      }
      Iterator(myResponseObj)
      // ...then merge the per-partition accumulators into a single result
    }).reduce((myResponseObj1, myResponseObj2) => myResponseObj1 += myResponseObj2)

    println(myResponseObj.words)
    println(myResponseObj.numbers)

  }
}

2 Comments

Thanks, nice solution, but I don't know how to launch it. I tried with :load test.scala (test.scala being the name of my file), but it only shows that everything loaded fine. (I'm working from the terminal on Windows with Notepad++.)
I would recommend downloading an IDE such as IntelliJ or Eclipse and installing a Scala plugin.
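
For reference, a standalone application like the one in the answer is usually packaged and launched with sbt and spark-submit rather than :load. The sketch below is only an illustration; the version numbers, project layout, and jar name are assumptions and need to match your own setup:

// build.sbt (sketch -- adjust the Scala and Spark versions to your installation)
name := "SimpleApp"

version := "0.1"

scalaVersion := "2.10.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"

// With SimpleApp.scala under src/main/scala/, build and run with:
//   sbt package
//   spark-submit --class SimpleApp --master "local[2]" target/scala-2.10/<name-of-the-built-jar>.jar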
