
I have an RDD whose records have the structure [String A, List(String Bs)]. I would like to map it to an RDD of [String A, String B] pairs, so that each element of the list is paired with its String A. What would be the most efficient way to do this?

I am currently using flatMapValues. Would this be the most efficient way? (I have a huge dataset.)

  • It looks like you are using the Java API. Please specify which API you are using. Commented Apr 23, 2015 at 10:43
  • And yes... flatMapValues is one of the best ways for such a thing. Commented Apr 23, 2015 at 10:43
  • Apart from flatMapValues: did you get that RDD from a cogroup, for instance? If so, using a join instead will produce what you want without this intermediate RDD. Commented Apr 23, 2015 at 13:09
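The idea behind that last comment is that a join is, semantically, a cogroup whose two value collections are expanded into every (left, right) combination per key, so grouping first and flattening afterwards does extra work. The sketch below checks that equivalence on small local inputs. It uses plain Scala collections rather than RDDs, and the helper names (cogroupLocal, flattenCogroup, joinLocal) are hypothetical stand-ins for Spark's cogroup and join, not Spark code.

```scala
object JoinSketch {
  // Hypothetical local stand-in for a.cogroup(b): for every key seen on either
  // side, collect all values from the left and all values from the right.
  def cogroupLocal[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Seq[(K, (Seq[V], Seq[W]))] =
    (a.map(_._1) ++ b.map(_._1)).distinct.map { k =>
      (k, (a.filter(_._1 == k).map(_._2), b.filter(_._1 == k).map(_._2)))
    }

  // Flattening the grouped output into every (v, w) combination per key;
  // keys missing on either side produce nothing, as in an inner join.
  def flattenCogroup[K, V, W](grouped: Seq[(K, (Seq[V], Seq[W]))]): Seq[(K, (V, W))] =
    grouped.flatMap { case (k, (vs, ws)) => for (v <- vs; w <- ws) yield (k, (v, w)) }

  // A direct inner join on the key, without the intermediate grouping.
  def joinLocal[K, V, W](a: Seq[(K, V)], b: Seq[(K, W)]): Seq[(K, (V, W))] =
    for ((k, v) <- a; (k2, w) <- b if k == k2) yield (k, (v, w))

  def main(args: Array[String]): Unit = {
    val left = Seq(("a", 1), ("a", 2), ("b", 3))
    val right = Seq(("a", "x"), ("c", "y"))
    // Both routes should yield the same set of (key, (left, right)) pairs.
    println(flattenCogroup(cogroupLocal(left, right)).toSet == joinLocal(left, right).toSet)
  }
}
```

On real RDDs this corresponds to calling left.join(right) directly instead of left.cogroup(right) followed by a flattening step.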

1 Answer


rdd.flatMapValues(identity) should get the job done.
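flatMapValues applies the function to each value and pairs every element it returns with the original key, so with identity each element of the list becomes its own (key, element) pair. The sketch below reproduces that per-record logic on an ordinary Scala Seq so it can be checked without a Spark cluster. The object and method names are hypothetical; on an actual RDD you would simply call rdd.flatMapValues(identity).

```scala
object FlattenSketch {
  // Hypothetical local stand-in for rdd.flatMapValues(identity):
  // every (key, list) pair expands into one (key, element) pair per list element.
  def flattenValues[K, V](pairs: Seq[(K, List[V])]): Seq[(K, V)] =
    pairs.flatMap { case (k, vs) => vs.map(v => (k, v)) }

  def main(args: Array[String]): Unit = {
    val input = Seq(
      ("a", List("x", "y")),
      ("b", List("z")),
      ("c", List.empty[String]) // a key with an empty list produces no output pairs
    )
    flattenValues(input).foreach(println)
  }
}
```

Note that a key whose list is empty disappears from the result entirely, which is also what flatMapValues does on an RDD.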

That should be an efficient and simple way. To optimize performance further, you could benchmark it against an implementation using mapPartitions and pick the faster of the two. I wouldn't expect a huge difference, though, as in both cases a new (key, value) tuple has to be created for every output element anyway.

rdd.mapPartitions(iter => iter.flatMap(elem => elem._2.map(v => (elem._1, v))))
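mapPartitions hands the function an Iterator over one partition's records and expects an Iterator back, so the flattening happens lazily inside each partition. The sketch below mimics that on local data by treating a Seq of Seqs as hypothetical "partitions" and applying the same iterator-to-iterator function to each one; it illustrates the call shape and is not Spark's implementation. One practical difference worth knowing: flatMapValues keeps the parent RDD's partitioner, whereas mapPartitions drops it unless you pass preservesPartitioning = true, which is safe here because the keys are not changed.

```scala
object MapPartitionsSketch {
  // The same kind of function one would pass to rdd.mapPartitions: it consumes
  // one partition's iterator and lazily emits a (key, element) pair per list element.
  def flattenPartition[K, V](iter: Iterator[(K, List[V])]): Iterator[(K, V)] =
    iter.flatMap { case (k, vs) => vs.iterator.map(v => (k, v)) }

  // Hypothetical local model of an RDD: each inner Seq plays the role of one partition.
  def runPerPartition[K, V](partitions: Seq[Seq[(K, List[V])]]): Seq[Seq[(K, V)]] =
    partitions.map(p => flattenPartition(p.iterator).toList)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq(("a", List("x", "y"))),
      Seq(("b", List("z")), ("c", List("u", "v")))
    )
    runPerPartition(partitions).foreach(println)
  }
}
```

On a real RDD the corresponding call would be rdd.mapPartitions(iter => iter.flatMap { case (k, vs) => vs.iterator.map(v => (k, v)) }, preservesPartitioning = true).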