I am trying to update a variable of type Array[Array[String]] from inside a map call on a List.

var m: Array[Array[String]] = Array(Array(""))
var n: List[(String, Double)]
n contains 100 elements.
I want to update m for each element in n, like this:
n.map(x => {
   m :+ Array("a","b","c");
   x
})

But I end up with an empty array; the values are not appended to m. I would also like to try this with an RDD.

  • Welcome to SO. It would help to ask atomic questions with better-formatted code, so that people can help you more easily. See this for formatting help. I would drop the Spark part for now and ask it later as a separate question if needed. Are you familiar with the concepts of mutability and immutability, and with how Arrays and Lists differ in Scala? Commented Nov 17, 2018 at 4:51
  • Hi Nader Ghanbari. Thanks for the information. I am aware of those Scala concepts. Commented Nov 17, 2018 at 18:10

1 Answer

First, the purpose of map is to transform the collection it is called on, not to mutate some other collection - foreach would be a better fit.

Second, the reason you end up with an empty array is that :+ creates a copy with the element added; it does not mutate the original array (see the scaladoc). Since the returned copy is never assigned to anything, it is discarded.
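A minimal sketch of that difference, assuming a hypothetical three-element list in place of the 100-element n from the question (the object and method names are made up for illustration):

```scala
object AppendDemo {
  // Hypothetical sample data standing in for the question's list `n`.
  val n: List[(String, Double)] = List(("x", 1.0), ("y", 2.0), ("z", 3.0))

  // The copy returned by :+ is thrown away, so m keeps its single initial row.
  def discarded(): Array[Array[String]] = {
    val m: Array[Array[String]] = Array(Array(""))
    n.foreach { _ => m :+ Array("a", "b", "c") }
    m
  }

  // The copy returned by :+ is assigned back to m, so each iteration adds one row.
  def reassigned(): Array[Array[String]] = {
    var m: Array[Array[String]] = Array(Array(""))
    n.foreach { _ => m = m :+ Array("a", "b", "c") }
    m
  }
}
```

Reassigning works, but every :+ copies the whole array, so this gets slow as the array grows.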

Third, arrays aren't the best collections for incremental building, because resizing them is inefficient. You can create a big array first and then update every position by index (however, that would be very imperative), or you could use an ArrayBuffer (see the scaladoc), or you could map the list and end up with a List[Array[A]] or a List[List[A]] (that would be the most functional way of doing it).
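As a rough sketch of the last two options, again with hypothetical sample data: the question appends a constant row, so building each row from the tuple's own fields is an assumption made only for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

object BuildDemo {
  // Hypothetical sample data standing in for the question's list `n`.
  val n: List[(String, Double)] = List(("x", 1.0), ("y", 2.0), ("z", 3.0))

  // Imperative option: append in place to a growable buffer, convert once at the end.
  def withBuffer(): Array[Array[String]] = {
    val buf = ArrayBuffer.empty[Array[String]]
    n.foreach { case (key, value) => buf += Array(key, value.toString) }
    buf.toArray
  }

  // Functional option: derive one row per element, with no mutation at all.
  def withMap(): List[Array[String]] =
    n.map { case (key, value) => Array(key, value.toString) }
}
```

Both produce one row per element of n; the map version needs no var and no shared mutable state.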

Fourth, what exactly do you mean by trying it with an RDD as well? If you want to build an array by iterating over an RDD, that's conceptually wrong. And if you want to build an RDD by iterating over a List, that's conceptually wrong too.


5 Comments

Thank you for the reply. I need to create a collection of items for every iteration of a transformation and that collection is global to the application, shared by other transformations. A sequence of transformations will update the collection for every iteration on an RDD. Can I use another RDD as the global collection? How can I achieve this solution?
I see what you mean by conceptually wrong. But is there any other good approach to this problem?
@karteekkadari, without some context it is hard to tell what a good solution could be. Also, the reason I think it is conceptually wrong is this: if you want to update elements of a global array by "iterating" over an RDD, then I'm assuming a one-to-one correspondence in the number of elements. That means your driver must have enough memory to hold a local array with the same number of elements as a distributed dataset that has to be split across the RAM of many machines. That doesn't make sense, so an RDD would probably be better...
...but, again, you can't create an RDD by appending/prepending/adding elements one by one; RDDs are intended to be created as the result of a transformation on another RDD. The problem is that your description of what you need is too abstract, so I'm not sure what a good solution could be.
That makes sense. Sometimes it's difficult to think more broadly. In my case, the collection ends up being small. Anyway, I want to try AccumulatorV2 for this. Thanks for your concern.
