
I have a collection of lists, and I have to iterate over each list's elements and put them into another list. The data is very large, so I need to process it in parallel to get good processing time. I also need to preserve the order of the lists. With the code below, I lose elements from the list, or sometimes get null entries. What would be an efficient way to make the list synchronized or thread-safe?

java.util.List<T> metadata = new ArrayList<T>();
sourceValuesIterable.parallelStream().forEach(tblRow ->
{
    metadata.add(tblRow);
});

One more question: when you remove nulls from a collection using Guava's Predicates, does it change the order of the list elements?

Thanks in advance.

  • Why not use map and collect - sourceValuesIterable.parallelStream().map(...).collect(Collectors.toList());? Commented Jul 22, 2020 at 12:52
  • If the data really is huge, I would do everything possible to avoid making a copy. What do you need the copy for, maybe there's a way to remove the need for it? Commented Jul 22, 2020 at 13:11
  • @Joni Are you talking about why I am adding sVI data to metadata? Well, it contains raw data and I need to extract specific data from it. Commented Jul 22, 2020 at 14:34
  • So you start with a large list of composite objects, and you want to extract a specific component object from each composite, and create a new list from them? Commented Jul 22, 2020 at 15:28
  • @Joni I don't know much about composite objects. Also, with ArrayList the only concern is thread safety, and there are plenty of ways to achieve that. Commented Jul 22, 2020 at 16:17

2 Answers


Parallelism requires a single 'stream pipeline' if you want to stand any chance of order being preserved. Fortunately, you can do that here: map your sVI to Ts, then turn the stream into a list by collecting it:

List<T> metadata = sVI.parallelStream()
    .map(tblRow -> new ThingieThatGoesInMetadata())
    .collect(Collectors.toList());

Start there; this way, the ordering is guaranteed.
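A minimal, self-contained sketch of the approach above (the source data, element type, and mapping step are stand-ins for the question's sVI and T): collect preserves the stream's encounter order even when the mapping runs in parallel, so no elements are lost or reordered.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderDemo {
    public static void main(String[] args) {
        // Source list in a known order, 0..9999.
        List<Integer> source = IntStream.range(0, 10_000)
                .boxed()
                .collect(Collectors.toList());

        // The map step runs across threads, but collect assembles
        // the result in encounter order, so the output matches the input.
        List<String> result = source.parallelStream()
                .map(i -> "row-" + i) // stand-in for the real extraction
                .collect(Collectors.toList());

        System.out.println(result.get(0));     // row-0
        System.out.println(result.get(9_999)); // row-9999
        System.out.println(result.size());     // 10000
    }
}
```

Because the collector handles the mutable accumulation internally, no external synchronization is needed, unlike the forEach-with-shared-ArrayList version in the question.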


2 Comments

Collectors.toList doesn't pre-size the ArrayList, so for a huge list this will go through many iterations of array copies to expand to the necessary capacity.
@rzwitserloot What does .map do there, and does it have any limitations?
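On the pre-sizing point raised in the comments: for a sequential stream, Collectors.toCollection lets you supply an ArrayList with the capacity set up front, avoiding repeated internal array growth. A sketch (note that for a parallel stream the supplier is invoked once per segment and the pieces are merged, so full-size pre-allocation mainly helps the sequential case):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class PreSizedDemo {
    public static void main(String[] args) {
        List<Integer> source = IntStream.range(0, 1_000_000)
                .boxed()
                .collect(Collectors.toList());

        // Supply an ArrayList pre-sized to the known result count,
        // so the backing array never needs to grow mid-collection.
        List<Integer> copy = source.stream()
                .map(i -> i * 2)
                .collect(Collectors.toCollection(
                        () -> new ArrayList<>(source.size())));

        System.out.println(copy.size()); // 1000000
        System.out.println(copy.get(0)); // 0
    }
}
```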

I think it's a mistake to assume that parallelising this task and adding elements one at a time to the new list is automatically the fastest way to copy it.

For starters, you didn't pre-size the new ArrayList, so it's going to continually be resizing as you add elements in order to reach the necessary capacity.

There is also an overhead associated with spinning up a parallel stream and with merging the results.

ArrayList already has a copy constructor which will do an efficient copy. Ultimately, that's just going to be copying the underlying array of references. It's hard to imagine being able to beat that kind of low-level operation for performance.
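A quick sketch of the copy-constructor approach described above, using placeholder data: the constructor copies the backing array of references in one bulk operation and preserves the source order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CopyDemo {
    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c");

        // The copy constructor performs a single bulk copy of the
        // underlying reference array; element order is preserved.
        List<String> copy = new ArrayList<>(source);

        System.out.println(copy);                // [a, b, c]
        System.out.println(copy.equals(source)); // true
    }
}
```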

As always with performance-related concerns, your best bet is to profile it, measure the results, and use data to inform your decisions.

