4

in the last days I'm starting to "play" with some Java 8 features, like stream (I studied a bit of documentation and several examples).

In my application I have a Map and I need to get the three element with highest value (the float part).

I tried different modifications to my code (and some of these solutions also: Sort a Map<Key, Value> by values (Java) ), for example:

Map<Long, Float> great = createMapWith20Elements();
Map<Long, Float> small = great.entrySet().stream()
        .sorted(Map.Entry.<Long, Float>comparingByValue().reversed()) 
        .limit(3) 
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

But the reslt is always the same: sometimes the code works fine, other it gives me a

java.lang.ArrayIndexOutOfBoundsException: 19

In rare cases, the index out of the bounds is 18.

This "random" behaviour (18, 19, or correct elaborations) makes me think to a "parallel threading" problem.

I'm sure that great map has always 20 elements... if I print them I receive:

2,-0.5
3,0.0
4,0.0
5,0.0
6,0.0
7,-0.33333334
8,0.0
9,0.0
10,0.0
11,0.0
12,0.5
13,0.0
14,0.0
15,-0.5
18,0.0
19,0.0
21,0.0
22,0.0
23,0.0
24,0.0

I'm conscious that 17 objects are candidate to be the first 3... but it is not a problem for my algorithm.

Can you help me in some way?

Thanks

EDIT:

The method createMapWith20Elements() has a dummy name for better explaining my situation: I'm sure it returns 20 elements because it makes a DB reading... but it should return any matching record.

By the way it ends with

// myIds is an ArrayList<Long>
myIds.parallelStream().forEach(e -> trust.put(e, 0f));
return trust;

Replacing with myIds.stream() it seems working fine... I'm not able to figure how using parallelStream to write to an object (Collection and not Stream), and returning the object itself (Collection), in the calling function it can lead to this kind of problem.

11
  • 3
    since you have multiple method calls chained, why not unchain them and see the results of each call before the next one? A debugger should tell you exactly where your map has less than 20, 19, 18, 17 elements... Commented Jun 23, 2015 at 22:10
  • 3
    Would you please post a reproducible example, it would be much easier for us to help you. Filling a map with the data you gave using your code work fine for me. Commented Jun 23, 2015 at 23:24
  • 4
    Can you post the stack trace please? Commented Jun 24, 2015 at 1:53
  • 2
    Aside: a slightly easier way to sort the entries is sorted(comparing(Map.Entry::getValue, reverseOrder())) Commented Jun 24, 2015 at 2:05
  • 2
    If you say parallelStream().forEach(…) you really perform the action multi-threaded. So if you call trust.put in that action and trust is not a thread safe Map it’s possible that you corrupt its data structure. Commented Jun 24, 2015 at 9:20

2 Answers 2

7

I think that the problem is the method createMapWith20Elements().

You are inserting elements in the map (probably a HashMap or a TreeMap) concurrently and both HashMap and TreeMap are not synchronized. So concurrent insertions (put method invocations) break the map structure (you get a corrupted map).

As you mention:

// myIds is an ArrayList<Long>
myIds.parallelStream().forEach(e -> trust.put(e, 0f));
return trust;

sometimes produces the errors. But

// myIds is an ArrayList<Long>
myIds.stream().forEach(e -> trust.put(e, 0f));
return trust;

does not produce the error.

If you want to insert concurrently then you must use a synchronized wrapper. So your code should be:

// myIds is an ArrayList<Long>
Map<Long, Float> syncTrust = Collections.synchronizedSortedMap(trust);
myIds.parallelStream().forEach(e -> syncTrust.put(e, 0f));
return trust;
Sign up to request clarification or add additional context in comments.

1 Comment

Yes... for people interested to other small information I also suggest to read the comments above the question...
2

The issue is with the progressive resizing of the underlying not synchronized collection, I guess in your case your Map structure is not synchronized. Due to the gradual re-size, not handled correctly in a multi threads context, it can bring to have a thread trying to insert an element out of the range of the current size of the vector.

From the book "Oracle Certified Professional Java SE 8 Programmer II", Sybex:

For an ArrayList object, the JVM internally manages a primitive array of the same type. As the size of the dynamic ArrayList grows, a new, larger primitive array is periodically required. If two threads both trigger the array to be resized at the same time, a result can be lost, producing the unexpected value shown here. As briefly mentioned earlier, and also discussed later in this chapter, the unexpected result of two tasks executing at the same time is a race condition.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.