A bit of (simplified) context.

Let's say I have an ArrayList<ContentStub> where ContentStub is:

public class ContentStub {
    ContentType contentType;
    Object content;
}

And I have multiple implementations of classes that "inflate" stubs for each ContentType, e.g.

public class TypeAStubInflater {

    public void inflate(List<ContentStub> contentStubs) {
        contentStubs.forEach(stub ->
                             {
                                 if(stub.contentType == ContentType.TYPE_A) {
                                    stub.content = someService.getContent();
                                 }
                             });         
    }
}

The idea is that TypeAStubInflater, running in one thread, only modifies items of ContentType.TYPE_A, while TypeBStubInflater, running in another thread, only modifies items of ContentType.TYPE_B, and so on. Each instance's inflate() method modifies items in the same contentStubs List, in parallel.

However:

  • No thread ever changes the size of the ArrayList
  • No thread ever attempts to modify a value that's being modified by another thread
  • No thread ever attempts to read a value written by another thread

Given all this, it seems that no additional measures to ensure thread-safety are necessary. From a (very) quick look at the ArrayList implementation, it seems that there is no risk of a ConcurrentModificationException; however, that doesn't mean that something else can't go wrong. Am I missing something, or is this safe to do?

  • ConcurrentModificationException is thrown when you modify the state of the list itself (for example by adding or removing elements, which changes its size). Your code modifies the state of elements stored in the list, which has nothing to do with the list itself. Commented Aug 20, 2020 at 9:09
  • That is my feeling as well - I wonder if there is something else bad about doing what I propose though. Commented Aug 20, 2020 at 9:15
  • Is your question answered? Commented Aug 20, 2020 at 23:06
  • Not really, it repeats what I believe I already said in the body of the question (ConcurrentModificationException not being a problem). I was hoping for a more authoritative answer (i.e. with links to documentation/source), but I realize that proving something is not a problem is probably an impossible task. Commented Aug 21, 2020 at 8:18
  • There's a fundamental point about Java here: these objects are not "in the … List"; the list only holds references to them, and there can be an arbitrary number of other references to the same objects. None of that matters. The variable you're modifying is stub.content of distinct objects, so there's no problem with the writes. However, writing values that no one ever reads would be pointless, so there must be reads. And there must be a reason why these objects are in a list (i.e. there is code iterating over it). But if these things do not interact, they shouldn't be in the same object. Commented Aug 25, 2020 at 16:05

2 Answers

In general, that will work, because you are not modifying the state of the List itself (a structural change such as adding or removing elements while an iterator is active would make that iterator throw a ConcurrentModificationException). You are only modifying objects that the list refers to, which is fine from the list's point of view.

I would recommend splitting up your list into a Map<ContentType, List<ContentStub>> and then starting threads with those specific lists.

You could convert your list to a map with this:

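// groupingBy collects every stub of a type into a list; toMap would throw an IllegalStateException on duplicate keys
Map<ContentType, List<ContentStub>> typeToStubsMap = stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));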

If your List is not that big (fewer than 1000 entries), I would even recommend not using any threading at all and just iterating with a plain indexed for loop, or even .forEach if its small extra overhead is no concern.
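As a rough sketch of the splitting idea, the following groups the stubs by type and starts one thread per group. The class name GroupByTypeSketch, the method names, the two-value ContentType enum and the placeholder string written into content are all assumptions made for illustration; the real code would call the service instead. The original list is never modified, so its order is preserved, and the per-type lists only hold references to the same ContentStub objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByTypeSketch {

    // Hypothetical stand-ins for the types in the question.
    enum ContentType { TYPE_A, TYPE_B }

    static class ContentStub {
        final ContentType contentType;
        Object content;

        ContentStub(ContentType contentType) {
            this.contentType = contentType;
        }
    }

    // Builds one list per type; the input list itself is left untouched.
    static Map<ContentType, List<ContentStub>> groupByType(List<ContentStub> stubs) {
        return stubs.stream().collect(Collectors.groupingBy(stub -> stub.contentType));
    }

    // Starts one thread per type and waits for all of them to finish.
    static void inflateByType(List<ContentStub> stubs) {
        List<Thread> workers = new ArrayList<>();
        groupByType(stubs).forEach((type, group) -> {
            // Placeholder value; a real inflater would call its service here.
            Thread worker = new Thread(() ->
                    group.forEach(stub -> stub.content = "content for " + type));
            worker.start();
            workers.add(worker);
        });
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until this worker's writes are complete
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        List<ContentStub> stubs = List.of(
                new ContentStub(ContentType.TYPE_A),
                new ContentStub(ContentType.TYPE_B),
                new ContentStub(ContentType.TYPE_A));
        inflateByType(stubs);
        stubs.forEach(stub -> System.out.println(stub.contentType + " -> " + stub.content));
    }
}
```

Each worker only ever sees stubs of its own type, so the per-element type check from the question's inflater is no longer needed.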

5 Comments

ThreadLocal might interest you.
Appreciate the advice - the reasons for the implementation being how it is - (1) The service calls are network I/O and the performance speedup from parallelization here is dramatic and (2) The order of the list is vital, and there is enough complex business logic in this code to make additional data structure manipulation highly undesirable.
Not sure how ThreadLocal would help here - I want all the threads to populate a shared data structure, not to have their own copies of it.
@kenny_k You've said that "No thread ever attempts to read a value written by another thread" and "No thread ever attempts to modify a value that's being modified by another thread". So there is nothing shared. Could you clarify that?
Sounds very interesting. Sadly, I won't be able to look into it myself :)

Let's assume that thread A writes TYPE_A content and thread B writes TYPE_B content. The List contentStubs is only used to obtain instances of ContentStub: read-only access. So from the perspective of A, B and contentStubs, there is no problem. However, the updates made by threads A and B are not guaranteed to become visible to any other thread; e.g. another thread C may still conclude that stub.content == null for elements in the list.

The reason for this is the Java Memory Model. If you don't use constructs like locks, synchronization, volatile and atomic variables, the memory model gives no guarantee if and when modifications of an object by one thread become visible to another thread. To make this a little more practical, let's look at an example.

Imagine that a thread A executes the following code:

    stub.content = someService.getContent(); // happens to be element[17]

List element 17 is a reference to a ContentStub object on the global heap. Conceptually, the VM is allowed to work on a thread-private copy of that object's fields (in practice, values held in registers or CPU caches). All subsequent accesses through that reference in thread A may use the copy. The VM is free to decide if and when to update the original object on the global heap.

Now imagine a thread C that executes the following code:

    ContentStub stub = contentStubs.get(17);

The VM may do the same trick with a private copy in thread C.

If thread C already accessed the object before thread A updated it, thread C may keep using its stale copy and ignore the global original for a long time. But even if thread C accesses the object for the first time after thread A updated it, there is no guarantee that the changes in thread A's private copy have already reached the global heap.

In short: without a lock or some other form of synchronization, thread C has no guarantee of ever reading anything but null from stub.content. Waiting for the writer threads with Thread.join() or Future.get() counts as synchronization for this purpose.

The reason for this memory model is performance. On modern hardware, there is a trade-off between performance and consistency across all CPUs/cores. If the memory model of a modern language requires consistency, that is very hard to guarantee on all hardware and it will likely impact performance too much. Modern languages therefore embrace low consistency and offer the developer explicit constructs to enforce it when needed. In combination with instruction reordering by both compilers and processors, that makes old-fashioned linear reasoning about your program code … interesting.
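One such explicit construct is to wait for the writers through a Future. The java.util.concurrent documentation states that actions taken by the computation behind a Future happen-before actions that follow a successful Future.get() in another thread. The sketch below assumes a hypothetical inflaterFor helper in place of the question's inflater classes, a two-value ContentType enum and a placeholder string in place of someService.getContent(); it is an illustration of the visibility guarantee, not the asker's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VisibleInflationSketch {

    // Hypothetical stand-ins for the types in the question.
    enum ContentType { TYPE_A, TYPE_B }

    static class ContentStub {
        final ContentType contentType;
        Object content;

        ContentStub(ContentType contentType) {
            this.contentType = contentType;
        }
    }

    // Stand-in for TypeAStubInflater / TypeBStubInflater: writes only stubs of its own type.
    static Runnable inflaterFor(ContentType type, List<ContentStub> stubs) {
        return () -> stubs.forEach(stub -> {
            if (stub.contentType == type) {
                stub.content = "content for " + type; // placeholder for someService.getContent()
            }
        });
    }

    // Runs one inflater per type over the shared list and waits for all of them.
    static void inflateInParallel(List<ContentStub> stubs) {
        ExecutorService pool = Executors.newFixedThreadPool(ContentType.values().length);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (ContentType type : ContentType.values()) {
                futures.add(pool.submit(inflaterFor(type, stubs)));
            }
            for (Future<?> future : futures) {
                // After get() returns, the writes made by that task are visible to this thread.
                future.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while inflating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("inflater failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<ContentStub> stubs = List.of(
                new ContentStub(ContentType.TYPE_A),
                new ContentStub(ContentType.TYPE_B),
                new ContentStub(ContentType.TYPE_A));
        inflateInParallel(stubs);
        // This thread plays the role of thread C and reads only after every get() has returned.
        stubs.forEach(stub -> System.out.println(stub.contentType + " -> " + stub.content));
    }
}
```

The guarantee only covers the thread that called get(). Any further thread that reads the stubs needs its own happens-before edge, for example by being started after the waiting thread has finished waiting.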
