
I'm trying to reduce the memory usage for the lock objects of segmented data. See my questions here and here. Or just assume you have a byte array where every 16 bytes can be (de)serialized into an object. Let us call this a "row", with a row length of 16 bytes. If you modify such a row from a writer thread and read it from multiple threads, you need locking. And with a byte array of 1MB (1024*1024) that means 65536 rows and the same number of locks.
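To make the layout concrete, here is a minimal sketch of such a segmented array (the readRow/writeRow helpers are my own illustration, not code from the linked questions):

static final int ROW_LENGTH = 16;

// 1MB backing array => 65536 rows of 16 bytes each
byte[] bytes = new byte[1024 * 1024];

// hypothetical helpers: copy a serialized row into or out of its 16-byte slot
static void writeRow(byte[] bytes, int rowIndex, byte[] row) {
    System.arraycopy(row, 0, bytes, rowIndex * ROW_LENGTH, ROW_LENGTH);
}

static byte[] readRow(byte[] bytes, int rowIndex) {
    byte[] row = new byte[ROW_LENGTH];
    System.arraycopy(bytes, rowIndex * ROW_LENGTH, row, 0, ROW_LENGTH);
    return row;
}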

That many locks is a bit too much, especially as I need much larger byte arrays, and I would like to reduce the count to something roughly proportional to the number of threads. My idea was to create a

ConcurrentHashMap<Integer, LockHelper> concurrentMap;

where Integer is the row index, and before a thread 'enters' a row it puts a lock object into this map (I got this idea from this answer). But no matter what I think through, I cannot find an approach that is really thread-safe:

// somewhere else where we need to write or read the row
LockHelper lock1 = new LockHelper();
// putIfAbsent returns null if there was no previous mapping,
// so fall back to the object we just created
LockHelper lock = concurrentMap.putIfAbsent(rowIndex, lock1);
if (lock == null)
    lock = lock1;
lock.addWaitingThread(); // is too late: another thread may already have removed the lock from the map
synchronized(lock) {
  try { 
      // read or write row at rowIndex, e.g. writing like
      bytes[rowIndex * 16] = 1;
      bytes[rowIndex * 16 + 1] = 2;
      // ...
  } finally {
     lock.removeWaitingThread(); // the implied counterpart to addWaitingThread
     if(lock.noThreadsWaiting())
        concurrentMap.remove(rowIndex);
  }
}

Do you see a possibility to make this thread-safe?

I have the feeling that this will look very similar to the concurrentMap.compute construct (e.g. see this answer), or could I even utilize this method?

map.compute(rowIndex, (key, value) -> {
    if(value == null)
       value = new Object();
    synchronized (value) {
        // access row
        return value;
    }
});
map.remove(rowIndex);

Are the value and the 'synchronized' necessary at all, given that we already know the compute operation is performed atomically?

// null is forbidden so use the key also as the value to avoid creating additional objects
ConcurrentHashMap<Integer, Integer> map = ...;

// now the row access looks really simple:
map.compute(rowIndex, (key, value) -> {
    // access row
    return key;
});
map.remove(rowIndex);

BTW: Since when do we have this compute in Java? Since 1.8? I cannot find it in the JavaDocs.

Update: I found a very similar question here with userIds instead of rowIndices. Note that it contains an example with several problems, like a missing final, calling lock inside the try-finally clause, and no shrinking of the map. There also seems to be a library, JKeyLockManager, for this purpose, but I don't think it is thread-safe.

Update 2: The solution seems to be really simple, as Nicolas Filotto pointed out how to avoid the removal:

map.compute(rowIndex, (key, value) -> {
    // access row
    return null;
});

So this is indeed much less memory-intensive, BUT the simple segment locking with synchronized is at least 50% faster in my scenario.
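For comparison, the "simple segment locking" presumably looks something like this: a fixed pool of lock objects, each guarding every n-th row (the class is my own sketch, not the exact benchmark code):

class StripedRowLocks {
    // one lock guards every locks.length-th row: memory stays O(numLocks)
    // instead of O(numRows), at the price of unrelated rows sharing a lock
    private final Object[] locks;

    StripedRowLocks(int numLocks) {
        locks = new Object[numLocks];
        for (int i = 0; i < numLocks; i++)
            locks[i] = new Object();
    }

    void withRowLock(int rowIndex, Runnable rowAccess) {
        synchronized (locks[rowIndex % locks.length]) {
            rowAccess.run();
        }
    }
}

There are no map updates or entry allocations on the hot path here, which would be consistent with the measured difference.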

  • Maybe you could use an arrangement similar to n-way-associative caches? Two-way-associative would mean having one lock for the even-address rows and one for the odd-address rows. Depending on the number of threads you could experiment with the number of locks. If there is a pattern in the accesses, it may help to use a prime number of locks or even a primitive hash to map row addresses to the matching lock object. Commented Sep 24, 2016 at 10:05
  • Striping assumes a good distribution, but suffers from contention, and the locks thrash due to performing tiny amounts of work. Instead consider a message passing approach to have a single writer (or segment the data for one writer per core). You can then use a disruptor or flat-combining approach, penalizing a call by performing a batch of work instead of suffering from contention; see the sketch after these comments. Commented Sep 24, 2016 at 21:19
  • @GeorgBisseling that was what we had already tested, see the update for the sources. Commented Sep 27, 2016 at 10:10
  • @BenManes Do you have some links explaining your approach in more detail? In my case I can assume that I have just one writer so this would be interesting. Commented Sep 27, 2016 at 10:12
  • @Karussell The techniques work best if you don't need the requesting thread to wait for the response. See disruptor, flat combining, and caffeine for different variations on the idea of record => batch => lock => replay. Commented Sep 27, 2016 at 19:01
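For reference, a minimal sketch of the single-writer idea from the comments above (names are made up; a real implementation such as the LMAX Disruptor does considerably more, e.g. publishing results back to readers):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SingleWriterQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    SingleWriterQueue() {
        Thread writer = new Thread(this::drainLoop, "row-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // record: called from any thread, returns immediately
    void submit(Runnable writeTask) {
        queue.add(writeTask);
    }

    // batch + replay: this is the only thread that ever mutates the rows
    private void drainLoop() {
        List<Runnable> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(queue.take());   // block until at least one task arrives
                queue.drainTo(batch);      // then grab whatever else queued up
                for (Runnable task : batch)
                    task.run();
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Readers still need a visibility mechanism of their own (e.g. a volatile version counter), which is the part the disruptor and flat-combining designs solve properly.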

1 Answer


Are the value and the synchronized necessary at all, given that we already know the compute operation is performed atomically?

I confirm that there is no need to add a synchronized block in this case, as the compute method is performed atomically, as stated in the Javadoc of ConcurrentHashMap#compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction), which was added along with BiFunction in Java 8. I quote:

Attempts to compute a mapping for the specified key and its current mapped value (or null if there is no current mapping). The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

What you try to achieve with the compute method can be made fully atomic if you make your BiFunction always return null, so that the key is removed atomically too and everything is done within the one atomic compute call.

map.compute(
    rowIndex, 
    (key, value) -> {
        // access row here
        return null;
    }
);

This way you fully rely on the locking mechanism of the ConcurrentHashMap to synchronize the accesses to your rows.
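Put together, a self-contained sketch of this approach for the 16-byte rows of the question could look as follows (class and method names are only illustrative):

import java.util.concurrent.ConcurrentHashMap;

class RowStore {
    private static final int ROW_LENGTH = 16;
    private final byte[] bytes;
    // stays empty between accesses: compute only uses the entry as a lock
    private final ConcurrentHashMap<Integer, Object> rowLocks = new ConcurrentHashMap<>();

    RowStore(byte[] bytes) {
        this.bytes = bytes;
    }

    void writeRow(int rowIndex, byte[] row) {
        rowLocks.compute(rowIndex, (key, value) -> {
            System.arraycopy(row, 0, bytes, rowIndex * ROW_LENGTH, ROW_LENGTH);
            return null; // returning null removes the mapping atomically
        });
    }

    byte[] readRow(int rowIndex) {
        byte[] row = new byte[ROW_LENGTH];
        rowLocks.compute(rowIndex, (key, value) -> {
            System.arraycopy(bytes, rowIndex * ROW_LENGTH, row, 0, ROW_LENGTH);
            return null;
        });
        return row;
    }
}

Two concurrent compute calls for the same rowIndex block each other, so accesses to a row never overlap, and because the function always returns null the map never accumulates entries.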


Comments

Ah, so null can be used to avoid storing anything. Thanks for pointing this out! Regarding the improvement: the ConcurrentHashMap 'stores' locks only in proportion to the number of threads, so the map probably holds a few dozen entries at most, whereas my method needs one lock per row, i.e. several million (or a bit fewer if done range-wise), but always a lot more than with this approach.
With a ConcurrentHashMap you will have one lock per segment, which covers a chunk of hash codes, which in turn means a chunk of keys, or more precisely a chunk of rows in your case. You won't have one lock per row as you seem to expect. By default you will have 16 segments; if you need more, construct your ConcurrentHashMap with a higher concurrencyLevel (see the constructor sketch after these comments).
Ah, understood, and using the concurrencyLevel is exactly what I need; I could set it to the number of threads. Plus it is far more memory efficient than my solution.
Ah, it is more memory efficient? Interesting, I would have bet on the opposite. Anyway, good news.
I guess that is because you pay the price of the overhead added by not executing your code directly but through a lambda expression. But still, for me your solution is the best for sure. Usually this is done with a modulus on the key to get the right lock, but the idea is the same.
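For completeness, the constructor mentioned in the comments above (note that since Java 8 the concurrencyLevel argument is only used as a sizing hint, because the current implementation locks per hash bin rather than per segment):

// initialCapacity, loadFactor, concurrencyLevel
// (the estimated number of concurrently updating threads)
ConcurrentHashMap<Integer, Integer> map =
        new ConcurrentHashMap<>(16, 0.75f, 8);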
