
I'm trying to reduce the memory usage for the lock objects of segmented data. See my questions here and here. Or just assume you have a byte array where every 16 bytes can be (de)serialized into an object. Let us call this a "row", with a row length of 16 bytes. If you modify such a row from a writer thread and read it from multiple threads, you need locking. And with a byte array of 1MB (1024*1024) that means 65536 rows and the same number of locks.
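To make the layout concrete, here is a minimal sketch of such a segmented array (the readRow/writeRow helpers are my own illustration, not code from the linked questions):

static final int ROW_LENGTH = 16;

// 1MB backing array => 65536 rows of 16 bytes each
byte[] bytes = new byte[1024 * 1024];

// hypothetical helpers: copy a serialized row into or out of its 16-byte slot
static void writeRow(byte[] bytes, int rowIndex, byte[] row) {
    System.arraycopy(row, 0, bytes, rowIndex * ROW_LENGTH, ROW_LENGTH);
}

static byte[] readRow(byte[] bytes, int rowIndex) {
    byte[] row = new byte[ROW_LENGTH];
    System.arraycopy(bytes, rowIndex * ROW_LENGTH, row, 0, ROW_LENGTH);
    return row;
}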

That many locks is a bit too much, especially as I need much larger byte arrays, and I would like to reduce the count to something roughly proportional to the number of threads. My idea was to create a

ConcurrentHashMap<Integer, LockHelper> concurrentMap;

where Integer is the row index, and before a thread 'enters' a row it puts a lock object into this map (I got this idea from this answer). But no matter what I think through, I cannot find an approach that is really thread-safe:

// somewhere else where we need to write or read the row
LockHelper lock1 = new LockHelper();
// putIfAbsent returns null if there was no previous mapping,
// so fall back to the object we just created
LockHelper lock = concurrentMap.putIfAbsent(rowIndex, lock1);
if (lock == null)
    lock = lock1;
lock.addWaitingThread(); // is too late: another thread may already have removed the lock from the map
synchronized(lock) {
  try { 
      // read or write row at rowIndex, e.g. writing like
      bytes[rowIndex * 16] = 1;
      bytes[rowIndex * 16 + 1] = 2;
      // ...
  } finally {
     lock.removeWaitingThread(); // the implied counterpart to addWaitingThread
     if(lock.noThreadsWaiting())
        concurrentMap.remove(rowIndex);
  }
}

Do you see a possibility to make this thread-safe?

I have the feeling that this will look very similar to the concurrentMap.compute construct (e.g. see this answer), or could I even utilize this method?

map.compute(rowIndex, (key, value) -> {
    if(value == null)
       value = new Object();
    synchronized (value) {
        // access row
        return value;
    }
});
map.remove(rowIndex);

Are the value and the 'synchronized' necessary at all, given that we already know the compute operation is performed atomically?

// null is forbidden so use the key also as the value to avoid creating additional objects
ConcurrentHashMap<Integer, Integer> map = ...;

// now the row access looks really simple:
map.compute(rowIndex, (key, value) -> {
    // access row
    return key;
});
map.remove(rowIndex);

BTW: Since when do we have this compute in Java? Since 1.8? I cannot find it in the JavaDocs.

Update: I found a very similar question here with userIds instead of rowIndices. Note that it contains an example with several problems, like a missing final, calling lock inside the try-finally clause, and no shrinking of the map. There also seems to be a library, JKeyLockManager, for this purpose, but I don't think it is thread-safe.

Update 2: The solution seems to be really simple, as Nicolas Filotto pointed out how to avoid the removal:

map.compute(rowIndex, (key, value) -> {
    // access row
    return null;
});

So this is indeed much less memory-intensive, BUT the simple segment locking with synchronized is at least 50% faster in my scenario.
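For comparison, the "simple segment locking" presumably looks something like this: a fixed pool of lock objects, each guarding every n-th row (the class is my own sketch, not the exact benchmark code):

class StripedRowLocks {
    // one lock guards every locks.length-th row: memory stays O(numLocks)
    // instead of O(numRows), at the price of unrelated rows sharing a lock
    private final Object[] locks;

    StripedRowLocks(int numLocks) {
        locks = new Object[numLocks];
        for (int i = 0; i < numLocks; i++)
            locks[i] = new Object();
    }

    void withRowLock(int rowIndex, Runnable rowAccess) {
        synchronized (locks[rowIndex % locks.length]) {
            rowAccess.run();
        }
    }
}

There are no map updates or entry allocations on the hot path here, which would be consistent with the measured difference.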

  • Maybe you could use an arrangement similar to n-way-associative caches? Two-way-associative would mean having one lock for the even-address rows and one for the odd-address rows. Depending on the number of threads you could experiment with the number of locks. If there is a pattern in the accesses, it may help to use a prime number of locks or even a primitive hash to map row addresses to the matching lock object. Commented Sep 24, 2016 at 10:05
  • Striping assumes a good distribution, but suffers from contention, and the locks thrash due to performing tiny amounts of work. Instead consider a message passing approach to have a single writer (or segment the data for one writer per core). You can then use a disruptor or flat-combining approach, penalizing a call by performing a batch of work instead of suffering from contention; see the sketch after these comments. Commented Sep 24, 2016 at 21:19
  • @GeorgBisseling that was what we had already tested, see the update for the sources. Commented Sep 27, 2016 at 10:10
  • @BenManes Do you have some links explaining your approach in more detail? In my case I can assume that I have just one writer so this would be interesting. Commented Sep 27, 2016 at 10:12
  • @Karussell The techniques work best if you don't need the requesting thread to wait for the response. See disruptor, flat combining, and caffeine for different variations on the idea of record => batch => lock => replay. Commented Sep 27, 2016 at 19:01
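For reference, a minimal sketch of the single-writer idea from the comments above (names are made up; a real implementation such as the LMAX Disruptor does considerably more, e.g. publishing results back to readers):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SingleWriterQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    SingleWriterQueue() {
        Thread writer = new Thread(this::drainLoop, "row-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // record: called from any thread, returns immediately
    void submit(Runnable writeTask) {
        queue.add(writeTask);
    }

    // batch + replay: this is the only thread that ever mutates the rows
    private void drainLoop() {
        List<Runnable> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(queue.take());   // block until at least one task arrives
                queue.drainTo(batch);      // then grab whatever else queued up
                for (Runnable task : batch)
                    task.run();
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

Readers still need a visibility mechanism of their own (e.g. a volatile version counter), which is the part the disruptor and flat-combining designs solve properly.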

1 Answer


Are the value and the synchronized necessary at all, given that we already know the compute operation is performed atomically?

I confirm that there is no need to add a synchronized block in this case, as the compute method is performed atomically, as stated in the Javadoc of ConcurrentHashMap#compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction), which was added along with BiFunction in Java 8. I quote:

Attempts to compute a mapping for the specified key and its current mapped value (or null if there is no current mapping). The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

What you try to achieve with the compute method can be made fully atomic if you make your BiFunction always return null, so that the key is removed atomically too and everything is done within the one atomic compute call.

map.compute(
    rowIndex, 
    (key, value) -> {
        // access row here
        return null;
    }
);

This way you fully rely on the locking mechanism of the ConcurrentHashMap to synchronize the accesses to your rows.
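Put together, a self-contained sketch of this approach for the 16-byte rows of the question could look as follows (class and method names are only illustrative):

import java.util.concurrent.ConcurrentHashMap;

class RowStore {
    private static final int ROW_LENGTH = 16;
    private final byte[] bytes;
    // stays empty between accesses: compute only uses the entry as a lock
    private final ConcurrentHashMap<Integer, Object> rowLocks = new ConcurrentHashMap<>();

    RowStore(byte[] bytes) {
        this.bytes = bytes;
    }

    void writeRow(int rowIndex, byte[] row) {
        rowLocks.compute(rowIndex, (key, value) -> {
            System.arraycopy(row, 0, bytes, rowIndex * ROW_LENGTH, ROW_LENGTH);
            return null; // returning null removes the mapping atomically
        });
    }

    byte[] readRow(int rowIndex) {
        byte[] row = new byte[ROW_LENGTH];
        rowLocks.compute(rowIndex, (key, value) -> {
            System.arraycopy(bytes, rowIndex * ROW_LENGTH, row, 0, ROW_LENGTH);
            return null;
        });
        return row;
    }
}

Two concurrent compute calls for the same rowIndex block each other, so accesses to a row never overlap, and because the function always returns null the map never accumulates entries.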


Comments

Ah, so null can be used to avoid storing anything. Thanks for pointing this out! Regarding the improvement: the ConcurrentHashMap 'stores' locks only in proportion to the number of threads, so the map probably holds a few dozen entries at most, whereas my method needs one lock per row, i.e. several million (or a bit fewer if done range-wise), but always a lot more than with this approach.
With a ConcurrentHashMap you will have one lock per segment, which covers a chunk of hash codes, which in turn means a chunk of keys, or more precisely a chunk of rows in your case. You won't have one lock per row as you seem to expect. By default you will have 16 segments; if you need more, construct your ConcurrentHashMap with a higher concurrencyLevel (see the constructor sketch after these comments).
Ah, understood, and using the concurrencyLevel is exactly what I need; I could set it to the number of threads. Plus it is far more memory efficient than my solution.
Ah, it is more memory efficient? Interesting, I would have bet on the opposite. Anyway, good news.
I guess that is because you pay the price of the overhead added by not executing your code directly but through a lambda expression. But still, for me your solution is the best for sure. Usually this is done with a modulus on the key to get the right lock, but the idea is the same.
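For completeness, the constructor mentioned in the comments above (note that since Java 8 the concurrencyLevel argument is only used as a sizing hint, because the current implementation locks per hash bin rather than per segment):

// initialCapacity, loadFactor, concurrencyLevel
// (the estimated number of concurrently updating threads)
ConcurrentHashMap<Integer, Integer> map =
        new ConcurrentHashMap<>(16, 0.75f, 8);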
