
Suppose we have a function that takes a key, retrieves its value from a shared hashtable, performs some operations on it to obtain a new value, and then updates the hashtable with this new value. This function is called from multiple threads, either with the same or different keys, so some sort of race-condition protection using a mutex is necessary. I've come up with the following implementation in Python using locks, where I use a dict as the hashtable. (I know that in Python dict operations are atomic, but this is just for illustration purposes, for the sake of the algorithm.)

import sys
from threading import Lock

class Solution:
    
    def __init__(self):
        self.datamap = {}
        self.lockmap = {}
        self.datamap_lock = Lock()
        self.lockmap_lock = Lock()

    def initializeKey(self, key):
        with self.datamap_lock:
            if key not in self.datamap:
                self.datamap[key] = (-sys.maxsize, 0)
                with self.lockmap_lock:
                    self.lockmap[key] = Lock()

    def getLock(self, key):
        with self.lockmap_lock:
            return self.lockmap[key]

    def getValue(self, key):
        with self.datamap_lock:
            return self.datamap[key]

    def storeValue(self, key, max_, value):
        with self.datamap_lock:
            self.datamap[key] = (max_, value)

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.getLock(key):
            max_, value = self.getValue(key)
            # Does some operations on value to obtain a new value
            self.storeValue(key, max_, value)

Basically I used a mutex for the data hashtable, a mutex for each key, and a mutex for the hashtable that maps each key to its mutex. Would this implementation be correct and thread-safe? Is there a way to do it better without using so many locks/mutexes?

2 Comments
  • I think there are some flaws in your code: places where a race condition could take place. But I guess it could be done with a single lock equivalent to your "datamap_lock": use it to protect access to both "datamap" and "lockmap" - you gain nothing by separating access there - plus a lock pool roughly the size of the number of threads (instead of one lock per key): a lock would be taken from the pool and placed in the lockmap only while one value is being updated, and returned to the pool afterwards (a sketch of this idea appears after these comments). Sorry, I am not in a setting where I can write this as a proper answer now. Commented Oct 13, 2023 at 22:58
  • My initial intention in splitting into two locks was that acquiring the lock for one of the maps (e.g. datamap) would not block other threads from accessing the other map (e.g. lockmap), which could otherwise have a performance impact. Commented Oct 15, 2023 at 12:32
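A rough sketch of the lock-pool idea from the first comment - this is an assumption about what the commenter meant, not their code; the LockPool class and its method names are invented for illustration:

from time import sleep
from threading import Lock

class LockPool:
    """A fixed pool of Lock objects; a key borrows one only while its value is updated."""

    def __init__(self, size=8):
        self._guard = Lock()                      # protects _free and _in_use
        self._free = [Lock() for _ in range(size)]
        self._in_use = {}                         # key -> (lock, number of holders/waiters)

    def acquire(self, key):
        while True:
            with self._guard:
                if key in self._in_use:           # someone already holds or waits on this key
                    lock, count = self._in_use[key]
                    self._in_use[key] = (lock, count + 1)
                elif self._free:
                    lock = self._free.pop()       # borrow a lock from the pool for this key
                    self._in_use[key] = (lock, 1)
                else:
                    lock = None                   # pool exhausted, retry shortly
            if lock is not None:
                lock.acquire()
                return
            sleep(0.0001)

    def release(self, key):
        with self._guard:
            lock, count = self._in_use[key]
            lock.release()
            if count == 1:
                del self._in_use[key]
                self._free.append(lock)           # hand the lock back to the pool
            else:
                self._in_use[key] = (lock, count - 1)

Usage would be roughly pool.acquire(key), then a try/finally that calls pool.release(key), wrapped around the read-modify-write of the shared dict; with a pool size close to the number of worker threads it should rarely run dry.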

2 Answers


It can be a lot simpler than what you are trying there: a single lock can control access to the lockmap, and any value in there, even a boolean, can work as a "soft lock" from that point on:

from contextlib import contextmanager
from time import sleep, monotonic
from threading import RLock
import sys

TIMESTEP = 0.0001  # 0.1 ms between locking retries: a reasonable order of magnitude

# Factor out all locking logic into a single method:

class Solution:
    
    def __init__(self):
        self.datamap = {}
        self.lockmap = {}
        self.lock = RLock()
        
    @contextmanager
    def get_lock(self, key, timeout=None):
        # No other method should touch "self.lockmap" directly
        start = monotonic()
        acquired = False
        try:
            while not acquired:
                with self.lock:  # use the single real lock to inspect and modify the lockmap
                    if not self.lockmap.get(key, False):
                        self.lockmap[key] = True  # claim the "soft lock" for this key
                        acquired = True
                if not acquired:
                    # sleep *outside* self.lock so other threads can keep making progress
                    sleep(TIMESTEP)
                    if timeout is not None and monotonic() - start > timeout:
                        raise TimeoutError()
            yield
        finally:
            if acquired:
                with self.lock:
                    self.lockmap.pop(key, None)  # release the soft lock
        

    def initializeKey(self, key):
        with self.get_lock(key):
            if key not in self.datamap:
                self.datamap[key] = (-sys.maxsize, 0)

    def getValue(self, key):
        # with self.get_lock(key):  # not needed in Python: a read-only dict lookup is atomic
            return self.datamap[key]

    def storeValue(self, key, max_, value):
        with self.get_lock(key):  # This will prevent a value from being written while a "calc" is being performed
            self.datamap[key] = (max_, value)

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.get_lock(key):
            max_, value = self.getValue(key)
            # Does some operations on value to obtain a new value
            
            # with the "soft lock" mechanism in get_lock, writes to datamap[key] are
            # protected, and other threads are free to access and mutate other keys!
            ...

            # there is no real need to call a method to store the value; if you do, "get_lock"
            # would have to implement re-entrant locking (like threading.RLock) so the nested
            # acquisition does not deadlock
            ## self.storeValue(key, max_, value)
            # if you modify datamap directly here, it is protected by our "soft lock"
            self.datamap[key] = (max_, value)

Just clarifying the role of the yield statement in get_lock: the @contextmanager decorator makes this method pause at that point while the with block in the user code executes. If everything went OK, execution resumes on the line after the yield; if the with block raised, the exception propagates into the generator at the yield. Either way the finally clause runs, so the soft lock is always released.
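As a minimal, standalone illustration of that control flow (not part of the answer's class):

from contextlib import contextmanager

@contextmanager
def demo():
    print("setup (before yield)")
    try:
        yield "resource"                 # the body of the caller's `with` block runs here
    finally:
        print("cleanup (after yield) - runs even if the with block raised")

with demo() as value:
    print("inside the with block, got:", value)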



    def calc(self, key, param_value):
        self.datamap_lock.acquire()

No, please don't do that.

Prefer to use a context manager:

        with self.datamap_lock:
            ...

Then you won't neglect to release the lock, even if an unexpected exception happens.

Oh, wait, I see you may release part way through. Please don't do that, as it makes it harder to reason about the code's behavior. It's a bit like preferring structured while loops over spaghetti goto.

I guess it's "fine" that you double acquire a reentrant Lock. But it's unclear why that's desirable. Stick to with blocks.

        # Does some operations on value to obtain a new value and put it back into the hash table

Ok, good.

But it's not obvious what lock(s) we hold at this point. Break out that code fragment as a helper, and make it clear that a certain lock is held for exactly the duration of that helper's execution.
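For example (a hypothetical sketch built on the question's class; the helper name _update_value is an assumption, not from the answer):

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.getLock(key):              # per-key lock held for exactly this block
            self._update_value(key, param_value)

    def _update_value(self, key, param_value):
        # precondition: the caller holds the per-key lock for `key`
        max_, value = self.getValue(key)
        # ... compute the new value from `value` and `param_value` here ...
        self.storeValue(key, max_, value)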


Your central difficulty seems to be enforcing this invariant:

Whenever a [key] entry is visible to other threads, a corresponding Lock object shall also exist and shall be unlocked.

Recommend you allocate that responsibility to the "add a key to the dict" code, rather than putting it in calc(). Then it becomes a simple matter to grab that (already existing) Lock whenever we perform a calculation on some key.
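A hedged sketch of that allocation of responsibility, based on the question's class (the attribute name maps_lock is an assumption): the per-key Lock is created before the key ever becomes visible in datamap, so calc can simply grab it.

import sys
from threading import Lock

class Solution:
    def __init__(self):
        self.datamap = {}
        self.lockmap = {}
        self.maps_lock = Lock()          # guards membership changes in both dicts

    def initializeKey(self, key):
        with self.maps_lock:
            if key not in self.datamap:
                # invariant: the per-key Lock exists before the key is visible
                self.lockmap[key] = Lock()
                self.datamap[key] = (-sys.maxsize, 0)

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.lockmap[key]:          # the Lock is guaranteed to exist here
            max_, value = self.datamap[key]
            # ... compute the new value from `value` and `param_value` ...
            self.datamap[key] = (max_, value)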

1 Comment

Thanks, I've edited my code based on your inputs.
