
Suppose we have a function that takes a key, retrieves its value from a shared hashtable, performs some operations on it to obtain a new value, and then updates the hashtable with this new value. This function is called from multiple threads, either with the same or different keys, so some sort of race-condition protection using a mutex is necessary. I've come up with the following implementation in Python using locks, where I use a dict as the hashtable. (I know that in Python dict operations are atomic, but this is just for illustration purposes, for the sake of the algorithm.)

import sys
from threading import Lock

class Solution:
    
    def __init__(self):
        self.datamap = {}
        self.lockmap = {}
        self.datamap_lock = Lock()
        self.lockmap_lock = Lock()

    def initializeKey(self, key):
        with self.datamap_lock:
            if key not in self.datamap:
                self.datamap[key] = (-sys.maxsize, 0)
                with self.lockmap_lock:
                    self.lockmap[key] = Lock()

    def getLock(self, key):
        with self.lockmap_lock:
            return self.lockmap[key]

    def getValue(self, key):
        with self.datamap_lock:
            return self.datamap[key]

    def storeValue(self, key, max_, value):
        with self.datamap_lock:
            self.datamap[key] = (max_, value)

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.getLock(key):
            max_, value = self.getValue(key)
            # Does some operations on value to obtain a new value
            self.storeValue(key, max_, value)

Basically I used a mutex for the data hashtable, a mutex for each key, and a mutex for the hashtable that maps each key to its mutex. Would this implementation be correct and thread-safe? Is there a way to do it better without using so many locks/mutexes?

2 Comments
  • I think there are some flaws in your code: places where a race condition could take place. But I guess it could be done with a single lock equivalent to your "datamap_lock": use it to protect access to both "datamap" and "lockmap" - you gain nothing by separating access there - plus a lock pool roughly the size of the number of threads (instead of one lock per key): a lock would be taken from the pool and placed in the lockmap only while one value is being updated, and returned to the pool afterwards (a sketch of this idea appears after these comments). Sorry, I am not in a setting where I can write this as a proper answer now. Commented Oct 13, 2023 at 22:58
  • My initial intention in splitting into two locks was that acquiring the lock for one of the maps (e.g. datamap) would not block other threads from accessing the other map (e.g. lockmap), which could otherwise have a performance impact. Commented Oct 15, 2023 at 12:32
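A rough sketch of the lock-pool idea from the first comment - this is an assumption about what the commenter meant, not their code; the LockPool class and its method names are invented for illustration:

from time import sleep
from threading import Lock

class LockPool:
    """A fixed pool of Lock objects; a key borrows one only while its value is updated."""

    def __init__(self, size=8):
        self._guard = Lock()                      # protects _free and _in_use
        self._free = [Lock() for _ in range(size)]
        self._in_use = {}                         # key -> (lock, number of holders/waiters)

    def acquire(self, key):
        while True:
            with self._guard:
                if key in self._in_use:           # someone already holds or waits on this key
                    lock, count = self._in_use[key]
                    self._in_use[key] = (lock, count + 1)
                elif self._free:
                    lock = self._free.pop()       # borrow a lock from the pool for this key
                    self._in_use[key] = (lock, 1)
                else:
                    lock = None                   # pool exhausted, retry shortly
            if lock is not None:
                lock.acquire()
                return
            sleep(0.0001)

    def release(self, key):
        with self._guard:
            lock, count = self._in_use[key]
            lock.release()
            if count == 1:
                del self._in_use[key]
                self._free.append(lock)           # hand the lock back to the pool
            else:
                self._in_use[key] = (lock, count - 1)

Usage would be roughly pool.acquire(key), then a try/finally that calls pool.release(key), wrapped around the read-modify-write of the shared dict; with a pool size close to the number of worker threads it should rarely run dry.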

2 Answers


It can be a lot simpler than what you are trying there: a single lock can control access to the lockmap, and any value in there, even a boolean, can work as a "soft lock" from that point on:

from contextlib import contextmanager
from time import sleep, monotonic
from threading import RLock
import sys

TIMESTEP = 0.0001  # 0.1 ms between locking retries: a reasonable order of magnitude

# Factor out all locking logic into a single method:

class Solution:
    
    def __init__(self):
        self.datamap = {}
        self.lockmap = {}
        self.lock = RLock()
        
    @contextmanager
    def get_lock(self, key, timeout=None):
        # No other method should touch "self.lockmap" directly
        start = monotonic()
        acquired = False
        try:
            while not acquired:
                with self.lock:  # use the single real lock to inspect and modify the lockmap
                    if not self.lockmap.get(key, False):
                        self.lockmap[key] = True  # claim the "soft lock" for this key
                        acquired = True
                if not acquired:
                    # sleep *outside* self.lock so other threads can keep making progress
                    sleep(TIMESTEP)
                    if timeout is not None and monotonic() - start > timeout:
                        raise TimeoutError()
            yield
        finally:
            if acquired:
                with self.lock:
                    self.lockmap.pop(key, None)  # release the soft lock
        

    def initializeKey(self, key):
        with self.get_lock(key):
            if key not in self.datamap:
                self.datamap[key] = (-sys.maxsize, 0)

    def getValue(self, key):
        # with self.get_lock(key):  # not needed in Python: a read-only dict lookup is atomic
            return self.datamap[key]

    def storeValue(self, key, max_, value):
        with self.get_lock(key):  # This will prevent a value from being written while a "calc" is being performed
            self.datamap[key] = (max_, value)

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.get_lock(key):
            max_, value = self.getValue(key)
            # Does some operations on value to obtain a new value
            
            # with the "soft lock" mechanism in get_lock, writes to datamap[key] are
            # protected, and other threads are free to access and mutate other keys!
            ...

            # there is no real need to call a method to store the value; if you do, "get_lock"
            # would have to implement re-entrant locking (like threading.RLock) so the nested
            # acquisition does not deadlock
            ## self.storeValue(key, max_, value)
            # if you modify datamap directly here, it is protected by our "soft lock"
            self.datamap[key] = (max_, value)

Just clarifying the role of the yield statement in get_lock: the @contextmanager decorator makes this method pause at that point while the with block in the user code executes. If everything went OK, execution resumes on the line after the yield; if the with block raised, the exception propagates into the generator at the yield. Either way the finally clause runs, so the soft lock is always released.
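As a minimal, standalone illustration of that control flow (not part of the answer's class):

from contextlib import contextmanager

@contextmanager
def demo():
    print("setup (before yield)")
    try:
        yield "resource"                 # the body of the caller's `with` block runs here
    finally:
        print("cleanup (after yield) - runs even if the with block raised")

with demo() as value:
    print("inside the with block, got:", value)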



    def calc(self, key, param_value):
        self.datamap_lock.acquire()

No, please don't do that.

Prefer to use a context manager:

        with self.datamap_lock:
            ...

Then you won't neglect to release the lock, even if an unexpected exception happens.

Oh, wait, I see you may release part way through. Please don't do that, as it makes it harder to reason about the code's behavior. It's a bit like preferring structured while loops over spaghetti goto.

I guess it's "fine" that you double acquire a reentrant Lock. But it's unclear why that's desirable. Stick to with blocks.

        # Does some operations on value to obtain a new value and put it back into the hash table

Ok, good.

But it's not obvious what lock(s) we hold at this point. Break out that code fragment as a helper, and make it clear that a certain lock is held for exactly the duration of that helper's execution.
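For example (a hypothetical sketch built on the question's class; the helper name _update_value is an assumption, not from the answer):

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.getLock(key):              # per-key lock held for exactly this block
            self._update_value(key, param_value)

    def _update_value(self, key, param_value):
        # precondition: the caller holds the per-key lock for `key`
        max_, value = self.getValue(key)
        # ... compute the new value from `value` and `param_value` here ...
        self.storeValue(key, max_, value)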


Your central difficulty seems to be enforcing this invariant:

Whenever a [key] entry is visible to other threads, a corresponding Lock object shall also exist and shall be unlocked.

Recommend you allocate that responsibility to the "add a key to the dict" code, rather than putting it in calc(). Then it becomes a simple matter to grab that (already existing) Lock whenever we perform a calculation on some key.
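A hedged sketch of that allocation of responsibility, based on the question's class (the attribute name maps_lock is an assumption): the per-key Lock is created before the key ever becomes visible in datamap, so calc can simply grab it.

import sys
from threading import Lock

class Solution:
    def __init__(self):
        self.datamap = {}
        self.lockmap = {}
        self.maps_lock = Lock()          # guards membership changes in both dicts

    def initializeKey(self, key):
        with self.maps_lock:
            if key not in self.datamap:
                # invariant: the per-key Lock exists before the key is visible
                self.lockmap[key] = Lock()
                self.datamap[key] = (-sys.maxsize, 0)

    def calc(self, key, param_value):
        self.initializeKey(key)
        with self.lockmap[key]:          # the Lock is guaranteed to exist here
            max_, value = self.datamap[key]
            # ... compute the new value from `value` and `param_value` ...
            self.datamap[key] = (max_, value)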

1 Comment

Thanks, I've edited my code based on your inputs.
