2

I have a directory of images in order. Typically my code will be using data from a sequential subset of images (e.g. images 5-10), and the naive options for accessing these are:

  1. Create a wrapper object with a method that loads the image when needed and reads my data (e.g. a pixel value). This has little memory overhead but will be slow as it will need to load each image every time.

  2. Store all the images in memory. This will be fast but obviously there's a limit to how many images we can store.

I would like to find:

  • Some method by which I can define how to read the image corresponding to an index or a path, and then allows me to access, say magic_image_collection[index] without me having to worry about whether it's going to return the object in memory or read it afresh. This would ideally keep the appropriate images or the n most recently accessed images in memory.

2 Answers 2

6

You can extend the default dict and use __missing__ method to call a loading function if the key is missing:

class ImageDict(dict):
    def __missing__(self, key):
        self[key] = img = self.load(key)
        return img
    def load(self, key):
        # create a queue if not exist (could be moved to __init__)
        if not hasattr(self, '_queue'):
            self._queue = []
        # pop the oldest entry in the list and the dict
        if len(self._queue) >= 100:
            self.pop(self._queue.pop(0))
        # append this key as a newest entry in the queue
        self._queue.append(key)
        # implement image loading here and return the image instance
        print 'loading', key
        return 'Image for %s' % key

And the output (the loading happen only when the key doesn't exist yet.)

>>> d = ImageDict()
>>> d[3]
loading 3
'Image for 3'
>>> d[3]
'Image for 3'
>>> d['bleh']
loading bleh
'Image for bleh'
>>> d['bleh']
'Image for bleh'

One evolution would be to store only the N last element in the dict, and purge the oldest entries. You can implement it by keeping a list of keys for ordering.

Sign up to request clarification or add additional context in comments.

6 Comments

Get rid of __getitem__ and rename load to __missing__ and you should be okay.
All you have to do in __missing__ is to either return the appropriate value for the key, or raise an exception. The dict code that called __missing__ will take care of updating the dict (which your class inherited from). To add support for 'last n elements', add a list of keys as a member, and add the key to the end of the list in __missing__. When the list exceeds n, pop the oldest (0'th) key from the list and from self.
Thanks Paul, didn't know __missing__, very nice !
@tito - if I read the docs correctly, you don't even have to assign to self[key] in __missing__, just return it (or raise an exception if the key is invalid somehow). The doc code will take care of the rest.
Paul's right there. I'll mark this as correct. @tito mind if I tidy it up a bit though and add the 'last n elements' functionality as Paul suggested?
|
2

Weakrefs aren't what you want -- weakrefs are a way to reference an item that allows the garbage collector to collect (i.e. destroy) the referent if only weakrefs to it exist. In other words, if you create and store only weakrefs to some object, it is likely to be garbage collected quickly, and you won't have benefitted from it.

I'd go with your option #1 above. On modern operating systems, the OS maintains an in-memory cache of recently accessed files (or parts thereof), which will mean that you'll have to bear the cost of loading the file from disk once, but after that, subsequent accesses to the file will be as fast (or nearly so) as if it were in memory in your application. The FS cache is usually a LRU-style cache, so frequently-accessed items will tend to stay in memory, while infrequently accessed items will tend to be evicted (and will subsequently be loaded from disk if needed). In most cases, it is sufficient to rely on the operating system's implementation of this sort of logic, rather than writing your own (especially since you don't have to write and maintain code to do it!)

1 Comment

Thanks for the clarification with weakrefs. I'll try both option #1 and @tito's idea.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.