2

I'm looking for a way to cache data using a map (or equivalent) but instead of a comparable key (like int, string, etc) it would be lower and upper time boundaries. I would then be able to perform a map lookup of a specific timestamp.

In SQL terms this is what I would want (note the exclusive upper bound). This can be GIST-indexed and it essentially becomes like a hash.

SELECT col_x FROM table_y WHERE NOW() <@ TSTZRANGE(start_date, end_date, '[)');

My absolutely basic solution right now consists of creating one map entry per minute-truncated timestamp, which works well but obviously generates a lot of highly redundant entries.

Alternatively I could use the lower boundary only as a key, but I'm not sure how to lookup the map for the "first match greater or equal to the key" (that's obviously contrary to the comparable requirement). Is this where slices would be more appropriate?

By the way I'm using int64 timestamps instead of time.Time.

4
  • 1) Could you provide some sample Go code and data showing how you'd like this to work? 2) "By the way I'm using int64 timestamps instead of time.Time." To be sure I understand, this is just about having a map with a key that's two numbers? 3) I like swords. Commented Feb 5 at 21:27
  • "I would then be able to perform a map lookup of a specific timestamp." What is your plan for overlapping time ranges? You might be better off putting your data into SQLite, indexing it, and querying it. Commented Feb 5 at 21:31
  • @Schwern in my case, the date ranges are guaranteed to be exclusive since there is a constraint in postgres. I'm actually trying to cache a static list in golang instead of performing database lookups. Commented Feb 5 at 21:46
  • Before writing your own indexing scheme, try a SQLite temporary database (the data will reside in memory) and see if it performs well enough. Commented Feb 5 at 21:51

2 Answers 2

4

By the way I'm using int64 timestamps instead of time.Time.

This isn't actually about time at all, we can drop that. You want to look up integers within a given range.

Rather than trying to make up your own indexing solution, probably your best solution is to put your data into a SQLite temporary database, index, and query it.

create table data (
  num biginteger not null,
  -- add your data columns
);
create index data_num_idx on data(num);

Or use redis, often used as a cache, and a sorted set. zrange has O(logn + m) time complexity, N being the number of elements in the sorted set and M the number of elements returned, equivalent to using a b-tree because redis sorted sets are b-trees (more about b-trees below).

This can be coded up quickly, and are very flexible and easier to maintain than a custom solution. Decide on your performance criteria and run some benchmarks while you're coding up a Go solution.


This can be GIST-indexed and it essentially becomes like a hash.

Sure, you could use a GiST like a hash, but that won't help optimize your range queries much. If you use microsecond time as a hash key you'd need to search 1,000,000 entries to cover a range of one second.

A GiST is like a search tree, in fact it is a search tree. GiST is a Generalized Search Tree based on B+Trees.

If you want to do this yourself, you need a data structure like a map with its quick lookups and inserts, but also like a list with its fast in-order traversal: a tree! Not just any tree, but a B-Tree just like SQL indexes. Store the timestamp as the key. Search for the node which is equal to the lower bound or until you fail to match, this is O(logn), then walk the tree until you hit a node greater than the upper bound, this is O(m). The total is O(logn + m) where n is the number of nodes in the tree and m is the number of nodes in the range.

There are several Go B-Tree implementations out there.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for the answer! I have considered redis and sqlite as caching engines. I was hoping to find a "native" solution, either as an 3rd party module or maybe even built-in functionality (I forgot to mention I'm using 1.23)
1

Look, i can suggest you a practical solution using an interval tree implementation. In my opinion, this is much more efficient than creating entries per minute, here's the code:

type TimeRange struct {
    Start int64
    End   int64
}

type IntervalNode struct {
    Range     TimeRange
    Value     interface{}
    Max       int64
    Left      *IntervalNode
    Right     *IntervalNode
}

type IntervalMap struct {
    root *IntervalNode
}

func NewIntervalMap() *IntervalMap {
    return &IntervalMap{}
}

func (im *IntervalMap) Insert(start, end int64, value interface{}) {
    if im.root == nil {
        im.root = &IntervalNode{
            Range: TimeRange{Start: start, End: end},
            Value: value,
            Max:   end,
        }
        return
    }
    im.root = im.insert(im.root, TimeRange{Start: start, End: end}, value)
}

func (im *IntervalMap) insert(node *IntervalNode, r TimeRange, value interface{}) *IntervalNode {
    if node == nil {
        return &IntervalNode{Range: r, Value: value, Max: r.End}
    }

    if r.Start < node.Range.Start {
        node.Left = im.insert(node.Left, r, value)
    } else {
        node.Right = im.insert(node.Right, r, value)
    }

    if node.Max < r.End {
        node.Max = r.End
    }
    return node
}

func (im *IntervalMap) Find(timestamp int64) interface{} {
    if im.root == nil {
        return nil
    }
    return im.search(im.root, timestamp)
}

func (im *IntervalMap) search(node *IntervalNode, timestamp int64) interface{} {
    if node == nil {
        return nil
    }

    // Check if current node contains the timestamp
    if timestamp >= node.Range.Start && timestamp < node.Range.End {
        return node.Value
    }

    // Search left subtree if it could contain our timestamp
    if node.Left != nil && timestamp < node.Range.Start {
        return im.search(node.Left, timestamp)
    }

    // Search right subtree
    return im.search(node.Right, timestamp)
}

How we can use it?

timeMap := NewIntervalMap()

// Insert some ranges
timeMap.Insert(
    time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC).Unix(),
    time.Date(2024, 1, 2, 0, 0, 0, 0, time.UTC).Unix(),
    "value1",
)

// Look up a timestamp
now := time.Now().Unix()
value := timeMap.Find(now)

It is beneficial because you'll get O(log n) lookup time, no redundant entries, it handles overlapping ranges correctly and it's memory efficient

2 Comments

absolutely love this answer. Let me implement it and I'll get back to you. Furthermore, my lookups will 90% of the time be scanning the "latest" period, with the odd request that'll go back in time a week or two. Meaning the "current node" shouldn't move around all that much.
This works perfectly. Thanks for opening my eyes to 'interval search trees'!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.