1

The goal is to check if two intervals intersect one another. Easy... right? Just plug this guy in and we can move on?

intersects = itemStart < queryEnd && queryStart < itemEnd

Not so fast, we have the following constraints:

  • We cannot use the candidate item's interval directly during the intersection check. We must pre-process the interval [itemStart, itemEnd] => encodedItem. The encoded item interval must be either a number or a string. We can however use the [queryStart, queryEnd] separately or not during the check.
  • When performing the intersection check we are limited to the following operations: beginsWith, between. No further boolean operators can be applied. (These are limitations of dynamodb).
    • beginsWith(a, b) checks whether string a starts with string b.
    • between(a, b, c) checks whether a is between b and c. Works with strings and numbers.
  • The start and end values of the items are in the range [0, 3254778254386].

In summary, the final intersection check must take the form of one of these two:

intersects = between(encodedItem, encodedQueryStart, encodedQueryEnd)
intersects = beginsWith(encodedItem, encodedQuery)

Perhaps there is a mathematical proof that flattening the 2 dimensional interval into 1 dimension prevents us from inspecting whether the intervals intersect.


The actual goal

While I am interested in random string problems as a pastime, there is a real world problem underlying this one. I am trying to perform date range intersection queries on a key value database, in this case dynamodb


What I've tried so far

Z-Ordering

I have been looking into geospatial indexes and Z-Ordering which has given me some faith that this is possible. [0][1][2][3] With Z-ordering you can work out if a query interval contains another by interleaving the interval into a string.

e.g.

            itemA = { start: 123, end: 456 }
     encodedItemA = "142536"
            itemB = { start: 111, end: 115 }
     encodedItemB = "111115"
            query = { start: 110, end: 119 }
encodedQueryStart = "111100" // interleave query.start with itself
  encodedQueryEnd = "111199" // interleave query.end with itself

Once this preprocessing is done I can do

isContained = between(encodedItem, encodedQueryStart, encodedQueryEnd)
isContained = between("142536", "111100", "111199") // returns false ✅
isContained = between("111115", "111100", "111199") // returns true ✅

However, "isContained" is not the same as "intersects".

Redundancy

The other thing I've explored is expanding every item so that it is no longer an interval but represented by many points facilitating prefix search. e.g.

itemA = { start: 123, end: 456 } is inserted as 334 rows:
           itemA1 = 123
           itemA2 = 124
              ...
           itemAN = 456
itemB = { start: 111, end: 115 } is inserted as 5 rows:
           itemB1 = 111
           itemB2 = 112
              ...
           itemBN = 115
query = { start: 110, end: 119 }
encodedQuery = commonPrefix(query.start, query.end) = "11"
intersects = beginsWith("111", "11") // returns false ✅
intersects = beginsWith("123", "11") // returns true ✅

This comes with its own trade offs.


Is there any other clever wizardry I could try to make this work? 🪄

2
  • Out of interest, what solution did you settle on? Commented Jun 15, 2022 at 14:49
  • 1
    In the end we skipped this problem and decided it was easier for the client to download all the relevant data and do the query on the client Commented Jun 15, 2022 at 15:53

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.