The goal is to check if two intervals intersect one another. Easy... right? Just plug this guy in and we can move on?
intersects = itemStart < queryEnd && queryStart < itemEnd
Not so fast, we have the following constraints:
- We cannot use the candidate item's interval directly during the intersection check. We must pre-process the interval
[itemStart, itemEnd] => encodedItem. The encoded item interval must be either a number or a string. We can however use the[queryStart, queryEnd]separately or not during the check. - When performing the intersection check we are limited to the following operations:
beginsWith,between. No further boolean operators can be applied. (These are limitations of dynamodb).beginsWith(a, b)checks whether string a starts with string b.between(a, b, c)checks whether a is between b and c. Works with strings and numbers.
- The start and end values of the items are in the range
[0, 3254778254386].
In summary, the final intersection check must take the form of one of these two:
intersects = between(encodedItem, encodedQueryStart, encodedQueryEnd)
intersects = beginsWith(encodedItem, encodedQuery)
Perhaps there is a mathematical proof that flattening the 2 dimensional interval into 1 dimension prevents us from inspecting whether the intervals intersect.
The actual goal
While I am interested in random string problems as a pastime, there is a real world problem underlying this one. I am trying to perform date range intersection queries on a key value database, in this case dynamodb
What I've tried so far
Z-Ordering
I have been looking into geospatial indexes and Z-Ordering which has given me some faith that this is possible. [0][1][2][3] With Z-ordering you can work out if a query interval contains another by interleaving the interval into a string.
e.g.
itemA = { start: 123, end: 456 }
encodedItemA = "142536"
itemB = { start: 111, end: 115 }
encodedItemB = "111115"
query = { start: 110, end: 119 }
encodedQueryStart = "111100" // interleave query.start with itself
encodedQueryEnd = "111199" // interleave query.end with itself
Once this preprocessing is done I can do
isContained = between(encodedItem, encodedQueryStart, encodedQueryEnd)
isContained = between("142536", "111100", "111199") // returns false ✅
isContained = between("111115", "111100", "111199") // returns true ✅
However, "isContained" is not the same as "intersects".
Redundancy
The other thing I've explored is expanding every item so that it is no longer an interval but represented by many points facilitating prefix search. e.g.
itemA = { start: 123, end: 456 } is inserted as 334 rows:
itemA1 = 123
itemA2 = 124
...
itemAN = 456
itemB = { start: 111, end: 115 } is inserted as 5 rows:
itemB1 = 111
itemB2 = 112
...
itemBN = 115
query = { start: 110, end: 119 }
encodedQuery = commonPrefix(query.start, query.end) = "11"
intersects = beginsWith("111", "11") // returns false ✅
intersects = beginsWith("123", "11") // returns true ✅
This comes with its own trade offs.
Is there any other clever wizardry I could try to make this work? 🪄