4

I'm working on a platform that allows assigning users to events manually. Every user provides their general availability (Mondays 2PM - 8PM, Tuesdays not at all, Wednesdays 3:30PM-7PM, and so on). Multiple events can take place in parallel and can overlap. The restrictions for finding available users for a particular event are:

  • The event must fully fall into the user's defined availability slots.
  • A user can attend only one event at a time, so no overlap.

At the moment, we run through all the users, their availability slots, and their assigned events in order to determine whether they're available for a specific event. This search can obviously take some time. Is there a "good" way to perform this kind of computation? What are the optimal data structures here? (If it makes a difference, we're talking about a Java Spring service on top of a PostgreSQL DB. But all of this can be challenged if required.)

It feels like a special case of the Nurse Scheduling problem. I haven't found a better solution yet other than comparing bit by bit and braking early in case of a clash.

EDIT: Some have been asking for example numbers. The largest ones we have so far are around 3500 users for 600 events, with each event requiring between 1 and 10 users. We are expecting scenarios of up to 15000 users for 5000 events.

EDIT 2: As for the availability slots: Users specify their availability times per weekday in blocks to half hour granularity, and there's an option to mark specific dates as completely unavailable (intended primarily for holidays). Events are scheduled with start and end times up to a granularity of 5 minutes.

EDIT 3: To clarify the idea: We've got many "assignee" users that can be assigned to events by a much smaller number of "master" users. This works by the master users selecting an event which they want to assign one or more assignees to. They're being presented with a selection of available assignees to choose from. The point of this question is determining this selection in an efficient manner. Aside from this, master users can select a group of events and open this up for assignees to self-assign. For this purpose, assignee users are presented with a selection of events open for self-assignment that fit their availability.

EDIT 4: Since this has been asked, I'm concerned with the issue of finding available assignees for particular events. The DB model for the base data is straightforward enough.

New contributor
Ray is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
19
  • 1
    Ray, give us some realistic numbers. How many users are we talking of - 10, 1000, 110 million? How many events? How many average availability slots per user? What is the time granularity - 15, 30 or single minutes? How many event queries per hour do you expect? How long does a query currently take in average? You will need these numbers not just for us, but also for yourself, to compare the effectivity of any measures you try. Commented Nov 26 at 6:27
  • 1
    Note, this is NOT a scheduling NP hard problem! As each assignment is done manually, the system does not need to decide which event to assign a user to. All decisions are made by dispatcher during previous assignments, or are being done now given available users. I.e. the computational task is to show available users, not to optimize their assignments. Commented Nov 26 at 12:02
  • 2
    This is a DBA question, not an algorithmic one. Commented Nov 26 at 12:10
  • 1
    After rereading the question, I'm a little confused. You start with "allows assigning users to events manually" and then you say "we run through all the users, their availability slots, and their assigned events". Are you trying to list events for a user to choose or is this some sort of bulk assignment process? Commented Nov 26 at 22:06
  • 1
    Great update. Thanks. So if I understand correctly, given an event, you are trying to find an available set of people that can be assigned to that event? Commented Nov 27 at 3:46

5 Answers 5

3

One way to do it is to represent each user and each event as a bitmap. Essentially, each half hour corresponds to a bit, the entire day corresponding to 6 bytes. 0 means the person is available, and 1—that the person is not available. Same for the events.

Now, all operations become operations on bitmaps (XOR, etc.), which are usually very fast and still relatively space efficient—you can use SIMD, or do the computation in GPU—essentially rely to any techniques that are used when working with graphics. Chances are, you can just load everything in memory at once, do the computation you need, then save the result.

Now, if you decide to go this way, there are two important points:

  • Do it as an alternative to the naive, most basic implementation. This way, you'll be able to (1) measure the performance of the original implementation to decide if it is acceptable, and if not, (2) compare the performance of both implementations to see how much you saved.

  • Decide at which point you need to go from the actual structured data to bitmaps. Very likely, you want the data to be stored in the database in a human-readable format: as highlighted by Flater in the comment below, bitmaps are not well fit for things like recurring schedules (and obviously all but human-readable: looking at, say, 0xC4399F2A03, it is not necessarily obvious that the person is available tomorrow from 3 p.m. to 3:30 p.m., but then busy until 5 p.m. This means that you'll need to convert the data back and forth. This process would also take time, that is important to measure as well.


I was curious to see if this approach is viable, so I decided to make a quick test. As I was poking with code, I made a few discoveries that could be useful to share.

But let's start with the actual code. I wanted to start with something small, just to see if it works correctly: fifteen users, five events, and only one day span instead of a week. And I'm doing it in Python—which appeared to be handy for one particular aspect.

import random
import time

half_hours = 1 * 24 * 2  # 48 bits.
assert half_hours % 8 == 0
bitmap_size = int(half_hours / 8)  # 6 bytes.
total_users = 15
total_events = 5

To create sample data, there is no need to be particularly realistic: users' schedules are completely random (say, the person is available from 3 a.m. to 3:30 a.m., then busy for half an hour, then available for two hours, then busy... and so on), while meetings are always consecutive blocs that span 30 minutes to 3 hours and start at a random time.

def create_users():
    for _ in range(total_users):
        yield int.from_bytes(random.randbytes(bitmap_size))

def create_events():
    for _ in range(total_events):
        duration = random.randint(1, 6)  # 30 minutes to 3 hours.
        pattern = {
            1: 0b1,
            2: 0b11,
            3: 0b111,
            4: 0b1111,
            5: 0b11111,
            6: 0b111111,
        }[duration]
        start_time = random.randint(0, half_hours - duration)
        yield pattern << start_time

Here's, finally, the thing that takes those users and events, and determines, for each event, how many participants could join.

def b(value):
    return bin(value)[2:].rjust(half_hours, '0')

def run():
    users_slots = list(create_users())
    events_slots = list(create_events())

    print("Users:")
    for u in users_slots:
        print(b(u))

    users_per_event = []

    for e in events_slots:
        count = 0
        for u in users_slots:
            if u & e == e:
                count += 1
        users_per_event.append((e, count))

    most_popular_counter = 0
    for _, count in users_per_event:
        if count > most_popular_counter:
            most_popular_counter = count

    print(f'Luckiest event has {most_popular_counter} potential participant(s).')

    print("Events:")
    for e, count in users_per_event:
        print(f'{b(e)}: {count}')

if __name__ == '__main__':
    run()

Here's an example of an output. Naturally, the data being random, each time the script runs, the binary patterns are different, and so would be the counters.

Users:
010101110001001101000011010110110110100001101000
001011111110111111101001010011011001110011111000
111010010101101000110001010011010010010111111101
111001101001100010000000000011001001100101001000
011011110100001011100100101100011110111001000010
010001110010111011001100110000111100101101011010
001111110111100011001110101100101100111110000010
111100111111100101111111101110010001010111101110
000100101000011101100010110101110010100111101111
010100111000001110110010110011111111110000100100
000000100111001011111101001100001001110101010101
111001011010000011010000001111110010100000100101
100110111000110100111011111111000101110000100011
010000111110000001001001101100101011010011010111
010100100101000100110101111101110001111000111011
Luckiest event has 3 potential participant(s).
Events:
000000000000000000000011000000000000000000000000: 3
000000000000000000000000000000000011111000000000: 0
000000000000000011100000000000000000000000000000: 3
000000000000000000000001111100000000000000000000: 2
000000000000000011111000000000000000000000000000: 1

The number at the right of each event indicates how much users can attend this event. It is relatively easy to see how it works visually. Taking, for instance, the third event, one can spot that the second user in the list has all ones matching the ones of the event. So does the fifth user, and another one towards the middle of the list. The algorithm seems to be working.

Now, let's modify the script a bit, by:

  • Increasing the span from one day to a week, as well as increasing the number of users and events.
  • Removing the binary output.
  • Adding the thing that measures how long the code spends doing stuff.
half_hours = 7 * 24 * 2  # 336 bits.
assert half_hours % 8 == 0
bitmap_size = int(half_hours / 8)  # 42 bytes.
total_users = 15000
total_events = 5000

[...]

def run():
    users_slots = list(create_users())
    events_slots = list(create_events())

    start = time.time()

    users_per_event = []

    for e in events_slots:
        count = 0
        for u in users_slots:
            if u & e == e:
                count += 1
        users_per_event.append((e, count))

    end = time.time()

    print(f'Spent {end - start:.3f} seconds.')

    most_popular_counter = 0
    for _, count in users_per_event:
        if count > most_popular_counter:
            most_popular_counter = count

    print(f'Luckiest event has {most_popular_counter} potential participant(s).')

Running it on an Intel i5-10600 3.30 GHz CPU, using a single core, the output is:

Spent 3.466 seconds.
Luckiest event has 7680 potential participant(s).

Observations:

  1. Three and a half seconds (YMMV) is not too bad for a naive approach that don't use SIMD or even parallel computing.

  2. I tried a parallel approach with concurrent.futures.ProcessPoolExecutor, but it led to the time increasing up to seven seconds, because the large list of users is copied. One needs to use shared memory or find another approach, but I didn't have the patience to poke with it long enough.

  3. I'm surprised how easy it is to work with data, actually. Originally, I didn't agree with Hans-Martin Mosner's comment, as I was believing that bitmaps should be very constrained—limited to the actual computations, but should be converted to human readable format as soon as possible. After playing with the Python script, I would rather think that binary approach is quite clear. If enough quality tooling is developed around it, it is definitively a viable approach even considering bitmaps being stored in the database.

  4. The algorithm, too, is at its simplest. I mean, it's literally u & e == e!

  5. I'm cheating a bit, since Python has no maximum limit for integers. This being said, manipulation of bytes in other languages should work similarly.

5
  • 1
    OP is using examples of recurring schedules, which doesn't naturally jive with your suggestion of a pre-baked bit string, given that a recurring availability schedule would need to be applied dynamically to any and all future dates/events. Not saying it can't be done, but this answer requires more of an elaboration on that point for it to meaningfully answer the OP's question. Commented Nov 26 at 23:21
  • @Flater: sure. Answer edited. Commented 2 days ago
  • In addition to being a very viable solution when keeping the bitmaps in memory, this would likely even work well in the database as PostgreSQL has bit string data types with boolean operations (including aggregate functions) which may lend themselves well to combining recurrent schedules, assigned events and holiday exemptions for users through SQL statements. The list of people available to be booked on an event could be returned from a single database query. Commented 2 days ago
  • This sounds intriguing and is one of these crazy tricks I come across very rarely which is why I didn't consider it even remotely. Just for my understanding: As the events can be scheduled to a granularity of 5 minute slots, we'd require a bit for each of that, right? So that's 12*24 = 288 bits or 36 bytes. That's a considerably larger number than with the half hour example. But it would still fit into a long in Java, so it should be technically possible. I'll consider it (probably only for computation/caching purposes, persisting this sounds awkward). Thanks. :-) Commented 2 days ago
  • @Ray: with a granularity of 30 minute slots, you're at 336 bits for a week, or 42 bytes. If you need 5 minute slots, that's 2016 bits (12×24×7), or 252 bytes. I also added an example in my answer, with the actual performance data. Commented 2 days ago
2

At the moment, we run through all the users, their availability slots, and their assigned events in order to determine whether they're available for a specific event.

The first thing you want to get out of this process is the repeated checking of the assigned events of a user:

  • A user has availability slots (time intervals, pairs of from-to values).

  • When a users gets assigned to an event, his/hers availability slots are reduced (maybe one availability slot will be split). As a result, a user has a changed set of availability slots. Fullstop. No further need to look where these availability slots were coming from.

  • The availability slots of a user don't overlap, so they can be strictly ordered in a simple array. This makes it possible to look for the starting time of the specific event by a binary search and find out if it is falling into one of the availability slots (since availability slots don't overlap, this results in one slot at maximum).

  • It remains to check whether the end time of the specific event fits into the found availability slot, or not.

Average order of running time per single query: in theory, this will be O(log(A) x U), where A is the average number of availability slots per user, and U the number of users. Beware, when you need to retrieve all the availability slots first for each user, for each query, the retrieval itself becomes O(A), and the binary search will gain you nothing. Ideally, you will cache the availability slots for the users once beforehand and then run multiple event queries. If you can't, the running time order becomes O(A x U), which is what you seem to have now.

As long as you don't have millions of users where only a small percentage are eligible for a certain event, I don't see any need for further optimizing the data structures. Just run through each user, apply the test above (which should be quick), and return the found users. Whenever an event gets assigned to a user, update the availability slots of that user.

According to your edit: it seems you do not even need to run your queries against the full set of all users, only against smaller subsets. That makes it even more feasible to determine availble users by just iterating over the candidates.

5
  • This would require converting the availability data into individual dates covering the range from the first to the last events. Still sounds worthwhile exploring. As I'm on the JVM, I'd prefer a TreeSet over a simple array, but otherwise I'll experiment with this idea. Thanks. :-) Commented 2 days ago
  • @Ray - Not necessarily - all that is required is to sort the data via some total ordering; your library probably offers a sort function that accepts as an optional argument a comparison function that determines the order. You can either sort the data in place in an array, which may provide some performance benefits due to cache locality. If that's not convenient for some reason, this may not be the best approach, but you can also leave the data as is, and sort an array of pointers or indices, again via a custom comparison function that gets the actual objects and compares them. Commented 2 days ago
  • @Ray: how is your availability data modeled currently (if not by individual dates, or date pairs)? Commented 2 days ago
  • I thought I mentioned this in the post: Per user, it's a set of days of the week with time slots + blocked individual dates (e.g. for vacations). So an assignee can configure something like: "I'm available every Monday from 14:00 - 21:00, every Tuesday from 10:00-12:00 and 14:30-21:00, not at all on Wednesdays, ... (and so on for the remaining weekdays). On top of that, I'm on vacation from the 27 of November to the 4th of December this year." Does this make sense? Commented 2 days ago
  • 1
    @Ray: you mentioned your users can specifiy their availability time this way, not that the data is actually modeled that way. Yes, this makes perfectly sense. Converting these descriptions to time intervals may simplify the test whether a user is available for a certain event. Still, I think you need to verify if the combined timed for the conversion plus the test(s) are really faster than using the availability descriptions directly. Commented 2 days ago
1

Interesting problem. My university used simulated annealing to shift lectures, rooms, times, and students around to form a schedule which they manually tweaked before publishing for the semester.

Sounds to me like you want this to be somewhat more dynamic though.

A spatial tree feels right for this. An R*-tree would be ideal but others can do the job. That's a region Tree with Leaf-node walking.

We need two of them for a given period of time (a day, a week, a month, up to you really). The first stores the events, the second attendee's availability.

Both tree will start with a single node containing no entries, marked for the entire time that the tree represents. This is so that latter we don't have to check if nodes have consecutive time stamps, we know that the structure is dense, therefore the next node is always for the slot of time after this one.

When a user says I'm available between time x and time y, you insert them into the tree for that region of time. We follow the tree down break apart the node found if it doesn't start at the same time the attendee starts being available. Add them into the node, and each subsequent leaf node until you find the node that matches the attendees end of availability. (Break it apart if that node extends past it).

Repeat for each interval of each users time. Given your system is probably online this allows users to add new availability easily at their leisure. Bulk inserts are also possible.

Removing time is similar, just avoid the merge step. Because time is always marching forward you'll pay more to remove unnecessary nodes. Perhaps rebuild the tree offline if there has been some threshold of deletes reached, or if a tree built for say two years from now becomes next months tree.

Do all of the above for the events as well.


To find out the list of available attendees for an event. lookup up the attendees tree for the start of the time period, and grab a list of each leaf from there to leaf containing the end time. Intersect the lists on those nodes Whoever is left is available.

To find the events available, grab the events tree Per interval of available time -> Find the first leaf node that starts after the attendees becomes available, and union with each next node until it ends after the attendee becomes unavailable. Grab the nodes on either side of the union and subtract them from the union. This is the list of available events for that attendee.

Why?

  • For the events finding attendees: the fact that attendees are listed as available in a region of time is all that is needed. Intersecting with each leaf node that makes up that time ensures they are available for the entire block of time. If that's a single node great.
  • For attendees finding events: the event needs to be present at least once in the period of time they are available. But the event might start earlier or end later, that is why we subtract the node starting before and the node ending after the attendee's availability - even if those nodes are also partially in the region of availability with the attendee. Their presence there means they must start before or end after.

An optimisation might be to make the tree for each set of PreReqs that an Antendee has, or an Event needs. This allows you to reduce the sets to be explored upfront. The downside is that the event/attendee will need to be added to multiple trees for the same time period as attendees should always show up in the no prereqs tree.

Another optimisation is to make the lists of events users a bit table, and have a secondary entry table to properly reference them. This allows for quick bit ops for intersection, union, and subtraction. It also make node splitting a faster op (copy binary blob, versus duplicating references). It does add in a translation step of mask to actual data, but that isn't much of a slow down.


This works great for one off events, but you probably want to offer some better services like highlighting if there is enough event capacity for those who wish to attend, and how flexible your attendees are.

So Attendees wishlist a event type.

An organiser wants to know if a given set of times satisfies demand. When an attendee wishes for the event type we add their availabilities to a new tree just for the event type. The organiser can now see this tree plotted out as a graph over a day/week/month and can position events over periods of time where availability is enough to warrant it.

Even better with a set of classes in mind we can ask for lists of attendees. We just count how many times an attendee is available for an entire class and gives us a flexibility measure for each attendee. The organiser knows who to prioritise, or how good their offering of times is.


You should be able to do something similar for the availability of events given conflicting wishes. To guide an attendee into choosing events that give them more flexibility later.

1

Doing search service side is a very bad idea. You would have to keep assignments synchronized with DB doing error-prone cache invalidation and potentially moving significant volumes of data around.

Do not do full scans service-side. Use DB to do searches:

CREATE TABLE workers
(
  name VARCHAR NOT NULL PRIMARY KEY
)
CREATE TABLE assignments
(
  worker VARCHAR NOT NULL REFERENCES workers(name),
  during tsrange NOT NULL,
  CONSTRAINT overlapping_times EXCLUDE USING GIST (
        worker WITH =,
        during WITH &&
    )
)
INSERT INTO workers VALUES
  ('John'),
  ('Mary')
INSERT INTO assignments VALUES
  ('John', '[2021-01-01 00:00:01, 2021-01-01 00:09:01)'),
  ('John', '[2021-01-01 00:13:01, 2021-01-01 00:14:01)'),
  ('Mary', '[2021-01-01 00:05:01, 2021-01-01 00:16:01)')

Find busy workers:

SELECT worker from assignments WHERE during && tsrange('[2021-01-01 00:10:01, 2021-01-01 00:11:01)')
worker
Mary

Find available workers:

SELECT name FROM workers LEFT JOIN assignments
  ON assignments.worker = workers.name AND during && tsrange('[2021-01-01 00:10:01, 2021-01-01 00:11:01)')
  WHERE assignments.worker IS NULL
name
John

fiddle

I'm not sure which additional indexes would be optimal for this case, but btree-gist module surely has necessary operators to build them. There is a chance, that all indexes that might be needed are already created by CONSTRAINT directives.

2
  • I like this answer, especially because I have learned something new about PostgreSQL specific data types, constraints and indexes. Whether it is applicable, however, depends on whether the OP can manage it to restructure their current db model using tsrange for availability intervals. This isn't obvious, and the OP, though we asked them more than once about it, stayed very vague in the clarifications. Commented 18 hours ago
  • @DocBrown GIST indexes work without ranges too: Postgres date overlapping constraint Commented 18 hours ago
0

I like to use a time blocks visualization when thinking about these kinds of problems. Here's a very simple example of what I now understand to be your problem. We have three assignees in the pool of users: Alice, Bob and Carol. The problem space includes 14 periods. Let's say days but here for simplicity but you can do this with any increment makes sense for your needs. There are 5 events to assign users to: Events A - E. We are looking to schedule Event D. We can see our users availability and existing commitments in the following diagrams:

Alice's Schedule

Alice's schedule so far

Bob's Schedule

enter image description here

Carol's Schedule

enter image description here

In each of the above diagrams, we see the availability set for the user in green and the events they are already assigned to below that in red. At the bottom, we see the availability reduced by the events.

Now we want to find users that we can schedule for Event D. The modified availability and Event D's timing is shown below:

enter image description here

In the above diagram, we can see that Event D won't fit into Alice's schedule but that it will fit into Bob's and Carol's.

That's all well and good but you need this in an algorithmic approach. Once the availability periods are in this form, that is simple: find all the rows where the event fits into one of the periods.

For demonstration purposes, I'm going to lay out what I consider primitive types that make this kind of thing easier to work through in object oriented terms and then talk about how this can fit into a relational database. I'm going to use Python here. Let me know if that's a problem. There's nothing especially complex about these types and operations, they just help with the sometimes fiddly logic required.

Period

Represents a duration of time. Generally has a start and an end but you may allow either or both to be unbounded as needed. Periods need a merge (alt. combine, add, ...) operation of some sort. You also need an overlap (alt. intersection). For this problem, it's useful to have a remove method.

from typing import Set
from datetime import date

class Period:
    start: date # inclusive
    end: date # exclusive

    def __init__(self, start: date, end: date):
        if start > end:
            raise Exception("invalid period")
        self.start = start
        self.end = end

    def intersection(self, other):
        if self.start > other.end or self.end < other.start:
            return None

        start = min(self.start, other.start)
        end = min(self.end, other.end)

        return Period(start, end)

    def remove(self, other):
        if not self.intersection(other):
            yield self
        else:
            if self.start < other.start:
                yield Period(self.start, min(self.end,other.start))

            if self.end > other.end:
                yield Period(max(self.start,other.end), self.end)

    def add(self, other):
        if self.start > other.end or self.end < other.start:
            raise Exception("cannot add non-contiguous periods")

        start = min(self.start, other.start)
        end = max(self.end, other.end)

        return Period(start, end)
    
    def contains(self, other):
        return self.start <= other.start and self.end >= other.end

    def __repr__(self):
        return f"({self.start} - {self.end})"

One thing to note: when working with periods, using an inclusive start and and exclusive end will greatly simplify things. If your data input or output is not in that form, I would transform it as necessary so it is. In particular, determining period adjacency is much easier in particular in this form.

Period Collection

This is a specialized collection which holds any number of periods. It's a set but the items with in it are not distinct.

class PeriodCollection:
    periods: Set[Period]

    def __init__(self, *periods):
        self.periods = set()
        for p in periods:
            self.add(p)

    def remove(self, hole):
        # find all the intersecting periods
        overlap = {p for p in self.periods if p.intersection(hole)}

        # slice period out of all intersecting periods
        for p in overlap:
            self.periods.remove(p)
            for remainder in p.remove(hole):
                self.periods.add(remainder)

    def add(self, period):
        # find all the intersecting periods
        combine = {p for p in self.periods if p.intersection(period)}

        # merge all intersecting periods
        # alternately: find the min start and max end
        for p in combine:
           period = period.add(p)

        # remove all intersecting periods
        self.periods = self.periods - combine
        
        # finally insert the merged period
        self.periods.add(period)

    def sorted(self):
        return sorted(self.periods, key=lambda p: p.start)

    def __repr__(self):
        return str(self.sorted())

And that's all you need for this. Here's a little script which you can run to recreate the scenarios described above:

alice_periods = PeriodCollection(
    Period(date(2026,1,3), date(2026,1,15)) 
)

bob_periods = PeriodCollection(
    Period(date(2026,1,1), date(2026,1,7)),
    Period(date(2026,1,8), date(2026,1,12))
)

carol_periods = PeriodCollection(
    Period(date(2026,1,1), date(2026,1,15))
)

event_a = Period(date(2026,1,3), date(2026,1,5))
event_b = Period(date(2026,1,5), date(2026,1,7))
event_c = Period(date(2026,1,6), date(2026,1,9))
event_d = Period(date(2025,1,8), date(2026,1,10))
event_e = Period(date(2026,1,13), date(2026,1,15))

alice_periods.remove(event_a)
alice_periods.remove(event_c)

bob_periods.remove(event_a)
bob_periods.remove(event_b)

carol_periods.remove(event_a)
carol_periods.remove(event_b)
carol_periods.remove(event_e)


print("alice:", alice_periods)
print("bob:", bob_periods)
print("carol:", carol_periods)

With the necessary operations defined to determine the minimal set of availability periods, we can get to the question: how do we find the people who are available for a given event?

If we recall the last figure above, we can see that all we need to do is check if the period is in one of the user's availability slots.

The sort of obvious easy solution to that is to use the code above to go through each and every user, calculate their availability slots and see if at least one of them contains the event period.

But we can do much better than that. What would be a lot better and easier to optimize would be to search all the availability periods, and use that to look up the users for those periods. That means creating some sort of map from availability periods to users. And then you'll want to sort it on start and do a search. And then what about searching on end dates... You could spend a lot of time on all of that.

Or, you could create a table in your DB like this:

user start_date end_date
alice 2026-01-05 2026-01-06
alice 2026-01-09 2026-01-15
bob 2026-01-01 2026-01-03
bob 2026-01-08 2026-01-12
carol 2026-01-01 2026-01-03
carol 2026-01-07 2026-01-13
carol 2026-01-01 2026-01-09

And load it with the availability slots you've calculated above. Looking up the available slots is just something like:

SELECT user, start_date, end_date FROM availability
WHERE start_date <= :event_start
  AND end_date >= :event_end

If you put a range searchable index on start_date and end_date, I think this should perform more than adequately with the numbers of users and events you expect. The logic to create and update the availability table could be written on the database. I don't think there's an especially obvious good reason to do that, however.

I think the hard part about this problem, regardless of the solution you go with, is going to be managing concurrency. Primarily: how you make sure no conflicting assignments are created by users working at the same time. Each assignment will require removing a row from the availability and adding 0-2 back in. That needs to happen in a transaction at the very least. It's possible the availability could have been changed since the table was read. I would generally lean towards optimistic locking here (and in general) but that's getting outside the scope of the question.

6
  • 1
    Can someone explain the down-votes on this answer? Commented Nov 26 at 20:31
  • 1
    Sorry if I didn't explain the issue very well: I'm not trying to schedule events that fit people's schedules. I'm trying to find which people can attend an event. The event itself is fixed in time. Commented Nov 26 at 21:43
  • @Ray OK. Thanks for clarifying. You can use the same techniques. I will update the answer later and orient it to your actual problem. Commented Nov 26 at 21:54
  • 6
    I didn't downvote, but this answer seems to address a different question. In particular the question "I have a group of people, when are they all free". The OP has the question "I have a timeslot. Which people are free during that entire timeslot." The solution might be similar, but I can't get it from this answer. Commented 2 days ago
  • 1
    I don't like the denormalization required to manage availaability. Also, in indexed DB, there is no need to store availability, as it is trivial to find any assignments intersecting with a given interval. Commented yesterday

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.