Find highest number of swaps within a set of possible duty swaps with networkx and PuLP

Question

With huge help from Stack Overflow, I have created code which takes possible swap matches with given weights as an input and outputs a set of matches which gives the highest weight with the highest number of swaps.

A swap is a duty being transferred from one person to another. Every person needs to have exactly one duty.

The difference with a weighted graph is the fact that the highest number of swaps has higher priority.

To do this I multiply every swap with a high number and add the weight of that swap for the LpMaximize.

The code selects a set of graph cycles which gives the highest number of total swaps. The cycle length is limited by max_cycle_order. The selected cycles are added to the result array.

Questions

The method of introducing the weights and prioritizing feels cumbersome. How could this be improved?
It feels like formatting the output takes too many steps. How could this be improved?
With a high max_cycle_order and a high number of matches (7+ and 700+) the code takes a long time to run. In my use case at the moment not a problem because I'm expecting to use max_cycle_order between 2 and 5 but I'm interested in any ideas to limit the runtime for a high max_cycle_order.
Any other tips?

import networkx
import pulp
from datetime import datetime


def optimize(matches, max_steps=2):

    start_time = datetime.now()

    # find sum of all weights to be used to prioritize max swaps
    max_weight = sum(weight for (match_id, a, b, weight) in matches) + 1

    graph = networkx.DiGraph()
    for (match_id, a, b, weight) in matches:
        graph.add_edge(a, b)

    # default is one-on-one swaps (max_cycle_order = 2)
    max_cycle_order = 2
    if isinstance(max_steps, int):
        max_cycle_order = max_steps
        if max_steps < 2:
            max_cycle_order = 2
        if max_steps > 6:
            max_cycle_order = 6
    cycles = tuple(networkx.simple_cycles(graph, length_bound=max_cycle_order))

    cycle_names = ['_'.join(c) for c in cycles]

    # Create weight for each cycle to be used by LP solver
    cycle_orders = []
    for c in cycles:
        sum_weights = 0
        for vertex in c:
            for (match_id, v1, v2, weight) in matches:
                if vertex == v2:
                    sum_weights += weight
                    break
        cycle_orders.append(max_weight * len(c) + sum_weights)

    selectors = pulp.LpVariable.matrix(
        name='cycle', indices=cycle_names, cat=pulp.LpBinary,
    )

    prob = pulp.LpProblem(name='swaps', sense=pulp.LpMaximize)
    prob.setObjective(pulp.lpDot(selectors, cycle_orders))

    for vertex in graph.nodes:
        # create list of all cycles where vertex (request) is used
        group = [
            selector
            for selector, cycle in zip(selectors, cycles)
            if vertex in cycle
        ]
        # maximize total of these cycles to 1 or smaller to not double use a request
        prob.addConstraint(
            name=f'excl_{vertex}',
            constraint=pulp.lpSum(group) <= 1,
        )

    prob.solve()
    assert prob.status == pulp.LpStatusOptimal

    # process selected cycles for output
    print('Selected cycles:')
    swap_count = 0
    result = []
    for cycle, selector in zip(cycles, selectors):
        if selector.value() > 0.5:
            cycle_result = []
            swap_count += len(cycle)
            print(', '.join(cycle))
            for c in range(0, len(cycle)):
                d = c + 1
                if d > len(cycle) - 1:
                    d = 0
                for (match_id, r1, r2, weight) in matches:
                    if cycle[c] == r1 and cycle[d] == r2:
                        cycle_result.append({'id': match_id, 'from': r1, 'to': r2})
                        break
            result.append(cycle_result)

    end_time = datetime.now()
    runtime = (end_time - start_time).total_seconds()
    print('Duration: {}'.format(end_time - start_time))

    output = {
        "status": 1,
        "result": result,
        "swapCount": swap_count,
        "maxSteps": max_cycle_order,
        "runtime": runtime
    }

    return output


# test function call
print(optimize([
    ('id1', 'Amy', 'Blake', 2), ('id2', 'Blake', 'Claire', 5), ('id3', 'Claire', 'Drew', 5), ('id4', 'Drew', 'Emma', 7),
    ('id5', 'Emma', 'Flynn', 3), ('id6', 'Flynn', 'Gabi', 4), ('id7', 'Gabi', 'Hana', 5), ('id8', 'Hana', 'Izzy', 3),
    ('id16', 'Izzy', 'Jill', 6), ('id17', 'Jill', 'Amy', 3),
    # one on one
    ('id9', 'Blake', 'Amy', 3), ('id10', 'Claire', 'Blake', 2), ('id11', 'Emma', 'Drew', 5),
    # three point
    ('id12', 'Gabi', 'Emma', 7), ('id13', 'Drew', 'Blake', 7),
    # four point
    ('id14', 'Flynn', 'Claire', 1), ('id15', 'Jill', 'Gabi', 4)
], 4))

Re. long runtime, have you profiled? You'll need to determine whether the slow part is NetworkX or PuLP. — Reinderien
– Reinderien, Commented Apr 10, 2024 at 19:10
On my M2 MacBook Air a set of 730 matches resulting in 39 swaps with max_cycle_order 7 takes 23 seconds. networkx is done after 2. CBC Solver states it took 12 seconds but probably in total it's a bit more. — Oehoe
– Oehoe, Commented Apr 10, 2024 at 19:31

J_H · Accepted Answer · 2024-04-10 19:18:50Z

helpers

This optimize() is Too Long, plus it would benefit from a docstring. To read it one must vertically scroll. It defines nearly three dozen locals. Psychologists are not in complete agreement whether the Gentle Reader can simultaneously hold seven things in short-term memory or perhaps a few more, but past thirty we have definitely exceeded the limit.

Rely on Extract Helper to produce get_cycles(). Break out the for loops as (unit testable) helpers, too. A desirable side effect will be that local variables go out-of-scope when we return from helper, leaving the reader with fewer concerns to keep track of.

loss function

The big thing I was missing, from the zero-length docstring and from the Review Context, was this. Suppose we have six people listed, and they all change their duty. Would it be cheaper to perform three pair-wise swaps, or a pair of three-person swaps?

Describe the business context and spell out the loss function, so maintenance engineers will be less likely to introduce code defects when they make edits in the coming months.

test suite

The print() is very nice and helpful. But it requires one to ignore some solver debugs, and to eyeball a dict of graph edges when deciding if the output is correct or not. An automated test "knows" the right answer and can display a Green bar by itself. Consider hardcoding the expected answer into your test code. Consider doing import unitttest.

Before reading up on docstrings and unittest.. I have not considered a loss function. In the bussinesscontext it would be better to have three pair-wise swaps. I can probably introduce this in the cycle_orders array which is used in the LpMaximize problem. cycle_orders.append(max_weight * len(c) + sum_weights) — Oehoe
– Oehoe, Commented Apr 10, 2024 at 19:48

toolic · Accepted Answer · 2024-04-10 19:38:57Z

The following suggestions are mostly for style as opposed to performance.

Documentation

As the previous answer mentions, add a docstring to the function describing its purpose, inputs and outputs.

Intent

The 2 if statements are not independent of each other:

    if max_steps < 2:
        max_cycle_order = 2
    if max_steps > 6:
        max_cycle_order = 6

They should be combined in an if/else:

    if max_steps < 2:
        max_cycle_order = 2
    elif max_steps > 6:
        max_cycle_order = 6

This better conveys the intent, and it is more efficient.

Add whitespace around the assignment operator for better readability. Change:

        name=f'excl_{vertex}',

to:

        name = f'excl_{vertex}',

DRY

This code shows up 3 times:

        len(cycle)

Set it to a variable, then use the variable:

        num_cycles = len(cycle)

Simpler

There is no need for the 0 in: for c in range(0, len(cycle)):

It is simpler as:

        for c in range(len(cycle)):

It is simpler to use an f-string. Change:

print('Duration: {}'.format(end_time - start_time))

to:

print(f'Duration: {end_time - start_time}')

Reinderien · Accepted Answer · 2024-04-11 04:15:59Z

3

You should move your dict-rendering code out of the main solve function, along with runtime calculation and printing.

I recall that this was intended for JSON, so don't status=1; JSON has booleans so use True. Also, I think it would make more sense to perform an outer try/catch/assign status=False than to either throw-or-return. Otherwise, why is status there?

Don't use datetime.now() for performance timing. Use time.monotonic for long scales like this, or time.perf_counter() for shorter scales if you're measuring pieces of code in a more granular manner.

answered Apr 11, 2024 at 4:15

Reinderien

71.2k5 gold badges76 silver badges257 bronze badges

\$\begingroup\$ I planned to use status=0 for any errors, for example status=1 when the solver gets a result but it might not be optimal (future use) and status=2 for an optimal solution. status=0 is set in the catch block of the code which calls the optimize function. \$\endgroup\$

Oehoe
– Oehoe

2024-04-11 07:22:33 +00:00
Commented Apr 11, 2024 at 7:22

Add a comment |

Stack Exchange Network

Find highest number of swaps within a set of possible duty swaps with networkx and PuLP

Questions

3 Answers 3

helpers

loss function

test suite

Documentation

Intent

DRY

Simpler

You must log in to answer this question.

Hot Network Questions

Questions

3 Answers 3

helpers

loss function

test suite

Documentation

Intent

Simpler

You must log in to answer this question.

Related