maximize column sum with constraints in python

Question

I have a dataframe like below:

from scipy import optimize
import pandas as pd
import numpy as np

df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
                   'num_of_visits': [1,2,1,2,1,2,1,2],
                   'gain': [2,5, 3,4, 6,8, 5,10] })

customer    num_of_visits   gain
   A            1            2
   A            2            5
   B            1            3
   B            2            4
   C            1            6
   C            2            8
   D            1            5
   D            2            10

This is just a example. In reality, we have many many customers. For each customer, sales rep can make one visit or two visits. Sales gain from 1 or 2 visit can be found on column gain. For example, visit customer A once, the sales gain is 2, visit customer A twice, the sales gain is 5, etc. The goal is to find the optimal set of customers for the sales rep to visit and their corresponding number of visits, to maximize the total gain. The constraint:

total number of visit is 4.
one instance from each customer (either 1 or 2 visit)

This is a much simplified example, we can see the answer is: visit B once, visit C once, and visit D twice.

To find a generalized solution, we feel like this is an optimization problem. We should be able to use python scipy.optimize to find the solution.

I'm new to optimization. We need to maximize the sum of the column gain. Unlike optimizing a function with variable, how should I write the objective function? How should I implement the constraint to ensure one instance per customer? I have been thinking about it for hours and still do not have a clue. Appreciate it anyone can help with how to deal with this optimization problem.

You could set this up as an ILP (Integer Linear Program) and use binary variables to indicate whether you visit each customer 0/1/2 times and I think it would snap together pretty quick. But my gut tells me it could be attacked with Dynamic Programming, which would forgo the crash course in ILP, solver installs, etc. Haven't done any DP in a while. Maybe others have a contrary or supporting opinion? — AirSquid
– AirSquid, Commented Feb 18, 2022 at 18:01

joni · Accepted Answer · 2022-02-18 18:59:15Z

As already mentioned in the comments, this can easily be formulated as an integer linear program:

min sum(gain[c, v]*b[c,v] for c in customers for v in visits[c])

s.t. 

# visit each customer at most once
sum(b[c,v] for v in visits[c]) <= 1 for c in customers

# total number of visits
sum(b[c,v]*v for all v in visits, for all c in customers) ==  4

Here,

gain[c,v] is the gain if you visit customer c exactly v times.
b[c,v] is a binary variable that equals 1 if customer c is visited v times and 0 otherwise.
customers is the list of customers
visits[c] gives you the number of possible visits of customer c

Currently, there's no support for ILPs in scipy.optimize, but it's on the way :). In the meantime, you can use a modelling framework like PuLP:

import pandas as pd
import numpy as np
import pulp


df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
                   'num_of_visits': [1,2,1,2,1,2,1,2],
                   'gain': [2,5, 3,4, 6,8, 5,10] })

# Sets and parameters
customers = df.customer.unique()
gain = {(row.customer, row.num_of_visits): row.gain for idx, row in df.iterrows()}
visits = {c: df[df.customer == c].num_of_visits.values for c in customers}

# pulp model
mdl = pulp.LpProblem("milp", pulp.LpMaximize)

# define the binary variables
b = {}
for c in customers:
    for v in visits[c]:
        b[c, v] = pulp.LpVariable(f"b[{c}, {v}]", cat="Binary")


# define the objective
mdl.setObjective(sum(gain[c, v]*b[c, v] for c in customers for v in visits[c]))

# Visit each customer at most once
for c in customers:
    mdl.addConstraint(sum(b[c, v] for v in visits[c]) <= 1)

# total number of visits
mdl.addConstraint(sum(b[c, v]*v for c in customers for v in visits[c]) == 4)

# solve the problem
mdl.solve()

# output the solution
print(f"Status: {pulp.LpStatus[mdl.status]}")
for c in customers:
    for v in visits[c]:
        if b[c, v].varValue > 0:
            print(f"Visit customer {c} exactly {v} times")
print(f"objective value: {mdl.objective.value()}")

which yields

Status: Optimal
Visit customer B exactly 1 times
Visit customer C exactly 1 times
Visit customer D exactly 2 times
objective value: 19.0

Collectives™ on Stack Overflow

maximize column sum with constraints in python

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related