The problem I'm trying to solve is basically the same as the one in this post:
https://stats.stackexchange.com/questions/339935/python-library-for-combinatorial-optimization
My current implementation indeed uses a genetic-algorithm-based optimizer.
However, I would like to (at least try to) solve it as a binary linear programming problem, even though it is apparently NP-hard.
My question is how best to formulate the LP, because I am not sure I am doing it right.
The following is a simplified version of what I'm dealing with, but it shows exactly where the difficulty lies.
We make m*n objects (here 3*2 = 6) by a combinatorial process that combines each of m = 3 objects of type 'R1' (say {A,B,C}) with each of n = 2 objects of type 'R2' (say {X,Y}).
The 6 resulting objects {AX,AY,BX,BY,CX,CY} are evaluated and each gets a score D, in this case {0.8,0.7,0.5,0.9,0.4,0.0}, in that order.
CL <- cbind(expand.grid(R2=LETTERS[24:25],R1=LETTERS[1:3],stringsAsFactors = FALSE),D=c(0.8,0.7,0.5,0.9,0.4,0.0))
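So CL looks like this:
CL
#  R2 R1   D
#1  X  A 0.8
#2  Y  A 0.7
#3  X  B 0.5
#4  Y  B 0.9
#5  X  C 0.4
#6  Y  C 0.0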
Now we want to select 2 distinct R1's and 1 R2 such that the sum of D is maximal.
In this example, the answer is R1 = {A,B}, R2 = {Y}.
However, one would not reach that conclusion by, for instance, simply taking the 2 R1's and the R2 with the highest average D.
That would work for R1, but not for R2:
aggregate(D~R1,CL,mean)
#  R1    D
#1  A 0.75
#2  B 0.70
#3  C 0.20
aggregate(D~R2,CL,mean)
#  R2         D
#1  X 0.5666667
#2  Y 0.5333333
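As a quick sanity check only (brute force obviously doesn't scale), enumerating all choose(3,2)*choose(2,1) = 6 possible selections confirms that R1 = {A,B} with R2 = {Y} (total D = 1.6) beats the 'highest average' pick R1 = {A,B} with R2 = {X} (total D = 1.3):
r1.pairs <- combn(unique(CL$R1), 2, simplify = FALSE)
r2.opts  <- unique(CL$R2)
scores <- sapply(r1.pairs, function(p) sapply(r2.opts, function(r2) sum(CL$D[CL$R1 %in% p & CL$R2 == r2])))
dimnames(scores) <- list(r2.opts, sapply(r1.pairs, paste, collapse = ","))
scores
#  A,B A,C B,C
#X 1.3 1.2 0.9
#Y 1.6 0.7 0.9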
I know how to formulate this as a binary linear programming problem; I am just not sure my formulation is efficient, because it results in a problem with mn+m+n variables and 2(m+n)+2 constraints.
The main difficulty is that I somehow need to count the number of distinct R1's and R2's chosen, and the only way I know of doing that is the one shown below (also described in my other post here): adding a binary indicator variable for each R1 and each R2, linked to the entry variables by inequality constraints.
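To make the construction easier to follow, this is the model it builds, in my (ad hoc) notation: x_e is the binary selection variable for entry e, y_r and z_s are the binary indicators for the distinct R1's and R2's, and n_r, n_s are the numbers of entries containing R1 value r and R2 value s:
maximize    sum_e D_e * x_e
subject to  sum_{e containing r} x_e - n_r * y_r <= 0         for every R1 value r
            sum_{e containing s} x_e - n_s * z_s <= 0         for every R2 value s
            sum_{e containing r} x_e - n_r * y_r >= 1 - n_r   for every R1 value r
            sum_{e containing s} x_e - n_s * z_s >= 1 - n_s   for every R2 value s
            sum_r y_r = 2
            sum_s z_s = 1
            all x_e, y_r, z_s binary
The first two groups of constraints force y_r (z_s) to 1 as soon as any entry containing r (s) is selected; the next two force each y_r (z_s) that is set to 1 to be 'used' by at least one selected entry.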
This is what I would do:
# index each entry (product) and build 0/1 incidence matrices:
# R1.mat[r,e] = 1 if entry e contains R1 value r, and similarly for R2.mat
CL["Entry"] <- seq_len(dim(CL)[[1]])
R1.mat <- table(CL$R1,CL$Entry)
R2.mat <- table(CL$R2,CL$Entry)
N_R1 <- dim(R1.mat)[[1]]
N_R2 <- dim(R2.mat)[[1]]
N_Entry <- dim(CL)[[1]]
constr.mat <- NULL
dir <- NULL
rhs <- NULL
# linking constraints ('<='): an entry can only be selected if its R1/R2 indicator is 1
constr.mat <- rbind(constr.mat,cbind(R1.mat,-diag(table(CL$R1)),matrix(0,N_R1,N_R2)))
dir <- c(dir,rep("<=",N_R1))
rhs <- c(rhs,rep(0,N_R1))
constr.mat <- rbind(constr.mat,cbind(R2.mat,matrix(0,N_R2,N_R1),-diag(table(CL$R2))))
dir <- c(dir,rep("<=",N_R2))
rhs <- c(rhs,rep(0,N_R2))
# reverse linking constraints ('>='): an indicator can only be 1 if at least one of its entries is selected
constr.mat <- rbind(constr.mat,constr.mat)
dir <- c(dir,rep(">=",N_R1+N_R2))
rhs <- c(rhs,1-table(CL$R1),1-table(CL$R2))
# cardinality constraints: exactly 2 distinct R1's and exactly 1 distinct R2
constr.mat <- rbind(constr.mat,c(rep(0,N_Entry),rep(1,N_R1),rep(0,N_R2)))
dir <- c(dir,"==")
rhs <- c(rhs,2)
constr.mat <- rbind(constr.mat,c(rep(0,N_Entry),rep(0,N_R1),rep(1,N_R2)))
dir <- c(dir,"==")
rhs <- c(rhs,1)
# objective: D for the entry variables, 0 for the indicator variables
obj <- c(aggregate(D~Entry,CL,c)[["D"]],rep(0,N_R1+N_R2))
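At this point the problem has the advertised size, i.e. 2(m+n)+2 = 12 constraints on mn+m+n = 11 variables:
dim(constr.mat)
#[1] 12 11
length(obj)
#[1] 11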
Which can be solved for instance by lpSolve:
library(lpSolve)
sol <- lp("max", obj, constr.mat, dir, rhs, all.bin = TRUE, num.bin.solns = 1, use.rw = FALSE, transpose.constraints = TRUE)
sol$solution
#[1] 0 1 0 1 0 0 1 1 0 0 1
showing that products {AY,BY} were selected, corresponding to R1 = {A,B} and R2 = {Y}:
CL[as.logical(sol$solution[1:N_Entry]),]
#  R2 R1   D Entry
#2  Y  A 0.7     2
#4  Y  B 0.9     4
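The chosen 'distinct' R1's and R2's can also be read directly off the indicator part of the solution vector:
rownames(R1.mat)[sol$solution[N_Entry + seq_len(N_R1)] > 0.5]
#[1] "A" "B"
rownames(R2.mat)[sol$solution[N_Entry + N_R1 + seq_len(N_R2)] > 0.5]
#[1] "Y"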
I found that on large problems lpSolve gets stuck for ages; Rsymphony seemed to perform better.
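Something along these lines (an untested sketch, assuming the Rsymphony package and its Rsymphony_solve_LP interface, reusing the same obj, constr.mat, dir and rhs as above):
library(Rsymphony)
sol.sym <- Rsymphony_solve_LP(obj, constr.mat, dir, rhs,
                              types = rep("B", length(obj)), max = TRUE)
sol.sym$solution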
But again, my main question is: is this way of formulating the LP efficient? Should I do it differently?
Thanks!
EDIT
In the meantime, working on a somewhat related problem, I found that a single set of linking constraints may be sufficient, if one adds 'costs' (negative in this example, since we are maximizing) to the objective coefficients of the 'distinct R1' and 'distinct R2' indicator variables.
So here, instead of:
obj <- c(aggregate(D~Entry,CL,c)[["D"]],rep(0,N_R1+N_R2))
I would do:
obj <- c(aggregate(D~Entry,CL,c)[["D"]],rep(-1,N_R1+N_R2))
This would make m+n constraints unnecessary.
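Concretely, for the toy example above, the reduced model would be built like this (just a sketch of what I mean: the same pieces as before, minus the duplicated '>=' block):
# linking '<=' constraints plus the two cardinality constraints only
constr.mat2 <- rbind(cbind(R1.mat, -diag(table(CL$R1)), matrix(0, N_R1, N_R2)),
                     cbind(R2.mat, matrix(0, N_R2, N_R1), -diag(table(CL$R2))),
                     c(rep(0, N_Entry), rep(1, N_R1), rep(0, N_R2)),
                     c(rep(0, N_Entry), rep(0, N_R1), rep(1, N_R2)))
dir2 <- c(rep("<=", N_R1 + N_R2), "==", "==")
rhs2 <- c(rep(0, N_R1 + N_R2), 2, 1)
# penalise the indicator variables instead of constraining them from below
obj2 <- c(aggregate(D~Entry,CL,c)[["D"]], rep(-1, N_R1 + N_R2))
sol.red <- lp("max", obj2, constr.mat2, dir2, rhs2, all.bin = TRUE)
sol.red$solution[1:N_Entry]  # should again select entries 2 and 4 (AY and BY) here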
Even so, it remains a huge problem to solve, even for relatively small m and n, so if anyone can advise on how to do this better...
I had a look at lp.transport, but that would be limited to 2 dimensions (i.e. only R1 and R2, not R1, R2, R3 for instance), and I don't think you can constrain the number of distinct objects per category in that kind of solver.