Can this problem be solved using knapsack problem-solving approach (varying-size knapsack?)

Question

To comply with regulation a bank needs to make sure that at least 90% (or some other fixed proportion) of the money it holds is “clean”.

Money is considered clean if it is in a “clean” account. An account is considered “clean” if all the money in it comes from a verifiable source.

The bank can forcefully return money to the clients coming from a non-verifiable source in order to increment the proportion of "clean" money to comply with the regulation. The problem is to determine the minimum amount that the bank needs to return to comply with the regulation.

Account Verif   NonVerif    Divi    Clean   NotClean
1   889.77  157.01  5.67    0   1046.78
2   907.88  160.21  5.67    0   1068.09
3   1187.18 209.5   5.67    0   1396.68
4   918.12  162.02  5.67    0   1080.14
5   1721.88 303.86  5.67    0   2025.74
6   828.17  276.05  3.00    0   1104.22
7   1127.6  375.86  3.00    0   1503.46
8   1177.13 392.37  3.00    0   1569.5
9   801.81  267.27  3.00    0   1069.08
10  741.9   247.3   3.00    0   989.2
11  0   1468.11 0.00    0   1468.11
12  0   853.66  0.00    0   853.66
13  2354.81 0   -1.00   2354.81 0
14  2267.1  0   -1.00   2267.1  0
15  2238.3  0   -1.00   2238.3  0
16  2188.66 0   -1.00   2188.66 0
17  2167.85 0   -1.00   2167.85 0
18  2166.1  0   -1.00   2166.1  0
19  2165.59 0   -1.00   2165.59 0
20  2163.84 0   -1.00   2163.84 0
21  2145.43 0   -1.00   2145.43 0
22  2117.76 0   -1.00   2117.76 0
23  1320.26 0   -1.00   1320.26 0
24  1299.99 0   -1.00   1299.99 0
25  1241.02 0   -1.00   1241.02 0
26  1237.36 0   -1.00   1237.36 0
27  1208.74 0   -1.00   1208.74 0
28  1114.58 0   -1.00   1114.58 0
29  1048.63 0   -1.00   1048.63 0
30  1010.92 0   -1.00   1010.92 0
31  971.1   0   -1.00   971.1   0
32  874.95  0   -1.00   874.95  0
33  832.01  0   -1.00   832.01  0
34  825.72  0   -1.00   825.72  0
TOTAL   45262.16    4873.22     34960.72    15174.66

In the example above, total money held by the bank is 34960.72 + 15174.66 = 50135.38; without doing any cleansing, only the 69.7% (34960.72 / 50135.38 = 0.697...) can be considered clean so the bank needs to cleanse to comply with the regulation.

If the bank cleansed the first two accounts then the total money held by the bank would be 50135.38 - 157.01 - 160.21 = 49818.16 and clean money would be 34960.72 + 889.77 + 907.88 = 36758.37; the proportion of clean money would be 36758.37 / 49818.16 = 73.7%.

In the example above, Div=Verif/NonVerif (Verif would be value and NonVerif would be Weight, as to see the items that provide the best ratio to select those); the list in example is sorted by descending Div. Naive approach is to pick which accounts to cleanse in that order until the bank complies with the regulation.

I was thinking of using approach suggested by Avikalp Srivastava here: so Verif (value) would be treated as weight and NonVerif (cost) would be treated as value; the regular knapsack problem-solving approach would then be used to find the maximum cost that can be removed while Verif remains >= 90% * (total money held by the bank); the thing is the total money held by the bank decreases when you add not-clean items to the knapsack because the bank is returning that money to the customer (so the knapsack would become smaller as more items get added to it?). Brute force caused memory overflow for the data shown. I actually tried to solve this for hours without getting any closer to the answer. Maybe knapsack problem-solving is not the correct approach for this (?)

The naive approach will suffice for my purposes, but I still want to find out how to solve it properly.

What does "cleasing" an account mean? It seems like the money changes from not clean, but then why does the total money held by the bank decrease? — LarrySnyder610
– LarrySnyder610, Commented May 17, 2019 at 20:05
I don't think it's exactly a knapsack problem (maybe there's a clever way to turn it into one) but how about just formulating it as a linear program (LP)? — LarrySnyder610
– LarrySnyder610, Commented May 17, 2019 at 20:11
@LarrySnyder610, cleansing an account means that the bank gets rid of all the money coming from a non-verifiable source in that account (the bank wants to get rid of as little money as possible) so that the account can be considered "clean" and that all the money in it gets to be considered as clean; cleansing an account that is already clean would be redundant (it wouldn't have any effect) so the bank would only cleanse "not-clean" accounts; the total money held by the bank decreases whenever the bank cleanses an account becuase it is getting rid of some of the money in that account. — Javier Rivera
– Javier Rivera, Commented May 18, 2019 at 22:28
@LarrySnyder610, before I decided to go for the naive approach, I was actually trying to solve program the solution in VBA because I need it to run in MS Excel. Te very first solution that I tried was using Excel's Solver (Simplex LP method) to produce the anwer, but it couldn't becuase "linearity conditions were not met"; I know very little to nothing about linear programming (always open for learning, though), but my guess is that the problem is non-linear becuase of total money held by the bank changing (decreasing) whenever bank cleanses an account. — Javier Rivera
– Javier Rivera, Commented May 18, 2019 at 22:32
So, cleansing a given account means: Take all of the non-verifiable funds and return them; then all of the verifiable funds become "clean", and the account itself becomes "clean". So cleansing is a binary action: Either we cleanse the whole account or we don't. Is that correct? — LarrySnyder610
– LarrySnyder610, Commented May 19, 2019 at 0:57

LarrySnyder610 · Accepted Answer · 2019-05-19 01:13:43Z

Note: This approach is based on my understanding in the comment above. If my understanding is wrong, I'll edit.

Here is an approach based on integer linear programming (ILP).

Let I be the set of all accounts and let I_c and I_n be the set of clean and non-clean accounts, respectively.
Let V[i] and NV[i] be the amount of verifiable and non-verifiable funds for account i.
Let r be the proportion of funds that must be clean (e.g., 90%).

(These are parameters -- inputs to the model.)

Let x[i] = 1 if we cleanse account i, for i in I_n, and 0 otherwise. (We won't cleanse accounts in I_c.)

(This is a binary decision variable -- a variable that the model will set the value of.)

Then the ILP is:

minimize sum {i in I_n} NV[i] * x[i]
subject to sum {i in I_c} V[i] + sum {i in I_n} V[i] * x[i] >= 
               r * (sum {i in I} (NV[i] + V[i]) - sum {i in I_n} NV[i] * x[i]))
           x[i] in {0,1} for all i in I_n

The objective function minimizes the total funds cleansed. (For each i in I_n, if we cleanse the account, then we get rid of NV[i] amount of money.) The first constraint says that the total clean money must be at least 0.9 of the total money: the total clean money is the original clean money (sum {i in I_c} V[i]) plus the newly cleansed money (sum {i in I_n} V[i] * x[i]); and the total money equals the original total (verified plus not verified) minus the funds cleansed. The second constraint just says all the x[i] variables have to be binary.

In terms of implementation, you can certainly solve this in Excel/Solver. (I suspect the nonlinearity you were getting was because you were writing the constraint more like

sum {i in I_c} V[i] + sum {i in I_n} V[i] * x[i] / (sum {i in I} (NV[i] + V[i]) - sum {i in I_n} NV[i] * x[i])) >= r

which is nonlinear.) You can also use Python/PuLP, or any number of other linear programming packages.

גלעד ברקן · Accepted Answer · 2019-05-17 11:54:55Z

1

One idea could be to to binary search on the target non-verified amount to return. Then for each such candidate, run a knapsack where that candidate is the maximum weight, each non-verified amount is the weight, and the value to maximise is the verifiable amounts in the accounts. (Of course, we only need to query accounts that have non-verified funds.)

edited May 17, 2019 at 11:54

answered May 17, 2019 at 11:38

גלעד ברקן

24k3 gold badges29 silver badges64 bronze badges

3 Comments

Javier Rivera Over a year ago

I do appreciate your suggestion because it makes me realise that I completely ignored simpler and more effective approaches to solving the problem by trying to fit it into a knapsack problem and not even considering combining it with other algorithms.

Javier Rivera Over a year ago

But I am still not entirely convinced that the solution works: I am working to produce an example where the binary search actually skips the best solution or to convince myself (by proving) that it does not.

גלעד ברקן Over a year ago

@JavierRivera thank you for your comment. As you work to find a counterexample, note that the stated problem is only to "determine the minimum amount that the bank needs to return."

Collectives™ on Stack Overflow

Can this problem be solved using knapsack problem-solving approach (varying-size knapsack?)

2 Answers 2

Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related