Suppose I have a list of n sets. Is there an efficient way of finding the combination of r sets whose intersection is larger than that of any other combination of r sets in the list?
In particular, I have a list of about 6000 sets of strings, and I want to choose about 9 sets from this list such that they share the most strings out of all combinations of 9 sets. The problem is that if I were to brute-force it by computing every combination and looking for the maximum intersection, it would take (6000 choose 9) ~ 3e28 computations, so I need either a more efficient algorithm or a reliable heuristic.
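Just to make the scale concrete, here is the brute-force version I have in mind, which works fine on toy input but is hopeless at n = 6000, r = 9 (the data below is made up, not my real list):

```python
from itertools import combinations

def best_r_intersection(sets, r):
    """Exhaustive search: return the combination of r sets with the
    largest common intersection. O(C(n, r)) combinations, so this is
    only feasible for very small n and r."""
    best_combo, best_inter = None, set()
    for combo in combinations(sets, r):
        inter = set.intersection(*combo)
        if len(inter) > len(best_inter):
            best_combo, best_inter = combo, inter
    return best_combo, best_inter

# Toy data (hypothetical, not my real 6000 sets of strings):
toy = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}, {"b", "d", "f"}]
combo, inter = best_r_intersection(toy, 2)
# Here the best pair shares two strings.
```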
In addition, I would like to extend this question, if possible, to choosing a variable r such that the total number of elements in any combination is less than some arbitrary threshold, rather than choosing a constant r. That is, instead of always choosing 9 sets from the list of 6000 to form a combination, the algorithm would keep adding sets of strings until the total number of strings in the combination exceeded some threshold, say 40 strings.
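To illustrate the variable-r idea, here is a rough greedy sketch of what I mean, where I interpret "total number of strings in the combination" as the size of the union of the chosen sets (that interpretation, and the greedy strategy itself, are assumptions, not a worked-out solution):

```python
from itertools import combinations

def greedy_until_threshold(sets, threshold=40):
    """Greedy sketch of the variable-r variant: seed with the pair
    sharing the most elements, then keep adding whichever set preserves
    the most of the running intersection, stopping once adding any set
    would push the union past the threshold. No optimality guarantee."""
    n = len(sets)
    i, j = max(combinations(range(n), 2),
               key=lambda p: len(sets[p[0]] & sets[p[1]]))
    chosen, inter, union = [i, j], sets[i] & sets[j], sets[i] | sets[j]
    while True:
        # Only consider sets that keep the union within the budget.
        candidates = [k for k in range(n)
                      if k not in chosen and len(union | sets[k]) <= threshold]
        if not candidates:
            break
        k = max(candidates, key=lambda k: len(inter & sets[k]))
        chosen.append(k)
        inter &= sets[k]
        union |= sets[k]
    return chosen, inter

# Toy data (hypothetical): with a budget of 4 distinct strings,
# the greedy pass picks the first three sets.
toy = [{"a", "b"}, {"a", "b", "c"}, {"a", "d"}, {"e", "f"}]
chosen, inter = greedy_until_threshold(toy, threshold=4)
```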
This is a lot closer to what I originally wanted to do, but I realized that taking constant-size combinations of 9 sets works reasonably well for the list I'm working with and would probably be a lot easier to implement, although an algorithm that could accomplish the variable-r version would be preferable. From what I can gather, this problem is similar to the knapsack problem. However, the only efficient way I know of solving knapsack is with dynamic programming, and I'm not sure how I would apply dynamic programming here, since I would have to compute the running intersection to get the weights instead of having a pre-computed list of weights as in the regular knapsack problem.
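For the constant-r version, the best fallback I can think of is a greedy heuristic along these lines (again a sketch with no optimality guarantee; at n = 6000 the O(n^2) seeding step is about 18 million pair intersections, which is heavy but tractable):

```python
from itertools import combinations

def greedy_r_intersection(sets, r):
    """Greedy heuristic for the fixed-r problem: seed with the pair
    sharing the most elements, then repeatedly add the set that keeps
    the running intersection largest, until r sets are chosen."""
    n = len(sets)
    i, j = max(combinations(range(n), 2),
               key=lambda p: len(sets[p[0]] & sets[p[1]]))
    chosen, inter = [i, j], sets[i] & sets[j]
    while len(chosen) < min(r, n):
        k = max((k for k in range(n) if k not in chosen),
                key=lambda k: len(inter & sets[k]))
        chosen.append(k)
        inter &= sets[k]
    return chosen, inter

# Toy data (hypothetical): the first three sets all share {"a", "b"}.
toy = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "e"}, {"c", "d", "e"}]
chosen, inter = greedy_r_intersection(toy, 3)
```

Since greedy choices can get stuck in a local optimum, this could presumably be improved with random restarts from different seed pairs, but I don't know how far from optimal it can end up.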