1
A B C D E
Key 1 1 -1
Key 2 1 -1
Key 3 1 -1
Key 4 -1 1
Key 5 1 -1
Key 6 1 -1
Key 7 1 -1
Key 8 1 -2 1

Final Result

A B C D E
1 -1

suppose we have the above dataframe where each key is an option used to create a combination to get to the desired Final Result. Suppose that you can also specify max number of combinations it can use to achieve the final result below, how would one iterate through the dataframe and when a set of combinations equals the final result, it prints all the keys that make up the combo as well as number of combos it took?

For example, let's say the maximum number of combinations is 3-key combo. Then the following combinations will satisfy both the Final Result and stay under or equal to the number of key combos allowed to achieve it

Key 2 (itself), combos 1

Key 4 + Key 5, combos 2

Key 3 + Key 8, combos 2

1 Answer 1

1

Assuming:

df = pd.DataFrame({
    'A': [1, None, None, -1, 1, None, None, None],
    'B': [-1, 1, None, 1, None, 1, None, 1],
    'C': [None, -1, 1, None, -1, None, 1, -2],
    'D': [None, None, -1, None, None, -1, None, 1],
    'E': [None, None, None, None, None, None, -1, None]
}, index=['Key 1', 'Key 2', 'Key 3', 'Key 4', 'Key 5', 'Key 6', 'Key 7', 'Key 8'])

final = pd.Series([None, 1, -1, None, None], index=['A', 'B', 'C', 'D', 'E'])

You could use itertools to produce the combinations, and reindex+sum to compare to the expected output:

# helper function to produce the combinations
from itertools import combinations, chain
def powerset(iterable, max_set=None):
    s = list(iterable)
    if max_set is None:
        max_set = len(s)
    return chain.from_iterable(combinations(s, r) for r in range(1, max_set+1))

# loop over the combinations 
# reindex and sum
# compare to final with nulls as 0
MAX_N = 3
for c in powerset(df.index, MAX_N):
    if df.reindex(c).sum().eq(final, fill_value=0).all():
        print(c)

Output:

('Key 2',)
('Key 3', 'Key 8')
('Key 4', 'Key 5')
('Key 1', 'Key 2', 'Key 4')

Note that this produces Key1/Key2/Key4 as a valid combination since Key1/Key4 cancel themselves and Key2 alone is valid.

To avoid this, you could keep track of the produced combinations and only retain those that are not a superset of already seen valid combinations:

MAX_N = 3
seen = set()
for c in powerset(df.index, MAX_N):
    if df.reindex(c).sum().eq(final, fill_value=0).all():
        f = frozenset(c)
        if not any(f > s for s in seen):
            seen.add(f)
            print(c)

Output:

('Key 2',)
('Key 3', 'Key 8')
('Key 4', 'Key 5')
Sign up to request clarification or add additional context in comments.

5 Comments

Wow! This is the coolest thing I have seen for awhile. Very elegant and efficient. Upvote 100%
as an additional step, how would create a dataframe that automatically detects number of MAX_N then assign as columns and have index be sorted from say Key1 to Key4 like below: Key1, Key2, Key4 Key2 Key3,Key8 Key4,Key5
@ChairmanMeow I don't really understand what you mean. If this is a quick update you may update the question to clarify your point, otherwise it might be better to open a follow up question.
Hi mozway, I formulated my question in a separate one, because i agree it is different yet still related to this. stackoverflow.com/questions/79343427/…
Yes, it's quite different. I've posted an answer there ;)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.