1

I have a pandas dataframe in python, with the following structure:

Date A B C D E F G H
2022-01-01 10:00:00 1 3 0 4 1 0 0 1
2022-01-01 11:00:00 0 1 1 2 0 2 3 5
2022-01-01 12:00:00 0 2 0 1 0 1 0 0
2022-01-01 13:00:00 0 0 2 0 0 3 1 2
2022-01-02 10:00:00 0 3 0 1 2 4 4 1
2022-01-02 11:00:00 0 0 6 0 0 0 5 0
2022-01-02 12:00:00 0 0 6 0 2 4 5 3
2022-01-02 13:00:00 0 0 0 1 3 0 0 0

This is a snippet of my real dataframe, which has lots of "letter" columns (more than 100). My problem is that I need to obtain the sum of the numbers for each datetime and letter for different combinations.

This was answered in All possible combinations as new columns of a dataframe based on primary column, but it computes all possible combinations from N to M elements. For a df of 100 columns, all possible combinations from 1 sigle letter column to the full combined 100 is madness and impossible to compute.

Fortunately, I am only interested in certain combinations (again, ca. 50 in the real df), which are given in a list:

list_possibilities = ['A+B', 'A+B+D', 'B+D+E+G', 'F+H', 'D+E+F+G+H', 'D', 'F', 'G+H']

As you can see, single columns such as "D" or "F" may also be a possibility. So the final df would be fairly compact, and wouldn't need a brute-force computation of thousands of combinations (here shown only for the first two rows for simplicity):

Date A+B A+B+D B+D+E+G F+H D+E+F+G+H D F G+H
2022-01-01 10:00:00 4 8 8 1 6 4 0 1
2022-01-01 11:00:00 1 3 6 7 12 2 2 8

Knowing the prior structure of the combinations allows to exponentially decrease the combinations and sums pandas must do, but I am unable to generalize the previous code of the solution to this case. Any ideas?

1 Answer 1

4

Use concat in a dictionary comprehension:

out = (pd
   .concat({cols: df.set_index('Date')[cols.split('+')].sum(axis=1)
            for cols in list_possibilities}, axis=1)
   .reset_index()
)

Output:

                  Date  A+B  A+B+D  B+D+E+G  F+H  D+E+F+G+H  D  F  G+H
0  2022-01-01 10:00:00    4      8        8    1          6  4  0    1
1  2022-01-01 11:00:00    1      3        6    7         12  2  2    8
2  2022-01-01 12:00:00    2      3        3    1          2  1  1    0
3  2022-01-01 13:00:00    0      0        1    5          6  0  3    3
4  2022-01-02 10:00:00    3      4       10    5         12  1  4    5
5  2022-01-02 11:00:00    0      0        5    0          5  0  0    5
6  2022-01-02 12:00:00    0      0        7    7         14  0  4    8
7  2022-01-02 13:00:00    0      1        4    0          4  1  0    0
Sign up to request clarification or add additional context in comments.

1 Comment

This is a very elegant and efficient solution. Thank you very much for your help!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.