I need to pick one subgroup from each main group, pool all rows of the chosen subgroups, apply a function to the pooled rows to get a single value, and return that value along with a concatenated string identifying which subgroups were combined in that iteration.

I understand how to use groupby in pandas and can set level=0 or level=1 and then call agg({'LOOPED_AVG': 'mean'}). However, I need to subset rows by subgroup, combine all rows belonging to one iteration, and then apply the function to the combined rows.

Input data table:

MAIN_GROUP  SUB_GROUP  CONCAT_GRP_NAME  X_1
A           1          A1               9
A           1          A1               6
A           1          A1               3
A           2          A2               7
A           3          A3               9
B           1          B1               7
B           1          B1               3
B           2          B2               7
B           2          B2               8
C           1          C1               9

Desired result:

LOOP_ITEMS  LOOPED_AVG
A1 B1 C1    6.166666667
A1 B2 C1    7
A2 B1 C1    6.5
A2 B2 C1    7.75
A3 B1 C1    7
A3 B2 C1    8.25
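
To make the target concrete, here is a minimal sketch of how the first row of the desired result can be checked by hand: pool every row whose CONCAT_GRP_NAME is A1, B1, or C1 and take the mean of X_1. The variable names `loop_items`, `pooled`, and `looped_avg` are only illustrative.

```python
import pandas as pd

# Input table from the question
df = pd.DataFrame(
    {
        "MAIN_GROUP": list("AAAAABBBBC"),
        "SUB_GROUP": [1, 1, 1, 2, 3, 1, 1, 2, 2, 1],
        "CONCAT_GRP_NAME": ["A1", "A1", "A1", "A2", "A3",
                            "B1", "B1", "B2", "B2", "C1"],
        "X_1": [9, 6, 3, 7, 9, 7, 3, 7, 8, 9],
    }
)

# One iteration: one subgroup chosen from each main group
loop_items = ["A1", "B1", "C1"]

# Pool every row belonging to the chosen subgroups, then average
pooled = df.loc[df["CONCAT_GRP_NAME"].isin(loop_items), "X_1"]
looped_avg = pooled.mean()

print(" ".join(loop_items), looped_avg)
```

The pooled rows are 9, 6, 3 (A1), 7, 3 (B1), and 9 (C1), so the mean is 37 / 6, which agrees with the 6.166666667 shown above.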

1 Answer

Assuming that you have three main groups (A, B, and C here, giving three item columns), you can apply the following; for more groups, adjust the script accordingly. This may not be the most efficient way to solve the problem, but it gives a starting point.

import pandas as pd
import numpy as np
ls = [
      ['A', 1, 'A1', 9],
      ['A', 1, 'A1', 6],
      ['A', 1, 'A1', 3],
      ['A', 2, 'A2', 7],
      ['A', 3, 'A3', 9],
      ['B', 1, 'B1', 7],
      ['B', 1, 'B1', 3],
      ['B', 2, 'B2', 7],
      ['B', 2, 'B2', 8],
      ['C', 1, 'C1', 9],

      ]

#convert to dataframe
df = pd.DataFrame(ls, columns = ["Main_Group", "Sub_Group", "Concat_GRP_Name", "X_1"]) 

#get count and sum of concatenated groups
df_sum = df.groupby('Concat_GRP_Name')['X_1'].agg(['sum','count']).reset_index()

#enumerate permutations of the concatenated group names and keep the valid combos
import itertools as it


def combute_combinations(df, colname, main_group_series):
    l = []
    perms = it.permutations(df[colname])

    # Provides sorted list of unique values in the Series
    unique_groups = np.unique(main_group_series)

    for perm_pairs in perms:
        #look only at the leading items of each permutation and make sure
        #the first contains A, the second contains B, and the third contains C
        if all([main_group in perm_pairs[ind] for ind, main_group in enumerate(unique_groups)]):
            l.append([perm_pairs[ind] for ind in range(unique_groups.shape[0])])
    return l

t = combute_combinations(df_sum, 'Concat_GRP_Name', df['Main_Group'])

#convert to dataframe and drop duplicate pairs
df2 = pd.DataFrame(t, columns = ["Item1", 'Item2', 'Item3']) .drop_duplicates()

#do a join between the dataframe that contains the sums and counts for the concat_grp_name to bring in the counts for 
#each column from df2, since there are three columns: we must apply this three times
merged = df2.merge(df_sum[['sum', 'count', 'Concat_GRP_Name']], left_on=['Item1'], right_on=['Concat_GRP_Name'], how='inner')\
.drop(['Concat_GRP_Name'], axis = 1)\
.rename({'sum':'item1_sum'}, axis=1)\
.rename({'count':'item1_count'}, axis=1)

merged2 = merged.merge(df_sum[['sum', 'count', 'Concat_GRP_Name']], left_on=['Item2'], right_on=['Concat_GRP_Name'], how='inner')\
.drop(['Concat_GRP_Name'], axis = 1)\
.rename({'sum':'item2_sum'}, axis=1)\
.rename({'count':'item2_count'}, axis=1)

merged3 = merged2.merge(df_sum[['sum', 'count', 'Concat_GRP_Name']], left_on=['Item3'], right_on=['Concat_GRP_Name'], how='inner')\
.drop(['Concat_GRP_Name'], axis = 1)\
.rename({'sum':'item3_sum'}, axis=1)\
.rename({'count':'item3_count'}, axis=1)

#get the sum of all of the item_sum cols
merged3['sums']= merged3[['item3_sum', 'item2_sum', 'item1_sum']].sum(axis = 1)

#get sum of all the item_count cols
merged3['counts']= merged3[['item3_count', 'item2_count', 'item1_count']].sum(axis = 1)

#find the average
merged3['LOOPED_AVG'] = merged3['sums'] / merged3['counts']

#remove irrelevant fields
merged3 = merged3.drop(['item3_count', 'item2_count', 'item1_count', 'item3_sum', 'item2_sum', 'item1_sum', 'counts', 'sums' ], axis = 1)
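
As a possible alternative to the three merges, the same result might be built with itertools.product, which picks one subgroup per main group directly and so does not depend on there being exactly three groups. This is only a sketch under that assumption; the function name `looped_averages` and its parameters are hypothetical and not part of the script above.

```python
import itertools as it
import pandas as pd

def looped_averages(df, main_col, name_col, value_col):
    """Average value_col over every pick of one subgroup per main group."""
    # Sum and row count per subgroup, so pooled means can be rebuilt cheaply
    stats = df.groupby(name_col)[value_col].agg(["sum", "count"])
    # Subgroup names available in each main group, e.g. A -> [A1, A2, A3]
    choices = df.groupby(main_col)[name_col].unique()
    rows = []
    for combo in it.product(*choices):
        picked = stats.loc[list(combo)]
        rows.append(
            {
                "LOOP_ITEMS": " ".join(combo),
                # Pooled mean = total of the sums / total of the counts
                "LOOPED_AVG": picked["sum"].sum() / picked["count"].sum(),
            }
        )
    return pd.DataFrame(rows)

# Same data as in the script above
df = pd.DataFrame(
    {
        "Main_Group": list("AAAAABBBBC"),
        "Concat_GRP_Name": ["A1", "A1", "A1", "A2", "A3",
                            "B1", "B1", "B2", "B2", "C1"],
        "X_1": [9, 6, 3, 7, 9, 7, 3, 7, 8, 9],
    }
)
result = looped_averages(df, "Main_Group", "Concat_GRP_Name", "X_1")
print(result)
```

Because the main groups are read from the Main_Group column, this version also avoids checking whether a letter appears inside the subgroup name.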

2 Comments

Thank you for working on this. It helped me to frame my real code and work through some of the logic needed. I used itertools.combinations instead of itertools.permutations because I need to specify the number of elements in an iteration (which I did not specify in my original question).
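
For readers comparing the two functions mentioned in this comment, here is a small sketch of how they differ when a length is given. Note that itertools.permutations also accepts a length as its second argument; the difference is that combinations ignores order. The list `names` is made up for illustration.

```python
import itertools as it

names = ["A1", "A2", "B1"]

# Ordered arrangements of length 2: ("A1", "A2") and ("A2", "A1") both appear
perms = list(it.permutations(names, 2))

# Unordered selections of length 2: each pair appears once, in input order
combos = list(it.combinations(names, 2))

print(len(perms), len(combos))
print(combos)
```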
Maria Nazri, thanks again! The only thing I can't figure out is how to make this part dynamic, as I don't always have an 'A' or 'B'; sometimes the names are strings of varying length:
    for perm_pairs in perms:
        #take in only the first three pairs of permutations and make sure
        #the first column starts with A, second with B, and third with C
        if 'A' in perm_pairs[0] and 'B' in perm_pairs[1] and 'C' in perm_pairs[2]:
            l.append([perm_pairs[0], perm_pairs[1], perm_pairs[2]])
    return l
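
One way this filter might be made independent of the spelling of the names is to look up each subgroup's main group from the data instead of testing for a letter inside the string. The sketch below assumes every subgroup name belongs to exactly one main group; the function name `ordered_picks` and the sample names are hypothetical.

```python
import itertools as it
import pandas as pd

def ordered_picks(df, main_col, name_col):
    # Map each subgroup name to its main group using the data itself
    main_of = (
        df.drop_duplicates(name_col).set_index(name_col)[main_col].to_dict()
    )
    groups = sorted(set(main_of.values()))
    picks = []
    for combo in it.combinations(main_of, len(groups)):
        # Keep the selection only if it covers each main group exactly once
        if sorted(main_of[name] for name in combo) == groups:
            # Order the items by main group so the columns line up
            picks.append(sorted(combo, key=main_of.get))
    return picks

# Made-up data with names of varying length
df = pd.DataFrame(
    {
        "Main_Group": ["North", "North", "North", "South-East"],
        "Concat_GRP_Name": ["n_1", "n_1", "n_22", "se_a"],
        "X_1": [4, 6, 5, 9],
    }
)
picks = ordered_picks(df, "Main_Group", "Concat_GRP_Name")
print(picks)
```

The check compares main groups rather than substrings, so names such as n_22 or se_a work the same way as A1 or B2.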
