
I have a DataFrame df_data:

CustID    MatchID    LocationID   isMajor  #Major is 1 and Minor is 0
  1        11111       324         0  
  1        11111       324         0
  1        11111       324         0
  1        22222       490         0
  1        33333       675         1
  2        44444       888         0

I have a function with these parameters:

def compute_something(list_minor=None, list_major=None):
    pass

Explanation of the parameters: for CustID = 1, the parameters should be list_minor = [3,1] (the position is not important) and list_major = [1]. That is because he has LocationID = 324 three times and LocationID = 490 once (both 324 and 490 have isMajor = 0, so they go into the same list), plus LocationID = 675 once with isMajor = 1. Similarly, CustID = 2 should get list_minor = [1] and list_major = []: if a customer has no major (or no minor) data, I should pass [].
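To pin down the expected result, here is a minimal pure-Python sketch (standard library only). It assumes the counts are taken per LocationID within each (CustID, isMajor) pair, as described above; the names `rows`, `counts` and `params` are illustrative and not part of the original program.

```python
from collections import Counter, defaultdict

# Rows from the question: (CustID, MatchID, LocationID, isMajor)
rows = [
    (1, 11111, 324, 0),
    (1, 11111, 324, 0),
    (1, 11111, 324, 0),
    (1, 22222, 490, 0),
    (1, 33333, 675, 1),
    (2, 44444, 888, 0),
]

# Count rows per (CustID, isMajor, LocationID)
counts = Counter((cust, major, loc) for cust, _match, loc, major in rows)

# Every customer starts with two empty lists, so a missing class stays []
params = defaultdict(lambda: {"list_minor": [], "list_major": []})
for (cust, major, _loc), n in counts.items():
    key = "list_major" if major == 1 else "list_minor"
    params[cust][key].append(n)

print(dict(params))
# {1: {'list_minor': [3, 1], 'list_major': [1]}, 2: {'list_minor': [1], 'list_major': []}}
```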

This is my program:

import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0]
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])
df_parameter = pd.DataFrame()

df_parameter['parameters'] = df_data.groupby(['CustID', 'MatchID', 'isMajor'])['LocationID'].nunique()

But the result in df_parameter['parameters'] is wrong:

                                    parameters
 CustID     MatchID    isMajor
   1         11111        0             1   #should be 3
             22222        0             1
             33333        1             1
   2         44444        0             1
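The 1s come from `nunique()`: it counts distinct LocationID values in each (CustID, MatchID, isMajor) group, and in this data each MatchID has exactly one LocationID. A short sketch comparing it with `size()`, which counts rows instead, assuming the column is spelled `isMajor` as in the table above:

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])

grouped = df_data.groupby(['CustID', 'MatchID', 'isMajor'])['LocationID']

distinct = grouped.nunique()  # distinct LocationIDs per group: always 1 in this data
rows = grouped.size()         # number of rows per group: 3 for (1, 11111, 0)

print(pd.DataFrame({'nunique': distinct, 'size': rows}))
```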

Can I get the parameters I explained above with groupby and pass them to the function?

1 Answer

How about:

(df_data.groupby(['CustID', 'isMajor', 'MatchID']).size()
   .groupby(level=[0, 1]).agg(set)
   .unstack('isMajor')
)

Output:

isMajor       0    1
CustID              
1        {1, 3}  {1}
2           {1}  NaN
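One caveat with `agg(set)`: a set drops repeated counts, so a customer with two minor locations visited once each would get `{1}` instead of `[1, 1]`. The sketch below adds one hypothetical row for CustID 2 to show this, and uses `agg(list)` in the same chain to keep every count:

```python
import pandas as pd

# The question's data plus one hypothetical extra row for CustID 2
data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
    [2, 55555, 999, 0],  # hypothetical: a second minor location, also seen once
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])

# Rows per (CustID, isMajor, MatchID)
counts = df_data.groupby(['CustID', 'isMajor', 'MatchID']).size()

as_set = counts.groupby(level=[0, 1]).agg(set).unstack('isMajor')
as_list = counts.groupby(level=[0, 1]).agg(list).unstack('isMajor')

print(as_set.loc[2, 0])   # {1}: the two 1s collapse into one
print(as_list.loc[2, 0])  # [1, 1]: one count per MatchID is kept
```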

Update: try this groupby instead:

(df_data.groupby(['CustID', 'isMajor'])['MatchID']
   .apply(lambda x: x.value_counts().tolist())
   .unstack('isMajor')
)

Also, grouping by two keys can be slow. In that case, you can concatenate the keys into a single key and group by that:

keys = df_data['CustID'].astype(str) + '_' + df_data['isMajor'].astype(str)

(df_data.groupby(keys)['MatchID']
   .apply(lambda x: x.value_counts().tolist())
)
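For the last part of the question (passing the lists to the function), here is a sketch that takes the unstacked result of the Update snippet, replaces the NaN left by `unstack` with `[]`, and calls the function once per customer. `compute_something` gets a hypothetical echo body here, since the original is only a stub, and columns 0 and 1 are assumed to be integer labels for minor and major:

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])


def compute_something(list_minor=None, list_major=None):
    # Hypothetical body: just echo the inputs so the calls can be inspected
    return list_minor, list_major


# The Update snippet: one list of counts per (CustID, isMajor), isMajor as columns
params = (df_data.groupby(['CustID', 'isMajor'])['MatchID']
          .apply(lambda x: x.value_counts().tolist())
          .unstack('isMajor'))


def as_list(value):
    # unstack leaves NaN where a customer has no rows of that class
    return value if isinstance(value, list) else []


results = {}
for cust_id, row in params.iterrows():
    results[cust_id] = compute_something(
        list_minor=as_list(row.get(0)),  # column 0: isMajor == 0 (minor)
        list_major=as_list(row.get(1)),  # column 1: isMajor == 1 (major)
    )

# results == {1: ([3, 1], [1]), 2: ([1], [])}
print(results)
```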
Comments

Hi @Quang Hoang, your result is different from the one I expected. For CustID = 1, I want the value for isMajor = 0 to be [3,1], but with your solution I only get 3.
I forgot that the order of the columns in groupby gives a different result!
Hi @Quang Hoang, can I ask you something? I saw that you used unstack to turn an index level into columns, so I checked your output with info() and saw that the column names are 0 and 1. But when I select data by column name, I get a KeyError. It's very strange.
It depends on the dtype of isMajor. The column labels might be the strings '0' and '1', not the integers 0 and 1.
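A small sketch of this, assuming (hypothetically) that `isMajor` was loaded as strings, for example from a CSV read with `dtype=str`: the unstacked column labels are then `'0'` and `'1'`, so selecting `0` raises a KeyError, while casting the key column with `astype(int)` before grouping gives integer labels.

```python
import pandas as pd

# Hypothetical frame where isMajor was read as text
df_str = pd.DataFrame({
    'CustID': [1, 1, 2],
    'MatchID': [11111, 33333, 44444],
    'isMajor': ['0', '1', '0'],
})

wide_str = df_str.groupby(['CustID', 'isMajor']).size().unstack('isMajor')
print(list(wide_str.columns))  # ['0', '1']: string labels, so wide_str[0] raises KeyError

# Cast the key to int before grouping to get integer column labels
df_int = df_str.assign(isMajor=df_str['isMajor'].astype(int))
wide_int = df_int.groupby(['CustID', 'isMajor']).size().unstack('isMajor')
print(list(wide_int.columns))  # [0, 1]
```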
I'm testing your solution on ~20 million rows and it is very slow: I've been waiting for 20 minutes and it still hasn't finished. Do you have any other ideas?
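One possible speed-up, sketched here without benchmarks: replace the per-group `lambda` with `value_counts()` by a single `size()` over all three keys, so only the already-reduced counts are gathered into lists. This counts per LocationID, as the question defines it (MatchID gives the same result on this sample), and the lists come out in LocationID order rather than by count, which the question says does not matter. The `agg(list)` step still runs Python code per (CustID, isMajor) group, so it may still not be fast enough at 20 million rows.

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])

# Step 1: one vectorized pass counting rows per (CustID, isMajor, LocationID)
counts = df_data.groupby(['CustID', 'isMajor', 'LocationID']).size()

# Step 2: gather the much smaller set of counts into one list per (CustID, isMajor)
lists = counts.groupby(level=['CustID', 'isMajor']).agg(list)

# Step 3: isMajor -> columns 0 (minor) and 1 (major); reindex guarantees both exist
wide = lists.unstack('isMajor').reindex(columns=[0, 1])

# Step 4: (list_minor, list_major) per customer, with [] where unstack left NaN
params = {
    cust: (minor if isinstance(minor, list) else [],
           major if isinstance(major, list) else [])
    for cust, minor, major in zip(wide.index, wide[0], wide[1])
}
print(params)
```

Each value is a (list_minor, list_major) pair, so it can be unpacked into the function, e.g. `compute_something(*params[1])`.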