I am trying to optimize a function that returns the value of a variable (wage) under a given condition (the school district with the largest enrollment within its MSA) for every year. I thought combining apply and lambda would be efficient, but my actual dataset is large (shape 321681x272), which makes the computation extremely slow. Is there a faster way to go about this? I think vectorizing the operations instead of iterating through df could be a solution, but I am unsure what structure that would take as an alternative to df.apply and lambda.

import pandas as pd

df = pd.DataFrame({'year': [2000, 2000, 2001, 2001],
                   'msa': ['NYC-Newark', 'NYC-Newark', 'NYC-Newark', 'NYC-Newark'],
                   'leaname': ['NYC School District', 'Newark School District', 'NYC School District', 'Newark School District'],
                   'enroll': [100000, 50000, 110000, 60000],
                   'wage': [5, 2, 7, 3]})


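def function1(x, y, var):
    '''
    Returns the selected variable's value for the school district with the largest enrollment in a given MSA and year
    '''
    # Rows for the requested MSA and year
    t = df[(df['msa'] == x) & (df['year'] == y)]
    # Mean enrollment per (msa, var); selecting 'enroll' first avoids averaging string columns
    e = t.groupby(['msa', var])[['enroll']].mean()
    # Keep the entry with the largest enrollment and return its var value as a scalar
    return e.loc[e.groupby(level=[0])['enroll'].idxmax()].reset_index()[var].iloc[0]

df['main_wage'] = df.apply(lambda x: function1(x['msa'], x['year'], 'wage'), axis=1)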

Sample Output

   year         msa                 leaname  enroll  wage  main_wage
0  2000  NYC-Newark     NYC School District  100000     5          5
1  2000  NYC-Newark  Newark School District   50000     2          5
2  2001  NYC-Newark     NYC School District  110000     7          7
3  2001  NYC-Newark  Newark School District   60000     3          7

1 Answer

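Something like this, where setting wage as the index makes idxmax return the wage of the largest-enrollment row in each (year, msa) group: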

df['main_wage'] = df.set_index('wage').groupby(['year', 'msa'])['enroll'].transform('idxmax').values
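A variant of the same idea that may be easier to reuse for other columns is sketched below. It keeps the original index, asks transform('idxmax') for the row label of the largest-enrollment district in each (year, msa) group, and then looks the wage up at those labels. It assumes df has a unique index (the default RangeIndex here) and that enroll has no missing values; the name top_rows is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2000, 2000, 2001, 2001],
    'msa': ['NYC-Newark'] * 4,
    'leaname': ['NYC School District', 'Newark School District',
                'NYC School District', 'Newark School District'],
    'enroll': [100000, 50000, 110000, 60000],
    'wage': [5, 2, 7, 3],
})

# For every row, the index label of the largest-enrollment row
# in that row's (year, msa) group. Assumes a unique index.
top_rows = df.groupby(['year', 'msa'])['enroll'].transform('idxmax')

# Look up the wage at those labels. to_numpy() drops the looked-up
# index so the values are assigned positionally, row by row.
df['main_wage'] = df.loc[top_rows, 'wage'].to_numpy()

print(df)
```

Replacing 'wage' in the lookup with another column name gives the same "value at the main district" for that column without changing the grouping step.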

1 Comment

Thanks, this cut the run time by more than half (42 ms to 16 ms).
