I have two functions that give the same result, one vectorized and one with a "for" loop. Surprisingly, the for loop is faster than the vectorized version. Any idea why?

def loop_for(df):
    gpd    = df.groupby([pd.TimeGrouper(freq="QS-JAN"), 'CD_PDP'])
    result = []
    for (quarter, unite), data in gpd:
        nb_MAT_RH   = data["MAT_RH"  ].nunique()
        nb_MAT_RHPI = data["MAT_RHPI"].nunique()
        result.append({"CD_PDP": unite, "MOIS_COMPTABLE": quarter, "nb_mat_rh" : nb_MAT_RH, "nb_MAT_RHPI" : nb_MAT_RHPI})

    return pd.DataFrame(result)


def vectorisation(df):
    b = df.groupby([pd.TimeGrouper(freq="QS-JAN"), 'CD_PDP']).apply(lambda x: pd.Series( {"nb_mat_rh"   : x["MAT_RH"  ].nunique(),
                                                                                          "nb_MAT_RHPI" : x["MAT_RHPI"].nunique()}))
    return b.reset_index()

When testing:

import timeit
print "loop"
print timeit.timeit(stmt="loop_for(df)",number= 2, setup="from __main__ import loop_for; from __main__ import df")
print "vector"
print timeit.timeit(stmt="vectorisation(df)",number= 2, setup="from __main__ import vectorisation; from __main__ import df")

It gives:

loop
6.83789801598
vector
7.13991713524
  • Using .apply(lambda ... ) is not really vectorization; it is essentially the same as running a for loop over the data. Commented Jul 16, 2016 at 20:26
  • Also, we have no idea what df you're passing to your functions. But more importantly... what @ajcr said. Commented Jul 16, 2016 at 21:04
  • Good to know :) Would vectorization be possible here? And if so, how? Commented Jul 16, 2016 at 21:54

1 Answer

Doing .nunique() on a SeriesGroupBy object does take advantage of vectorization:

grouped = df.groupby([pd.TimeGrouper(freq="QS-JAN"), 'CD_PDP'])

# Aggregate on the grouped object built above (same keys as in the question)
b = grouped.agg({'MAT_RH': 'nunique', 'MAT_RHPI': 'nunique'})
b = b.rename(columns={'MAT_RH': 'nb_mat_rh', 'MAT_RHPI': 'nb_MAT_RHPI'})

But without even a sample of your original df it is impossible to run any benchmarks.
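
Since the original df is not available, here is a minimal sketch on synthetic data that checks the aggregation above against the loop from the question. The data, the index name MOIS_COMPTABLE, and the function name vectorised are assumptions made for illustration. The sketch also uses pd.Grouper(freq=...) in place of pd.TimeGrouper, which was removed in pandas 1.0.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df: a daily DatetimeIndex plus the
# three columns used in the question. The real data is unknown.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame(
    {
        "CD_PDP": rng.integers(0, 5, size=n),
        "MAT_RH": rng.integers(0, 500, size=n),
        "MAT_RHPI": rng.integers(0, 500, size=n),
    },
    index=pd.date_range("2015-01-01", periods=n, freq="D"),
)
# Name the index so that reset_index() yields a MOIS_COMPTABLE column.
df.index.name = "MOIS_COMPTABLE"


def loop_for(df):
    # Same logic as the question's loop, with pd.Grouper instead of pd.TimeGrouper.
    gpd = df.groupby([pd.Grouper(freq="QS-JAN"), "CD_PDP"])
    result = []
    for (quarter, unite), data in gpd:
        result.append({
            "CD_PDP": unite,
            "MOIS_COMPTABLE": quarter,
            "nb_mat_rh": data["MAT_RH"].nunique(),
            "nb_MAT_RHPI": data["MAT_RHPI"].nunique(),
        })
    return pd.DataFrame(result)


def vectorised(df):
    # One agg call on the grouped object; no Python-level function per group.
    grouped = df.groupby([pd.Grouper(freq="QS-JAN"), "CD_PDP"])
    out = grouped.agg({"MAT_RH": "nunique", "MAT_RHPI": "nunique"})
    out = out.rename(columns={"MAT_RH": "nb_mat_rh", "MAT_RHPI": "nb_MAT_RHPI"})
    return out.reset_index()


# Put both results in the same column and row order before comparing.
cols = ["MOIS_COMPTABLE", "CD_PDP", "nb_mat_rh", "nb_MAT_RHPI"]
a = loop_for(df)[cols].sort_values(cols[:2]).reset_index(drop=True)
b = vectorised(df)[cols].sort_values(cols[:2]).reset_index(drop=True)
same = list(a.itertuples(index=False, name=None)) == list(
    b.itertuples(index=False, name=None)
)
print("same result:", same)

# Timings depend heavily on the data, so measure on the real df as well.
print("loop  ", timeit.timeit(lambda: loop_for(df), number=3))
print("vector", timeit.timeit(lambda: vectorised(df), number=3))
```

How the two timings compare will depend on the number of groups and rows, so it is worth running this against the real df before drawing conclusions.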

2 Comments

@romainjouin, please consider accepting an answer if it was helpful.
@romainjouin if this or any answer has solved your question please consider accepting it by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this.
