
This is my current function:

def partnerTransaction(main_df, ptn_code, intent, retail_unique):

    if intent == 'Frequency':
        return main_df.query('csp_code == @retail_unique & partner_code == @ptn_code')['tx_amount'].count()

    elif intent == 'Total_value':
        return main_df.query('csp_code == @retail_unique & partner_code == @ptn_code')['tx_amount'].sum()

This function accepts a pandas DataFrame (DF 1) and three search parameters. retail_unique is a string that comes from another DataFrame (DF 2). Currently, I iterate over the rows of DF 2 using itertuples and call around 200 such functions, writing the results to a third DF; the above is just one example. DF 2 has around 16,000 rows, so it's very slow. What I want is to vectorize this function so that it returns a pandas Series with the count of tx_amount per retail unique. So the Series would be

34 # retail a
54 # retail b
23 # retail c

I would then map this series to the 3rd DF.
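For context, the slow per-row pattern looks roughly like this (the toy data and the simplified helper are illustrative, not my real code, which also filters on partner_code):

```python
import pandas as pd

# toy stand-ins for the real frames (names and values are illustrative)
df1 = pd.DataFrame({
    'Retail': ['retail_a', 'retail_b', 'retail_a', 'retail_c', 'retail_a'],
    'tx_amount': [50, 100, 70, 20, 10],
})
df2 = pd.DataFrame({'Retail': ['retail_a', 'retail_b', 'retail_c']})

def count_tx(main_df, retail_unique):
    # simplified stand-in for partnerTransaction(..., intent='Frequency')
    return main_df.query('Retail == @retail_unique')['tx_amount'].count()

# the slow part: one full-DataFrame query per row of df2
frequencies = [count_tx(df1, row.Retail) for row in df2.itertuples()]
```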

Does anyone have an idea of how I might approach this?

EDIT: The first DF contains time-based data, with each retail appearing multiple times in one column and the tx_amount in another column, like so:

Retail  tx_amount
retail_a  50
retail_b  100
retail_a  70
retail_c  20
retail_a  10

The second DF is arranged per retailer:

Retail
retail_a
retail_b
retail_c

2 Answers


First use merge with a left join.

Then groupby column Retail and aggregate column tx_amount with the agg functions size and sum, either together or, in the second solution, separately.

Last, reset_index converts the Series to a 2-column DataFrame.

If you need both outputs together:

import pandas as pd

def partnerTransaction_together(df1, df2):
    # align the two frames on Retail (left join keeps all rows of df1)
    df = pd.merge(df1, df2, on='Retail', how='left')
    # size counts rows per group, sum totals tx_amount per group
    d = {'size':'Frequency','sum':'Total_value'}
    return df.groupby('Retail')['tx_amount'].agg(['size','sum']).rename(columns=d)

print (partnerTransaction_together(df1, df2))
          Frequency  Total_value
Retail                          
retail_a          3          130
retail_b          1          100
retail_c          1           20

But if you need to use conditions:

def partnerTransaction(df1, df2, intent):
    df = pd.merge(df1, df2, on='Retail', how='left')
    g = df.groupby('Retail')['tx_amount']

    if intent == 'Frequency':
        # reset_index converts the Series to a 2-column DataFrame
        return g.size().reset_index(name='Frequency')
    elif intent == 'Total_value':
        return g.sum().reset_index(name='Total_value')

print (partnerTransaction(df1, df2, 'Frequency'))
     Retail  Frequency
0  retail_a          3
1  retail_b          1
2  retail_c          1

print (partnerTransaction(df1, df2, 'Total_value'))
     Retail  Total_value
0  retail_a          130
1  retail_b          100
2  retail_c           20
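To map the result onto the third DataFrame mentioned in the question, a Series indexed by Retail works directly with map — a minimal sketch, assuming df3 has a Retail column (df3 and its layout are my assumption):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Retail': ['retail_a', 'retail_b', 'retail_a', 'retail_c', 'retail_a'],
    'tx_amount': [50, 100, 70, 20, 10],
})
# df3 stands in for the third DataFrame from the question
df3 = pd.DataFrame({'Retail': ['retail_a', 'retail_b', 'retail_c']})

# groupby size returns a Series indexed by Retail, so no reset_index is needed here
freq = df1.groupby('Retail')['tx_amount'].size()

# map looks up each Retail value in the Series index
df3['Frequency'] = df3['Retail'].map(freq)
```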

2 Comments

Could you explain how this works? I'm new to Pandas, and I understand that you're grouping by Retail and accessing the tx_amount series from it. Why are you resetting the index?
@NeevParikh, jezrael's agg solution is idiomatic pandas at its best.

If you want speed, here is a NumPy solution using bincount:

from collections import OrderedDict

import numpy as np
import pandas as pd

# factorize maps each retailer to an integer code; u holds the unique labels
f, u = pd.factorize(df1.Retail.values)

# bincount of the codes gives per-retailer row counts
c = np.bincount(f)
# with weights, bincount gives per-retailer sums of tx_amount
s = np.bincount(f, df1.tx_amount.values).astype(df1.tx_amount.dtype)

pd.DataFrame(OrderedDict(Frequency=c, Total_value=s), u)

          Frequency  Total_value
retail_a          3          130
retail_b          1          100
retail_c          1           20
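To see why this works, here is a small worked example on the sample data from the question (the variable names are mine):

```python
import numpy as np
import pandas as pd

retail = np.array(['retail_a', 'retail_b', 'retail_a', 'retail_c', 'retail_a'])
amounts = np.array([50, 100, 70, 20, 10])

# factorize assigns each label an integer code in order of first appearance
codes, uniques = pd.factorize(retail)
# codes   -> [0, 1, 0, 2, 0]
# uniques -> ['retail_a', 'retail_b', 'retail_c']

# bincount counts occurrences of each code: the per-retailer Frequency
counts = np.bincount(codes)                 # [3, 1, 1]

# with weights, each occurrence contributes its tx_amount: the per-retailer sum
sums = np.bincount(codes, weights=amounts)  # [130., 100., 20.]
```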

Timing

df1 = pd.DataFrame(dict(
        Retail=np.random.choice(list('abcdefghijklmnopqrstuvwxyz'), 10000),
        tx_amount=np.random.randint(1000, size=10000)
    ))


%%timeit
f, u = pd.factorize(df1.Retail.values)

c = np.bincount(f)
s = np.bincount(f, df1.tx_amount.values).astype(df1.tx_amount.dtype)

pd.DataFrame(OrderedDict(Frequency=c, Total_value=s), u)

1000 loops, best of 3: 607 µs per loop


%%timeit
d = {'size':'Frequency','sum':'Total_value'}
df1.groupby('Retail')['tx_amount'].agg(['size','sum']).rename(columns=d)

1000 loops, best of 3: 1.53 ms per loop

