How to properly update a global variable in python using lambda

Question

I have a dataframe in which each row is one transaction and lists the items in that transaction. Here is what my dataframe looks like:

itemList
A,B,C
B,F
G,A
...

I want to find the frequency of each item (how many times it appears across the transactions). I have defined a dictionary and tried to update its values as shown below:

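counts = {}  # renamed from `dict` so the built-in type is not shadowed
def update(itemList):
    # Increment the count of each comma-separated item in the dict
    for item in itemList.split(','):
        counts[item] = counts.get(item, 0) + 1

df.itemList.apply(lambda x: update(x))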

As the apply function gets executed for multiple rows at the same time, multiple rows try to update the values in the dict at the same time, and it's causing an issue. How can I make sure that multiple updates to the dict do not cause any issue?

Why do you think multiple rows try ... at the same time? apply is just a for loop. – Quang Hoang, Commented Mar 11, 2020 at 20:19
As per this article, please provide a reproducible sample. By this I mean: a sample dataset we can copy/paste, the output you are getting, and a sample of what you want to have as output. – Ukrainian-serge, Commented Mar 11, 2020 at 20:22
You don't need a lambda expression anymore: df.itemList.apply(update). – chepner, Commented Mar 11, 2020 at 20:28
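A minimal sketch of the two points made in these comments, assuming the three-row sample from the question: `Series.apply` calls the function once per element, one call after another, and the function can be passed directly without a lambda. The names `calls` and `record` are hypothetical, used only to log the call order.

```python
import pandas as pd

df = pd.DataFrame({'itemList': ['A,B,C', 'B,F', 'G,A']})

calls = []  # hypothetical log of the row values seen, in call order

def record(item_list):
    # Each call finishes before the next one starts; nothing runs concurrently
    calls.append(item_list)
    return len(item_list.split(','))

# The function is passed directly; no lambda wrapper is needed
sizes = df['itemList'].apply(record)

print(calls)           # ['A,B,C', 'B,F', 'G,A']
print(sizes.tolist())  # [3, 2, 2]
```

Because the calls are sequential, a dict mutated inside the function ends up with consistent counts; relying on side effects inside apply is still less idiomatic than the vectorized approaches in the answers.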

ansev · Accepted Answer · 2020-03-11 20:38:47Z


I think you only need Series.str.get_dummies:

df['itemList'].str.get_dummies(',').sum().to_dict()
#{'A': 2, 'B': 2, 'C': 1, 'F': 1, 'G': 1}

If there are more columns, use:

df.stack().str.get_dummies(',').sum().to_dict()
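A small sketch of the `stack` variant, assuming a hypothetical second column `otherItems` that also holds comma-separated items and has no missing values: `stack()` flattens every cell into one Series, so `str.get_dummies` sees all of them.

```python
import pandas as pd

# Hypothetical two-column frame; both columns hold comma-separated items
df = pd.DataFrame({'itemList': ['A,B,C', 'B,F', 'G,A'],
                   'otherItems': ['A', 'C,F', 'B']})

# stack() turns the frame into one Series indexed by (row, column)
stacked = df.stack()

# One 0/1 indicator column per item; summing gives, per item,
# the number of cells that contain it
totals = stacked.str.get_dummies(',').sum().to_dict()
print(totals)  # {'A': 3, 'B': 3, 'C': 2, 'F': 2, 'G': 1}
```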

If you want the counts for each row:

df['itemList'].str.get_dummies(',').to_dict('index')
#{0: {'A': 1, 'B': 1, 'C': 1, 'F': 0, 'G': 0},
# 1: {'A': 0, 'B': 1, 'C': 0, 'F': 1, 'G': 0},
# 2: {'A': 1, 'B': 0, 'C': 0, 'F': 0, 'G': 1}}

As @Quang Hoang said in the comments, apply simply applies the function to each row/column using a loop.

edited Mar 11, 2020 at 20:38

answered Mar 11, 2020 at 20:26

ansev


Vaishali · Accepted Answer · 2020-03-11 20:55:10Z

You might be better off relying on native Python here.

import pandas as pd
df = pd.DataFrame({'itemlist':['a,b,c', 'b,f', 'g,a', 'd,g,f,d,s,a,v', 'e,w,d,f,g,h', 's,d,f,e,r,t', 'e,d,f,g,r,r','s,d,f']})

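Here is a solution using Counter (it strips the commas and counts characters, so it only works when every item is a single character, as in this sample):

from collections import Counter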

df['itemlist'].str.replace(',','').apply(lambda x: Counter(x)).sum()
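The character-counting trick relies on every item being one character long. A sketch that splits on the comma instead, so a multi-character item such as the hypothetical `'ab'` stays intact (this swaps `str.replace` for `str.split`; it is an alternative, not the original answer's code):

```python
from collections import Counter

import pandas as pd

# Sample with one hypothetical multi-character item, 'ab'
df = pd.DataFrame({'itemlist': ['a,b,c', 'b,f', 'g,a', 'ab,b']})

# Split each row into a list of items, then build one Counter per row
per_row = df['itemlist'].str.split(',').apply(Counter)

# Counter supports +, so summing the object Series merges the per-row counts
totals = per_row.sum()

# Character-based version for comparison: 'ab' is counted as 'a' and 'b'
chars = df['itemlist'].str.replace(',', '').apply(Counter).sum()

print(totals['ab'], chars['ab'])  # 1 0
```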

Some timing comparisons:

%timeit df['itemlist'].str.split(',', expand = True).stack().value_counts().to_dict()
2.64 ms ± 99.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['itemlist'].str.get_dummies(',').sum().to_dict()
3.22 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

from collections import Counter
%timeit df['itemlist'].str.replace(',','').apply(lambda x: Counter(x)).sum()
778 µs ± 12.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
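The `%timeit` lines are IPython magics. Outside IPython, a rough stand-in uses the standard-library `timeit` module. This sketch reuses the sample frame from this answer; the timings it prints depend on the machine and pandas version and are not meant to reproduce the figures above. It also compares the results, which differ on this sample: `get_dummies` produces a 0/1 indicator per cell, so `'d'` (listed twice in `'d,g,f,d,s,a,v'`) and `'r'` (twice in `'e,d,f,g,r,r'`) are counted once per row, while the other two methods count every occurrence.

```python
import timeit
from collections import Counter

import pandas as pd

df = pd.DataFrame({'itemlist': ['a,b,c', 'b,f', 'g,a', 'd,g,f,d,s,a,v',
                                'e,w,d,f,g,h', 's,d,f,e,r,t', 'e,d,f,g,r,r', 's,d,f']})

# The three expressions compared above, wrapped as zero-argument callables
methods = {
    'split_stack_value_counts': lambda: (
        df['itemlist'].str.split(',', expand=True).stack().value_counts().to_dict()
    ),
    'get_dummies': lambda: df['itemlist'].str.get_dummies(',').sum().to_dict(),
    'counter': lambda: dict(df['itemlist'].str.replace(',', '').apply(Counter).sum()),
}

results = {}
for name, fn in methods.items():
    results[name] = fn()
    # Average seconds per call over 100 calls (machine-dependent)
    per_call = timeit.timeit(fn, number=100) / 100
    print(f'{name}: {per_call * 1e6:.0f} microseconds per call')

# get_dummies counts 'd' once for row 3, where it appears twice
print(results['counter']['d'], results['get_dummies']['d'])  # 6 5
```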
