9

I have a pandas dataframe that looks something like this:

    brand       description     former_price    discounted_price
0   A           icecream        1099.0          855.0   
1   A           cheese          469.0           375.0   
2   B           catfood         179.0           119.0   
3   C           NaN             699.0           399.0   
4   NaN         icecream        769.0           549.0
5   A           icecream        769.0           669.0   

I want to create a column that will assign a unique value for each brand & description combination. Note that either the brand or the description can be missing from the dataset (notified by NaN value). Also, note that if the brand and the description is the same (duplicated) I still want the unique value to be the same for the row.

The output should look like this:

    product_key   brand         description     former_price    discounted_price
0   1             A             icecream        1099.0          855.0   
1   2             A             cheese          469.0           375.0   
2   3             B             catfood         179.0           119.0   
3   4             C             NaN             699.0           399.0   
4   5             NaN           icecream        769.0           549.0
5   1             A             icecream        769.0           669.0   

The values in product_key can be anything, I just want them to be unique based on brand and description columns. Any help is immensely appreciated!

Thanks a lot!

2 Answers 2

11

You could try with pd.Series.factorize:

df.set_index(['brand','description']).index.factorize()[0]+1

Output:

0    1
1    2
2    3
3    4
4    5
5    1

So you could try this, to assign it to be the first column:

df.insert(loc=0, column='product_key', value=df.set_index(['brand','description']).index.factorize()[0]+1)

Output:

df
   product_key brand description  former_price  discounted_price
0            1     A    icecream        1099.0             855.0
1            2     A      cheese         469.0             375.0
2            3     B     catfood         179.0             119.0
3            4     C         NaN         699.0             399.0
4            5   NaN    icecream         769.0             549.0
5            1     A    icecream         769.0             669.0
Sign up to request clarification or add additional context in comments.

1 Comment

this looks great! I will tested it on my dataset and it works flawlessly
4

with groupby+ngroup:

(df.fillna({'brand':'','description':''})
   .groupby(['brand','description'],sort=False).ngroup()+1)

0    1
1    2
2    3
3    4
4    5
5    1

2 Comments

I tested this and it works! Just one question: why do you need to add +1 to it? is is because the ngroup count starts from 0?
@eagerstudent exactly for that purpose. :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.