0

I would like to create unique identifiers for values that are duplicates. Values that are duplicates are only 0's. The idea is to convert each zero to zero plus its position (0+1 for first row, 0+2 for second row etc). However the problem is the column also has other non duplicate values.

I have written this line of code to try and convert the zero values as stated but I am getting this error message

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('

Here is my code

seller_customer['customer_id'] = np.where(seller_customer['customer_id']==0, seller_customer['customer_id'] + seller_customer.groupby(['customer_id']).cumcount().replace('0',''))

Here is a sample of my data

{0: '7e468d618e16c6e1373fb2c4a522c969',
 1: '1c14a115bead8a332738c5d7675cca8c',
 2: '434dee65d973593dbb8461ba38202798',
 3: '4bbeac9d9a22f0628ba712b90862df28',
 4: '578d5098cbbe40771e1229fea98ccafd',
 5:  0,
 6:  0,
 7:  0}
1
  • Interesting. May be you need to write python function Commented Apr 1, 2019 at 18:08

2 Answers 2

2

If I understand correctly, you can just assign range values to those ids that are 0:

df.loc[df['id']==0, 'id'] = np.arange((df['id']==0).sum()) + 1

print(df)

Output:

                                 id
0  7e468d618e16c6e1373fb2c4a522c969
1  1c14a115bead8a332738c5d7675cca8c
2  434dee65d973593dbb8461ba38202798
3  4bbeac9d9a22f0628ba712b90862df28
4  578d5098cbbe40771e1229fea98ccafd
5                                 1
6                                 2
7                                 3

Or a shorter but slightly slower:

df.loc[df['id']==0, 'id'] = (df['id']==0).cumsum()
Sign up to request clarification or add additional context in comments.

Comments

0

You can do something like this:

    from pandas.util import hash_pandas_object
    import numpy as np
    df.x = np.where(df.x == 0, hash_pandas_object(df.x), df.x)
    df

Output:

                                      x
    0  7e468d618e16c6e1373fb2c4a522c969
    1  1c14a115bead8a332738c5d7675cca8c
    2  434dee65d973593dbb8461ba38202798
    3  4bbeac9d9a22f0628ba712b90862df28
    4  578d5098cbbe40771e1229fea98ccafd
    5                593769213749726025
    6              14559158595676751865
    7               4575103004772269825

They won't be sequential like the index but they will be unique (almost certainly, unless you encounter a hash collision)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.