4

I have a DataFrame like the following:

df
    Name1   Name2
0   John    Jack
1   John    Albert
2   Jack    Eva
3   Albert  Sara
4   Eva     Sara

I want to assign each name a unique ID. So:

df
    Name1   Name2      ID1     ID2
0   John    Jack        0       1
1   John    Albert      0       2
2   Jack    Eva         1       3
3   Albert  Sara        2       5
4   Eva     Sara        3       5
2
  • Is it important which name gets which number? Commented Jan 7, 2019 at 11:40
  • No, it is not important. Just unique IDs between 0 and 1 Commented Jan 7, 2019 at 11:41

2 Answers

3

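First flatten the values with numpy.ravel, encode them with pd.factorize, and reshape the codes back to the shape of the original df. Then pass the result to the DataFrame constructor, set the column names, and finally join it to the original: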

import pandas as pd

df1 = pd.DataFrame(pd.factorize(df.values.ravel())[0].reshape(df.shape),
                   index=df.index)  # reuse df's index so the later join aligns
df1.columns = ['ID{}'.format(x+1) for x in range(len(df1.columns))]
print(df1)
   ID1  ID2
0    0    1
1    0    2
2    1    3
3    2    4
4    3    4

df = df.join(df1)
print(df)
    Name1   Name2  ID1  ID2
0    John    Jack    0    1
1    John  Albert    0    2
2    Jack     Eva    1    3
3  Albert    Sara    2    4
4     Eva    Sara    3    4
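
As a self-contained sketch of the flatten-and-factorize steps above, the following rebuilds the sample data from the question and wraps the logic in a function. The helper name `add_id_columns` is hypothetical (it is not part of pandas), and the sketch assumes every column of the frame holds names that should share one ID space.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name1": ["John", "John", "Jack", "Albert", "Eva"],
    "Name2": ["Jack", "Albert", "Eva", "Sara", "Sara"],
})


def add_id_columns(frame):
    """Hypothetical helper: return frame with ID1..IDn appended, plus the names."""
    # Flatten row by row; factorize numbers each distinct name
    # in order of first appearance.
    codes, uniques = pd.factorize(frame.to_numpy().ravel())
    ids = pd.DataFrame(
        codes.reshape(frame.shape),
        index=frame.index,  # keep the index so join aligns row by row
        columns=[f"ID{i + 1}" for i in range(frame.shape[1])],
    )
    return frame.join(ids), uniques


result, names = add_id_columns(df)
print(result)
```

Returning the unique names alongside the frame makes it possible to look up which name an ID stands for: the ID is the position of the name in `names`.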

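Alternatively, starting again from the original df, create a MultiIndex Series with stack, generate the IDs with factorize, and turn the result back into a DataFrame with unstack. Then rename the columns and add them to the original with join: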

s = df.stack()
df = df.join(pd.Series(pd.factorize(s)[0], index=s.index)
               .unstack()
               .rename(columns=lambda x: x.replace('Name','ID')))
print(df)
    Name1   Name2  ID1  ID2
0    John    Jack    0    1
1    John  Albert    0    2
2    Jack     Eva    1    3
3  Albert    Sara    2    4
4     Eva    Sara    3    4

A similar alternative, again starting from the original df, overwrites the stacked values in place:

s = df.stack()
s[:] = pd.factorize(s)[0]
df = df.join(s.unstack().rename(columns=lambda x: x.replace('Name','ID')))
print(df)
    Name1   Name2  ID1  ID2
0    John    Jack    0    1
1    John  Albert    0    2
2    Jack     Eva    1    3
3  Albert    Sara    2    4
4     Eva    Sara    3    4
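
If you also need to know which ID belongs to which name, note that factorize returns the unique values as well as the codes. The sketch below, which rebuilds the question's sample data, turns them into a lookup dictionary and applies it per column. The names `name_to_id` and `ids` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name1": ["John", "John", "Jack", "Albert", "Eva"],
    "Name2": ["Jack", "Albert", "Eva", "Sara", "Sara"],
})

# factorize returns (codes, uniques); a name's position in uniques is its ID.
codes, uniques = pd.factorize(df.stack())
name_to_id = {name: i for i, name in enumerate(uniques)}
print(name_to_id)

# Apply the same mapping to every column, then rename Name* to ID*.
ids = df.apply(lambda col: col.map(name_to_id))
ids.columns = [c.replace("Name", "ID") for c in ids.columns]
print(df.join(ids))
```

Keeping the dictionary around is handy when new rows arrive later and must reuse the IDs already assigned.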

2 Comments

Albert gets a 2 in your solution in both columns. OP specified 2 and 4.
@timgeb - yes, I noticed it. I hope it is a typo; it looks like one.
1

If it's not important which name gets which number, you can also consider

df.join(df.stack().astype('category').cat.codes.unstack()
          .rename(columns=lambda c: c.replace('Name', 'ID')))

which produces

    Name1   Name2  ID1  ID2
0    John    Jack    3    2
1    John  Albert    3    0
2    Jack     Eva    2    1
3  Albert    Sara    0    4
4     Eva    Sara    1    4
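
The numbering differs from the factorize output because the categories inferred from strings are sorted, so the codes follow alphabetical order rather than order of first appearance. A short sketch on the question's sample data illustrates this. The comparison with `pd.factorize(..., sort=True)` is an addition for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name1": ["John", "John", "Jack", "Albert", "Eva"],
    "Name2": ["Jack", "Albert", "Eva", "Sara", "Sara"],
})

stacked = df.stack().astype("category")

# Each code is the position of the name in the sorted category list.
print(dict(enumerate(stacked.cat.categories)))

# factorize with sort=True numbers the names in the same sorted order.
sorted_codes, sorted_names = pd.factorize(df.stack(), sort=True)
print((sorted_codes == stacked.cat.codes.to_numpy()).all())
```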
