I'm trying to turn a column of strings into integer identifiers, and I cannot find an elegant way of doing this in pandas (or Python). In the following example, I transform "A", a column of strings, into numbers through a mapping, but it looks like a dirty hack to me:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})

unique = df['A'].unique()
mapping = dict(zip(unique, np.arange(len(unique))))

new_df = df.replace({'A': mapping})

Is there a better, more direct, way of achieving this?

3 Answers

How about using factorize?

>>> labels, uniques = df.A.factorize()
>>> df.A = labels
>>> df
   A  B
0  0  4
1  1  4
2  0  4
3  2  4

http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.factorize.html
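As a minimal sketch of what factorize returns (reusing the question's DataFrame; the name `restored` is only illustrative), the second return value holds the distinct strings in order of first appearance, so the original column can be rebuilt from the integer labels:

```python
import pandas as pd

df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})

# labels: integer array with one code per row
# uniques: Index of the distinct strings, in order of first appearance
labels, uniques = pd.factorize(df['A'])

# Look up each label in uniques to get the original strings back
# (`restored` is a hypothetical name used for illustration).
restored = uniques.take(labels)

print(labels.tolist())    # integer codes per row
print(list(uniques))      # distinct strings
print(list(restored))     # original column values
```

Keeping `uniques` around is useful if the integer ids ever need to be translated back to names.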

1 Comment

Well done. Never heard of factorize.

%timeit labels, uniques = df.A.factorize()
10000 loops, best of 3: 89 µs per loop

%timeit df.A.map({val: n for n, val in enumerate(df['A'].unique())})
1000 loops, best of 3: 363 µs per loop

Assuming you don't care much about what the integers are, simply that there's a consistent mapping, you could (1) use the Categorical codes or (2) rank the values:

>>> df["A_categ"] = pd.Categorical(df.A).codes
>>> df["A_rank"] = df["A"].rank(method="dense").astype(int)
>>> df
               A  B  A_categ  A_rank
0  homer_simpson  4        1       2
1    mean_street  4        2       3
2  homer_simpson  4        1       2
3        bla_bla  4        0       1
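Both options number the strings in sorted order rather than order of appearance, which is why "bla_bla" gets the smallest id here. A short sketch, again on the question's DataFrame, suggesting that the categorical codes, the dense rank minus one, and factorize with `sort=True` should all agree (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})

# Categorical codes: 0-based, categories sorted lexicographically
codes = df['A'].astype('category').cat.codes

# Dense rank: 1-based, same sorted order, no gaps between ranks
dense = df['A'].rank(method='dense').astype(int)

# factorize with sort=True also numbers values in sorted order
sorted_labels, sorted_uniques = pd.factorize(df['A'], sort=True)

print(codes.tolist())
print(dense.tolist())
print(sorted_labels.tolist())
```

If order of first appearance matters instead, plain factorize (without `sort=True`) is the closer fit.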

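A simple map with an inverted dictionary should get you what you want: enumerate yields (position, value) pairs, and the comprehension flips them into value-to-position entries. All the values returned by unique() are distinct, so flipping them won't result in duplicate keys.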

df['A'] = df.A.map({val: n for n, val in enumerate(df['A'].unique())})

>>> df
   A  B
0  0  4
1  1  4
2  0  4
3  2  4
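One property of the dictionary approach worth knowing: the mapping can be reused on other data, but values missing from the dictionary come back as NaN. A small sketch, where `new_values` and 'unseen_name' are made-up illustration data, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'A': ['homer_simpson', 'mean_street', 'homer_simpson', 'bla_bla'], 'B': 4})

# Build the value -> integer mapping once, in order of first appearance
mapping = {val: n for n, val in enumerate(df['A'].unique())}
encoded = df['A'].map(mapping)

# Hypothetical new data containing one value the mapping has never seen
new_values = pd.Series(['mean_street', 'unseen_name'])
encoded_new = new_values.map(mapping)

print(encoded.tolist())
print(encoded_new.isna().tolist())  # True marks values absent from the mapping
```

Because NaN forces a float result, it may be worth checking `isna()` before casting the encoded column back to an integer dtype.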
