Python Pandas convert multiple string columns to specified integer values

Question

I have a dataframe with thousands of rows, and some columns all contain ratings like A, B, C, D. I am trying to do some machine learning and would like to give the ratings specific values, like A=32, B=16, C=4, D=2. I have read some posts on using factorize and LabelEncoder.

I got a simple method to work (while trying to explain the problem) from the link, but I would like to know how to use other methods. I don't know how to tell those methods to use specific values; they seem to assign their own values to the data. The method below works if only a few columns need to be transformed.

import pandas as pd

df = pd.DataFrame({'Studentid':['12','40','36'],
               'history':['A','C','C'],
               'math':['B','C','D'],
               'biology':['A','C','B']})

print(df)

    Studentid history math biology
0        12       A    B       A
1        40       C    C       C
2        36       C    D       B


df['history1'] = df['history'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['math1'] = df['math'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['biology1'] = df['biology'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])

    Studentid history math biology  history1  math1  biology1
0        12       A    B       A        32     16        32
1        40       C    C       C         4      4         4
2        36       C    D       B         4      2        16
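As a hedged illustration of why factorize does not let you pick the numbers: pd.factorize numbers the distinct values 0, 1, 2, ... in order of first appearance, whereas Series.map with a dict uses exactly the values you supply. A minimal sketch on the same sample data (grade_map is simply the question's mapping written as a dict):

```python
import pandas as pd

df = pd.DataFrame({'Studentid': ['12', '40', '36'],
                   'history': ['A', 'C', 'C'],
                   'math': ['B', 'C', 'D'],
                   'biology': ['A', 'C', 'B']})

# pd.factorize assigns codes 0, 1, 2, ... in order of first appearance,
# so the resulting numbers cannot be chosen by the caller.
codes, uniques = pd.factorize(df['math'])
print(codes, list(uniques))

# Series.map with a dict produces exactly the values given in the dict.
grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}
df['math1'] = df['math'].map(grade_map)
print(df['math1'].tolist())
```

Note that map returns NaN for any value missing from the dict, while replace leaves such values unchanged.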

Thanks for accepting my answer. Please consider also upvoting my answer (see How to upvote on Stack Overflow?). – SeaBean, Jun 14, 2021 at 5:15
Have done. I just got enough 'reputation' to have my upvotes show up. Thanks, trying to implement your solution now. – newoptionz, Jun 14, 2021 at 12:28

SeaBean · Accepted Answer · 2021-06-13 14:25:29Z

If you need to transform a relatively large number of columns, you probably don't want to list every column name in the code one by one. You can do it this way:

Assuming the column Studentid is not going to be transformed:

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

df_transformed = df.drop('Studentid', axis=1).replace(grade_map).add_suffix('1')
df = df.join(df_transformed)

We exclude the Studentid column from the transformation by dropping it first with .drop() and then use .replace() to translate the grades. This way, Studentid is never translated, even if a student id happens to equal one of the grade letters. We add the suffix 1 to all transformed columns with .add_suffix().
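A small check of how .replace() treats the values, using a hypothetical id 'A12': with a dict and the default regex=False, .replace() matches whole cell values, so only a value exactly equal to a grade letter would be converted.

```python
import pandas as pd

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

# With a dict and the default regex=False, replace() compares whole cell
# values: 'A12' is left alone, while the exact values 'A' and 'B' are converted.
s = pd.Series(['A', 'A12', 'B'])
print(s.replace(grade_map).tolist())
```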

After the transformation, we join the original dataframe with these transformed columns using .join().

Result:

print(df)

  Studentid history math biology  history1  math1  biology1
0        12       A    B       A        32     16        32
1        40       C    C       C         4      4         4
2        36       C    D       B         4      2        16
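If the grade columns are known by name, a variant sketch (an alternative to the replace approach, not part of the original answer) selects them explicitly and uses Series.map, so unexpected grades show up as NaN instead of silently passing through. The grade_cols list and the ValueError check are assumptions added for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Studentid': ['12', '40', '36'],
                   'history': ['A', 'C', 'C'],
                   'math': ['B', 'C', 'D'],
                   'biology': ['A', 'C', 'B']})

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}
grade_cols = ['history', 'math', 'biology']  # assumed: the columns holding grades

# Map each grade column; values missing from grade_map become NaN.
mapped = df[grade_cols].apply(lambda s: s.map(grade_map))
if mapped.isna().any().any():
    raise ValueError('found grades that are not in grade_map')

# Cast to int, suffix the names, and attach to the original frame.
df = df.join(mapped.astype(int).add_suffix('1'))
print(df)
```

The explicit column list trades the convenience of drop() for protection against accidentally converting any other non-grade column that may be added later.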

3 Comments

newoptionz Over a year ago

Well, the Python weirdness continues... the code works in JupyterLab but not in PyCharm, with both using the same venv. The problem is that my code is mainly in PyCharm...

SeaBean Over a year ago

@newoptionz The code is plain pandas code and makes no assumptions about the IDE. What's the error message you get?

newoptionz Over a year ago

Thanks, after a restart tonight, it was working in PyCharm.