I have a dataframe with thousands of rows, and some of its columns hold ratings such as A, B, C, D. I am trying to do some machine learning and would like to give the ratings specific values, e.g. A=32, B=16, C=4, D=2. I have read some posts on using factorize and LabelEncoder.

I got a simple method from the link to work (while trying to explain the problem), but would like to know how to use the other methods. I don't know how to tell those methods to use specific values; they seem to just assign their own values to the data. The method below works if only a few columns need to be transformed.

import pandas as pd

df = pd.DataFrame({'Studentid':['12','40','36'],
               'history':['A','C','C'],
               'math':['B','C','D'],
               'biology':['A','C','B']})

print(df)

    Studentid history math biology
0        12       A    B       A
1        40       C    C       C
2        36       C    D       B


df['history1'] = df['history'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['math1'] = df['math'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])
df['biology1'] = df['biology'].replace(to_replace=['A', 'B', 'C','D'], value=[32, 16, 4,2])

    Studentid history math biology  history1  math1  biology1
0        12       A    B       A        32     16        32
1        40       C    C       C         4      4         4
2        36       C    D       B         4      2        16
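
To illustrate why factorize appears to "put its own values" on the data, here is a minimal sketch on a made-up Series (the `grades` data is hypothetical, not from the dataframe above). `pd.factorize` numbers labels in order of first appearance, whereas an explicit dictionary passed to `Series.map` lets you choose the values yourself:

```python
import pandas as pd

# Hypothetical sample data, for illustration only
grades = pd.Series(['C', 'A', 'B', 'A', 'D'])

# factorize assigns integer codes in order of first appearance,
# so the codes carry no information about the values you want
codes, uniques = pd.factorize(grades)
print(codes.tolist())    # [0, 1, 2, 1, 3]
print(list(uniques))     # ['C', 'A', 'B', 'D']

# An explicit mapping gives full control over the numeric values
grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}
mapped = grades.map(grade_map)
print(mapped.tolist())   # [4, 32, 16, 32, 2]
```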

1 Answer

If you need to transform a relatively large number of columns, you probably don't want to type out every column name one by one in the code. You can do it this way:

Assuming the column Studentid is not going to be transformed:

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

df_transformed = df.drop('Studentid', axis=1).replace(grade_map).add_suffix('1')
df = df.join(df_transformed)

We exclude the Studentid column from the transformation by dropping it first with .drop(), then use .replace() to translate the grades. This way, Studentid is never translated, even if a student id happens to match one of the grade letters. We add the suffix 1 to all transformed columns with .add_suffix().

After the transformation, we join the original dataframe with the transformed columns using .join(), which aligns them on the index.

Result:

print(df)

  Studentid history math biology  history1  math1  biology1
0        12       A    B       A        32     16        32
1        40       C    C       C         4      4         4
2        36       C    D       B         4      2        16
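
As a variation on the same idea, here is a hedged sketch that applies `Series.map` to each grade column instead of `.replace()`. The names `grade_cols`, `encoded` and `result` are illustrative choices, not part of the answer above. One difference worth knowing: with `.map()`, any value missing from `grade_map` becomes NaN, while `.replace()` leaves such values unchanged.

```python
import pandas as pd

df = pd.DataFrame({'Studentid': ['12', '40', '36'],
                   'history': ['A', 'C', 'C'],
                   'math': ['B', 'C', 'D'],
                   'biology': ['A', 'C', 'B']})

grade_map = {'A': 32, 'B': 16, 'C': 4, 'D': 2}

# Every column except Studentid is treated as a grade column
grade_cols = df.columns.drop('Studentid')

# Map each grade column through the dictionary, then suffix the new names
encoded = df[grade_cols].apply(lambda col: col.map(grade_map)).add_suffix('1')

# Join the encoded columns back onto the original frame (aligned on the index)
result = df.join(encoded)
print(result)
```

If unexpected grades may occur, checking `encoded.isna().any()` afterwards is one way to spot values that were not in the mapping.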

3 Comments

Well, Python weirdness continues... the code works in JupyterLab but not in PyCharm, with both using the same venv. The problem is that my code is mainly in PyCharm.
@newoptionz The code is just general pandas code that makes no assumptions about the IDE. What error message do you get?
Thanks, after a restart tonight, it was working in PyCharm.
