4

I'm trying to take the data in the "Mathscore" column and convert the values into numerical values, keeping them in the same "Mathscore" column.

Strong = 1, Weak = 0

I tried doing this with the function below, using a for loop, but I can't get the code to run. Is the way I'm trying to assign the data incorrect?

Thanks!

import pandas as pd

data = {'Id_Student' : [1,2,3,4,5,6,7,8,9,10],'Mathscore' :['Strong','Weak','Weak','Strong','Strong','Weak','Strong','Strong','Weak','Strong']}

df = pd.DataFrame(data)
df

# Strong = 1 and Weak = 0

##def tran_mathscore(x):
##    if x == 'Strong':
##        return 1
##    if x == 'Weak':
##        return 0
##
##df['Trans_MathScore'] = df['Mathscore'].apply(tran_mathscore)
##df


##df.Mathscore[0]=["Weak"]

##print(df.columns)
##
##
##print(df.Mathscore)

def tran_mathscore():
    for i in df.Mathscore:
        if i == "Strong":
        df.Mathscore[i]= ['1']

    elif i == "Weak":
        df.Mathscore[i]= ['0']


tran_mathscore()
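
For reference, a sketch of why the loop above fails and one way it could be made to run. The loop iterates over the values of the column, so `i` is 'Strong' or 'Weak' rather than a row position, and `df.Mathscore[i]` then tries to look up a row labelled 'Strong'. The bodies of `if` and `elif` also need to be indented under them. The version below is only an illustration, assuming the goal is integers 1 and 0; the `frame` parameter and the `converted` name are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Id_Student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Mathscore': ['Strong', 'Weak', 'Weak', 'Strong', 'Strong',
                  'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
})

def tran_mathscore(frame):
    """Return a copy of frame with Mathscore recoded: Strong -> 1, Weak -> 0."""
    out = frame.copy()
    codes = []
    # Iterate over the values and collect the new codes in row order.
    for value in out['Mathscore']:
        if value == 'Strong':
            codes.append(1)
        elif value == 'Weak':
            codes.append(0)
        else:
            # Fail loudly on anything unexpected instead of guessing.
            raise ValueError(f'unexpected Mathscore value: {value!r}')
    # Assign the whole column at once rather than cell by cell.
    out['Mathscore'] = codes
    return out

converted = tran_mathscore(df)
print(converted)
```

Building a list and assigning it once avoids writing into the column while looping over it.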

2 Answers

3

You can convert the column to a categorical dtype and rename its categories:

In [23]: df['Mathscore'] = df.Mathscore.astype('category').cat.rename_categories(['1','0'])

In [24]: df
Out[24]:
   Id_Student Mathscore
0           1         1
1           2         0
2           3         0
3           4         1
4           5         1
5           6         0
6           7         1
7           8         1
8           9         0
9          10         1

In [25]: df.dtypes
Out[25]:
Id_Student       int64
Mathscore     category
dtype: object
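
A small sketch of how the new names line up with the old ones, since the list form depends on category order. As far as I know, `astype('category')` keeps string categories in sorted order ('Strong' before 'Weak'), not in order of first appearance, and `rename_categories` matches a list by position. Passing a dict instead names each category explicitly. The `scores` series here is made-up data that deliberately starts with 'Weak'.

```python
import pandas as pd

# Made-up data that starts with 'Weak' to show that appearance order is ignored.
scores = pd.Series(['Weak', 'Strong', 'Weak', 'Strong'], name='Mathscore')
cat = scores.astype('category')

# Categories come out sorted, regardless of which value appears first.
category_order = list(cat.cat.categories)
print(category_order)  # ['Strong', 'Weak']

# The dict form does not depend on that order.
renamed = cat.cat.rename_categories({'Strong': 1, 'Weak': 0})

# Optional: turn the categorical into plain integers.
as_int = renamed.astype(int)
print(as_int.tolist())  # [0, 1, 0, 1]
```

With the list form `['1', '0']`, the result is the same only because 'Strong' happens to sort before 'Weak'.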

Or map it with a dictionary (d, shown at In [29]):

In [27]: df
Out[27]:
   Id_Student Mathscore
0           1    Strong
1           2      Weak
2           3      Weak
3           4    Strong
4           5    Strong
5           6      Weak
6           7    Strong
7           8    Strong
8           9      Weak
9          10    Strong

In [28]: df.Mathscore.map(d)
Out[28]:
0    1
1    0
2    0
3    1
4    1
5    0
6    1
7    1
8    0
9    1
Name: Mathscore, dtype: int64

In [29]: d
Out[29]: {'Strong': 1, 'Weak': 0}

In [30]: df['Mathscore'] = df.Mathscore.map(d)

In [31]: df
Out[31]:
   Id_Student  Mathscore
0           1          1
1           2          0
2           3          0
3           4          1
4           5          1
5           6          0
6           7          1
7           8          1
8           9          0
9          10          1

In [32]: df.dtypes
Out[32]:
Id_Student    int64
Mathscore     int64
dtype: object
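
The session above uses `d` before printing it, so here is the same mapping as a self-contained sketch. One assumption added on top of the answer: `Series.map` returns NaN for any value missing from the dictionary, so the sketch checks for that before converting to integers.

```python
import pandas as pd

df = pd.DataFrame({
    'Id_Student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Mathscore': ['Strong', 'Weak', 'Weak', 'Strong', 'Strong',
                  'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
})

# The lookup table from the answer.
d = {'Strong': 1, 'Weak': 0}

mapped = df['Mathscore'].map(d)

# Values that are not keys of d come back as NaN; report them if any.
unmapped = df.loc[mapped.isna(), 'Mathscore'].unique()
if len(unmapped) > 0:
    raise ValueError(f'values missing from the mapping: {list(unmapped)}')

df['Mathscore'] = mapped.astype('int64')
print(df.dtypes)
```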

PS: I prefer the first option, as the categorical dtype uses much less memory, especially on large columns with few distinct values.
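
To check the memory remark on your own data, a rough sketch like the one below can help. The 100,000-row series is made up for the comparison, and the exact byte counts depend on the pandas version, so none are quoted here. A small integer dtype such as `int8` is included as an extra point of comparison; that part is my addition, not something the answer proposes.

```python
import pandas as pd

n = 100_000  # arbitrary size chosen for the comparison
labels = pd.Series(['Strong', 'Weak'] * (n // 2))
codes = labels.map({'Strong': 1, 'Weak': 0})

# deep=True counts the string payloads, not just the pointers to them.
as_strings = labels.memory_usage(deep=True)
as_category = labels.astype('category').memory_usage(deep=True)
as_int64 = codes.astype('int64').memory_usage(deep=True)
as_int8 = codes.astype('int8').memory_usage(deep=True)

for name, size in [('strings', as_strings), ('category', as_category),
                   ('int64', as_int64), ('int8', as_int8)]:
    print(f'{name:>8}: {size} bytes')
```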


2 Comments

Thanks. I don't know anything about categories or maps, so I will have to do some reading and watch a video on them. For the first solution with categories, how does ['1', '0'] associate itself with Strong and Weak respectively? Is it taking the first two values under 'Mathscore' and then just repeating this input for the rest of the values under 'Mathscore'?
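@moondra, nearly: the new names are matched by position to the existing categories, which pandas keeps in sorted order ('Strong' before 'Weak'), not in order of first appearance. With this data both readings give the same result. Here you'll find a well-documented description with lots of examples...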
1

You could use:

df['Mathscore'] = df['Mathscore'].str.replace('Strong','1')
df['Mathscore'] = df['Mathscore'].str.replace('Weak','0')

Returns:

In [1]: df

Out[1]:

   Id_Student Mathscore
0           1         1
1           2         0
2           3         0
3           4         1
4           5         1
5           6         0
6           7         1
7           8         1
8           9         0
9          10         1
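
Note that `str.replace` leaves the column holding the strings '1' and '0'. If actual numbers are wanted, a cast can be chained on, as in this sketch. The `regex=False` argument and the `converted` name are my additions for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Id_Student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Mathscore': ['Strong', 'Weak', 'Weak', 'Strong', 'Strong',
                  'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
})

converted = (
    df['Mathscore']
    .str.replace('Strong', '1', regex=False)  # literal match, no regex
    .str.replace('Weak', '0', regex=False)
    .astype(int)                              # '1'/'0' strings -> integers
)

df['Mathscore'] = converted
print(df.dtypes)
```

The cast raises an error if any value other than 'Strong' or 'Weak' is left in the column, which is a cheap way to catch unexpected labels.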
