Error while replacing '?' with mean value in dataframe in Python

Question

I have a car dataset where I want to replace the '?' values in the column normalized-losses with the mean of the remaining numerical values. The code I have used is:

mean = df["normalized-losses"].mean()
df["normalized-losses"].replace("?",mean)

However, this produces the error:

ValueError: could not convert string to float: '???164164?158?158?192192188188??121988111811811814814814814811014513713710110110111078106106858585107????145??104104104113113150150150150129115129115?115118?93939393?142???161161161161153153???125125125137128128128122103128128122103168106106128108108194194231161161??161161??16116116111911915415415474?186??????1501041501041501048383831021021021021028989858587877477819191919191919191168168168168134134134134134134656565656519719790?1221229494949494?256???1037410374103749595959595'

Can anyone help me convert the '?' values to the mean value? Also, this is the first time I am working with the pandas package, so please forgive any silly mistakes.

jezrael · Accepted Answer · 2018-11-29 09:52:58Z

1

Use to_numeric to convert non-numeric values to NaN, then fillna with the mean:

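import pandas as pd

# Non-numeric entries such as '?' become NaN
vals = pd.to_numeric(df["normalized-losses"], errors='coerce')
# Fill the NaNs with the mean of the valid values
df["normalized-losses"] = vals.fillna(vals.mean())
# data from jpp
print (df)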
   normalized-losses
0                1.0
1                2.0
2                3.0
3                3.4
4                5.0
5                6.0
6                3.4

Details:

print (vals)
0    1.0
1    2.0
2    3.0
3    NaN
4    5.0
5    6.0
6    NaN
Name: normalized-losses, dtype: float64

print (vals.mean())
3.4
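
The same steps can be wrapped in a small helper. This is a minimal sketch rather than part of the answer above: the function name fill_with_mean and the sample values are assumptions made for illustration, and the sample stores its numbers as strings the way the question's column appears to.

```python
import pandas as pd


def fill_with_mean(series: pd.Series) -> pd.Series:
    """Return a numeric copy of `series` with unparseable entries set to the mean.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # Anything that cannot be parsed as a number (e.g. '?') becomes NaN.
    numeric = pd.to_numeric(series, errors="coerce")
    # mean() skips NaN by default, so only the valid values are averaged.
    return numeric.fillna(numeric.mean())


# Made-up sample: numbers stored as strings, with '?' as a placeholder.
df = pd.DataFrame({"normalized-losses": ["164", "?", "158", "?", "192"]})
df["normalized-losses"] = fill_with_mean(df["normalized-losses"])
print(df)
```

The helper returns a new Series instead of modifying its input, so the result has to be assigned back to the column.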

sync11 · Accepted Answer · 2018-11-29 09:47:05Z

1

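Use replace() to turn '?' into NaN, convert the column to numeric, then fillna() with the mean:

df['normalized-losses'] = pd.to_numeric(df['normalized-losses'].replace('?', np.nan))
df['normalized-losses'] = df['normalized-losses'].fillna(df['normalized-losses'].mean())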
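
Two details matter when using this approach: if the column was read as strings, it is still not numeric after replace(), so mean() needs a conversion first; and fillna() returns a new Series, so its result has to be assigned. The following is a sketch on made-up data (the sample values are an assumption, not the asker's dataset):

```python
import numpy as np
import pandas as pd

# Made-up sample: numbers stored as strings, with '?' marking missing entries.
df = pd.DataFrame({"normalized-losses": ["164", "?", "158", "?", "192"]})

# Swap the placeholder for NaN. The remaining values are still strings here.
col = df["normalized-losses"].replace("?", np.nan)
print(col.dtype)

# Convert the strings to floats so that mean() is well defined.
col = pd.to_numeric(col)

# fillna() returns a new Series, so assign the result back to the DataFrame.
df["normalized-losses"] = col.fillna(col.mean())
print(df["normalized-losses"].dtype)
```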

jpp · Accepted Answer · 2018-11-29 09:53:27Z

1

The mean of a series of mixed types is not defined. Convert to numeric and then use replace:

df = pd.DataFrame({'A': [1, 2, 3, '?', 5, 6, '??']})

mean = pd.to_numeric(df['A'], errors='coerce').mean()
df['B'] = df['A'].replace('?', mean)

print(df)

    A    B
0   1    1
1   2    2
2   3    3
3   ?  3.4
4   5    5
5   6    6
6  ??   ??

If you need to replace all non-numeric values, then use fillna:

nums = pd.to_numeric(df['A'], errors='coerce')
df['B'] = nums.fillna(nums.mean())

print(df)

    A    B
0   1  1.0
1   2  2.0
2   3  3.0
3   ?  3.4
4   5  5.0
5   6  6.0
6  ??  3.4
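
To decide between the two variants, it can help to list which entries are non-numeric before replacing anything. A minimal sketch using the same sample frame; the variable name non_numeric is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, '?', 5, 6, '??']})

nums = pd.to_numeric(df['A'], errors='coerce')

# Rows where coercion produced NaN, i.e. the entries that are not numbers.
# Note: values that were already NaN in the original would show up here too.
non_numeric = df.loc[nums.isna(), 'A']
print(non_numeric.unique())
```

If the only marker listed is '?', replace('?', mean) is enough; if several markers appear, as here, the fillna variant covers all of them.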
