
I have a car dataset in which I want to replace the '?' values in the normalized-losses column with the mean of the remaining numerical values. The code I used is:

mean = df["normalized-losses"].mean()
df["normalized-losses"].replace("?",mean)

However, this produces the error:

ValueError: could not convert string to float: '???164164?158?158?192192188188??121988111811811814814814814811014513713710110110111078106106858585107????145??104104104113113150150150150129115129115?115118?93939393?142???161161161161153153???125125125137128128128122103128128122103168106106128108108194194231161161??161161??16116116111911915415415474?186??????1501041501041501048383831021021021021028989858587877477819191919191919191168168168168134134134134134134656565656519719790?1221229494949494?256???1037410374103749595959595'

Can anyone show me how to replace the '?' values with the mean? Also, this is my first time working with the pandas package, so please forgive any silly mistakes.

3 Answers

Use to_numeric to convert non-numeric values to NaN, then fillna with the mean:

vals = pd.to_numeric(df["normalized-losses"], errors='coerce')
df["normalized-losses"] = vals.fillna(vals.mean())
# data from jpp
print (df)
   normalized-losses
0                1.0
1                2.0
2                3.0
3                3.4
4                5.0
5                6.0
6                3.4

Details:

print (vals)
0    1.0
1    2.0
2    3.0
3    NaN
4    5.0
5    6.0
6    NaN
Name: normalized-losses, dtype: float64

print (vals.mean())
3.4
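
For a version that can be run on its own, here is a minimal sketch of the same two steps. The five-row frame is a hypothetical stand-in for the real car dataset, with values stored as text the way the error message in the question suggests.

```python
import pandas as pd

# Hypothetical sample standing in for the real car dataset.
df = pd.DataFrame({"normalized-losses": ["?", "164", "164", "?", "158"]})

# Strings that cannot be parsed as numbers (here '?') become NaN.
vals = pd.to_numeric(df["normalized-losses"], errors="coerce")

# mean() skips NaN by default, so only the real numbers are averaged.
df["normalized-losses"] = vals.fillna(vals.mean())

print(df["normalized-losses"].tolist())
# [162.0, 164.0, 164.0, 162.0, 158.0]
```

The column ends up as float64, so later numeric operations work without further conversion.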

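Use replace() to turn '?' into NaN, convert the column to float, then call fillna() and assign the result back:

import numpy as np
# replace() alone leaves strings such as '164', so cast to float before taking the mean
df['normalized-losses'] = df['normalized-losses'].replace('?', np.nan).astype(float)
df['normalized-losses'] = df['normalized-losses'].fillna(df['normalized-losses'].mean())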
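
The reason a cast is needed can be checked on a small example. This is only a sketch: the three-element series is made up, and dtype=object is an assumption about how the real column is stored (numbers held as text alongside '?').

```python
import numpy as np
import pandas as pd

# Hypothetical sample; dtype=object mimics a column holding numbers as text.
s = pd.Series(["?", "164", "158"], dtype=object)

replaced = s.replace("?", np.nan)
# Reports whether the column is numeric after replace() alone;
# the remaining entries are still the strings '164' and '158'.
print(pd.api.types.is_numeric_dtype(replaced))

# Casting to float makes mean() well defined; NaN is skipped by default.
as_float = replaced.astype(float)
filled = as_float.fillna(as_float.mean())
print(filled.tolist())
# [161.0, 164.0, 158.0]
```

Note that astype(float) raises if any other non-numeric string remains, whereas pd.to_numeric with errors='coerce' would turn it into NaN.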

The mean of a series of mixed types is not defined. Convert to numeric and then use replace:

df = pd.DataFrame({'A': [1, 2, 3, '?', 5, 6, '??']})

mean = pd.to_numeric(df['A'], errors='coerce').mean()
df['B'] = df['A'].replace('?', mean)

print(df)

    A    B
0   1    1
1   2    2
2   3    3
3   ?  3.4
4   5    5
5   6    6
6  ??   ??

If you need to replace all non-numeric values, then use fillna:

nums = pd.to_numeric(df['A'], errors='coerce')
df['B'] = nums.fillna(nums.mean())

print(df)

    A    B
0   1  1.0
1   2  2.0
2   3  3.0
3   ?  3.4
4   5  5.0
5   6  6.0
6  ??  3.4
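
If the data is loaded from a CSV file (an assumption, since the question does not say how df was created), another option is to mark '?' as missing at load time with the na_values parameter of read_csv. A minimal sketch, using made-up inline CSV text in place of a real file:

```python
import io

import pandas as pd

# Hypothetical CSV content; a real file path would go to read_csv instead.
csv_text = "make,normalized-losses\naudi,164\nbmw,?\nvolvo,95\n"

# na_values='?' makes read_csv parse '?' as NaN, so the column is numeric from the start.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")

col = "normalized-losses"
df[col] = df[col].fillna(df[col].mean())

print(df[col].tolist())
# [164.0, 129.5, 95.0]
```

This only catches the exact marker passed in na_values; other stray strings such as '??' would still need to_numeric with errors='coerce'.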
