Pandas convert float in scientific notation to string

Question

I used read_csv() to load a dataset that looks like this:

userid
NaN
1.091178e+11
1.137856e+11

I want to convert the user ids to strings. One solution is to pass keep_default_na=False to read_csv(), as suggested in this SO question: Converting long integers to strings in pandas (to avoid scientific notation)

Let's say I don't want to use keep_default_na=False. Is there any other way to convert the userid column to str?

I tried df.userid.astype(str) and got 1.091178e+11 back. I was expecting the expanded form, not scientific notation.

What should I do?

Is it possible to use the parameter dtype={'userid': str}, and does that work for you?
– jezrael, Commented Dec 15, 2016 at 6:51
You could apply a string format: df.userid.apply(lambda x: '{:.0f}'.format(x)).
– Shashank Agarwal, Commented Dec 15, 2016 at 6:56
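
To make the dtype suggestion from the first comment concrete, here is a minimal sketch. The CSV text and the label column are made up for illustration (the question does not show the raw file), so treat this as an assumption about what the data might look like rather than the asker's actual setup.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the second column only exists so
# that the empty userid field in the first row is parsed as a missing value.
csv_text = "userid,label\n,a\n109117800000,b\n113785600000,c\n"

# Default parsing: the missing value forces the column to float64.
as_float = pd.read_csv(io.StringIO(csv_text))

# dtype={'userid': str} keeps the digits exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"userid": str})

print(as_float["userid"].dtype)
print(as_text["userid"].tolist())
```

Reading the ids as text from the start avoids the float round trip entirely, while the missing entry is still parsed as a missing value.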

MarredCheese · Accepted Answer · 2019-06-06 03:18:03Z

7

You can use map or apply, as mentioned in this comment:

print(df.userid.map(lambda x: '{:.0f}'.format(x)))
0             nan
1    109117800000
2    113785600000
Name: userid, dtype: object

df.userid = df.userid.map(lambda x: '{:.0f}'.format(x))
print(df)
         userid
0           nan
1  109117800000
2  113785600000

I wondered whether map would be faster than apply, but the timings are the same:

#[300000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
#print (df)

In [40]: %timeit (df.userid.map(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 211 ms per loop

In [41]: %timeit (df.userid.apply(lambda x: '{:.0f}'.format(x)))
1 loop, best of 3: 210 ms per loop

Another option is to_string, but it is much slower:

print(df.userid.to_string(float_format='{:.0f}'.format))
0            nan
1   109117800000
2   113785600000

In [41]: %timeit (df.userid.to_string(float_format='{:.0f}'.format))
1 loop, best of 3: 2.52 s per loop
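
One thing to note about to_string besides speed: it returns a single str for the whole Series, not a Series of strings. The sketch below uses a small hypothetical Series mirroring the question's column to show what comes back and what it takes to get per-row values out of it again.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's column.
s = pd.Series([np.nan, 1.091178e+11, 1.137856e+11], name="userid")

# to_string renders the whole Series (index included) into one str.
text = s.to_string(float_format="{:.0f}".format)
print(type(text))

# Recovering per-row values means splitting the rendered text again.
rows = [line.split()[-1] for line in text.splitlines()]
print(rows[1:])
```

For converting a column in place, map or apply is therefore usually the more direct fit; to_string is mainly useful for display.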

edited Jun 6, 2019 at 3:18

MarredCheese

answered Dec 15, 2016 at 7:04

jezrael

1 Comment

liang Over a year ago

Though you might want to turn 'nan' back into pd.NA with replace after the map.
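
A minimal sketch of that idea, using a hypothetical three-row frame that mirrors the question. It shows the replace step from the comment and, as an alternative not mentioned in the thread, the na_action='ignore' argument of Series.map, which skips missing values so they are never formatted in the first place.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's column.
df = pd.DataFrame({"userid": [np.nan, 1.091178e+11, 1.137856e+11]})

# Option 1: format everything, then turn the literal 'nan' strings
# back into missing values.
replaced = df["userid"].map("{:.0f}".format).replace("nan", pd.NA)

# Option 2: skip missing values during the map so they stay missing.
skipped = df["userid"].map("{:.0f}".format, na_action="ignore")

print(replaced.tolist())
print(skipped.tolist())
```

Either way, isna() keeps working on the result, which it would not if the column contained the text 'nan'.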

Douglas Navarro · Accepted Answer · 2018-12-15 20:59:35Z

4

I just stumbled upon this problem after reading a dataframe from a JSON file using the read_json method, which unfortunately does not have a keep_default_na parameter.

The solution was to convert the long floats to np.int64 before converting them to str.

In [53]: tweet_id_sample = tweets.iloc[0]['id']
         tweet_id_sample
Out[53]: 8.924206435553362e+17

In [54]: tweet_id_sample.astype(str)
Out[54]: '8.924206435553362e+17'

In [55]: tweet_id_sample.astype(np.int64).astype(str)
Out[55]: '892420643555336192'

In [56]: # This overflows when int maps to a 32-bit integer
         tweet_id_sample.astype(int)
Out[56]: -2147483648
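
A caveat worth keeping in mind: float64 has a 53-bit significand, so integers above 2**53 cannot all be represented exactly, and an id that has passed through float may already differ from the original in its last digits. The sketch below uses a hypothetical Series (not the tweets data above) and pandas' nullable Int64 dtype, which is a swapped-in variant of the np.int64 cast that also tolerates missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical id column that became float64 because of a missing value.
ids = pd.Series([np.nan, 8.924206435553362e+17], name="id")

# Nullable Int64 stores integers next to <NA>, so NaN does not block the cast.
as_int = ids.astype("Int64")

# The "string" dtype keeps the missing entry missing rather than the text 'nan'.
as_str = as_int.astype("string")
print(as_str.tolist())

# float64 cannot tell these two neighbouring integers apart,
# so digits lost before the conversion cannot be recovered afterwards.
print(float(2**53 + 1) == float(2**53))  # True
```

If the exact ids matter, reading them as strings at load time is the safer route; converting afterwards only cleans up the formatting of whatever value the float still holds.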

answered Dec 15, 2018 at 20:59

Douglas Navarro
