8

I want the following records (currently displaying as 3.200000e+18 but actually (hopefully) each a different long integer), created using pd.read_excel(), to be interpreted differently:

ipdb> self.after['class_parent_ref']
class_id
3200000000000515954    3.200000e+18
3200000000000515951             NaN
3200000000000515952             NaN
3200000000000515953             NaN
3200000000000515955    3.200000e+18
3200000000000515956    3.200000e+18
Name: class_parent_ref, dtype: float64

Currently, they seem to 'come out' as scientifically notated strings:

ipdb> self.after['class_parent_ref'].iloc[0]
3.2000000000005161e+18

Worse, though, it's not clear to me that the number has been read correctly from my .xlsx file:

ipdb> self.after['class_parent_ref'].iloc[0] -3.2e+18
516096.0

The number in Excel (the data source) is 3200000000000515952.

This is not about the display, which I know I can change here. It's about keeping the underlying data in the same form it was in when read (so that if/when I write it back to Excel, it'll look the same and so that if I use the data, it'll look like it did in Excel and not Xe+Y). I would definitely accept a string if I could count on it being a string representation of the correct number.

You may notice that the number I want to see is in fact (incidentally) one of the labels. Pandas correctly read those in as strings (perhaps because Excel treated them as strings?) unlike this number which I entered. (Actually though, even when I enter ="3200000000000515952" into the cell in question before redoing the read, I get the same result described above.)

How can I get 3200000000000515952 out of the dataframe? I'm wondering if pandas has a limitation with long integers, but the only thing I've found on it is 1) a little dated, and 2) doesn't look like the same thing I'm facing.

Thank you!

3
  • 5
    The problem is that you have floats, not integers, and the number is too big to be represented with that much precision as a float. You end up with floats because of the NaN values (NaN is not supported in integer columns, so the column is cast to float). Commented Oct 27, 2014 at 20:20
  • 1
    Thanks, @joris. Using the keep_default_na=False kwarg of read_excel() seems to have solved the problem. Feel free to answer accordingly and I'll 'check' it. Commented Oct 27, 2014 at 23:03
  • @HaPsantran you might just want to provide your own answer as joris seems not to have noticed your suggestion. Commented May 2, 2019 at 14:06
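
The precision point in the first comment can be checked without pandas. This is a minimal sketch using only the standard library (it assumes Python 3.9+ for `math.ulp`; the variable names are illustrative). It shows that near 3.2e18 adjacent float64 values are 512 apart, so all six IDs from the question collapse to the same float, and that float is the 3.2e18 + 516096 seen in the ipdb session.

```python
import math

excel_value = 3200000000000515952  # the integer shown in Excel, per the question

as_float = float(excel_value)      # what a float64 column can actually hold
gap = math.ulp(as_float)           # spacing between adjacent floats at this magnitude
round_trip = int(as_float)         # nearest representable value, back as an int

# The six class_id labels from the question, converted to float.
ids = range(3200000000000515951, 3200000000000515957)
distinct_floats = {float(i) for i in ids}

print(gap)                                # 512.0
print(round_trip)                         # 3200000000000516096
print(round_trip - 3200000000000000000)   # 516096
print(len(distinct_floats))               # 1
```

Once the value has been stored as a float, the original digits are gone, so the fix has to happen at read time rather than afterwards.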

1 Answer

2

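Replace the NaN values in the column with 0, then cast the column to integer. Note that this only changes the dtype: digits already lost when the values were read as float64 are not recovered.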

# Replace missing values with 0 so the column can hold integers
df['class_parent_ref'] = df['class_parent_ref'].fillna(0)
# Cast explicitly to 64-bit integers
df['class_parent_ref'] = df['class_parent_ref'].astype('int64')

Alternatively, when reading the file, pass keep_default_na=False to pd.read_excel(), or na_filter=False to pd.read_csv(), so that blank cells are not turned into NaN, which is what forces the column to float.
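
To illustrate the read-time option, here is a rough sketch. It uses `pd.read_csv` on an in-memory string purely as a stand-in, because that needs no .xlsx file; the sample data and variable names are made up for the example. `pd.read_excel` also accepts `dtype` and `keep_default_na`, but whether the digits survive there depends on how the cell is stored in the workbook (text versus number), so treat this as a demonstration of the idea rather than a guaranteed fix for every spreadsheet.

```python
import io
import pandas as pd

# Hypothetical stand-in for the spreadsheet: one blank cell in class_parent_ref.
csv_text = (
    "class_id,class_parent_ref\n"
    "3200000000000515954,3200000000000515952\n"
    "3200000000000515951,\n"
    "3200000000000515955,3200000000000515952\n"
)

# Default parsing: the blank becomes NaN, so the column is float64.
as_float = pd.read_csv(io.StringIO(csv_text))

# Text parsing: no NaN conversion, every digit is kept as typed.
as_text = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"class_parent_ref": str},
    keep_default_na=False,
)

# Optional: turn the non-empty strings into exact Python ints.
exact = [int(s) if s else None for s in as_text["class_parent_ref"]]

print(as_float["class_parent_ref"].dtype)
print(as_text["class_parent_ref"].tolist())
print(exact)
```

Keeping the column as strings is the safest choice if the values are identifiers that only need to be written back unchanged; convert to Python ints only where arithmetic or comparison is needed.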
