Pandas converts Excel strings like ‘2004E205’ to scientific notation — how to prevent this

Question

How can I handle string values that contain patterns like xxxE205 (e.g., 2004E205), which are used as unique codes in my company? I explicitly read the column as a string in pandas, but values containing xxxExxx are still being interpreted as scientific notation when reading the Excel file, and they end up converted to something like 2004132141205.

How can I ensure that pandas reads these values as strings, without automatically converting them, when loading from an Excel file (not CSV)?

I've tried the code below. The column I'm having the issue with is bg_code.

import pandas as pd

#Assigning file path

2025_data = '/content/drive/MyDrive/Data Cleaning/2025-10 Data Checking/2025_data.xlsx'

# Reading All sheets

df_25_dict = pd.read_excel(2025_data, sheet_name=None),
    dtype={
        'bg_code': str,
        'tranx_date_year': str,
        'journal_number': str,
        'journal_line_number': str,
        'prj_code': str
    },
    parse_dates=['tranx_date', 'entry_date'])



# Iterate through the dictionary and print shape and columns for each sheet

for sheet_name, df in df_25_dict.items():
    print(f"Sheet: {sheet_name}")
    print(df.shape)
    print(df.columns)

Force Excel to treat the column as text before saving: in Excel, format the bg_code column as Text before saving. This ensures pandas reads it exactly as a string.
– Hasan Raza, Commented Nov 20 at 15:14
You can also use the converters parameter of pd.read_excel, which tells pandas to convert the column to a string as it is read, overriding Excel's interpretation.
– Hasan Raza, Commented Nov 20 at 15:15
It sounds like the values are already stored as floating-point numbers inside the Excel sheet. If that's the case, you need to fix it there, because there is no good way to reverse-engineer your code from the number inside Python. I found that the best solution is to enter the offending value as ="2004E205", although you can also just start the cell entry with a single quote: '2004E205.
– Chris Maurer, Commented Nov 21 at 16:47
Pandas options are generally not a reliable solution. Consider the two codes 1E2 and 10E1. Both are equivalent to the same floating-point number, 100.0. Any reverse-engineering code you write will choose one or the other, with a 50% chance of being wrong.
– Chris Maurer, Commented Nov 21 at 16:55
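
The ambiguity described in the last comment can be checked with plain Python, no Excel or pandas needed. This is only an illustrative sketch: the list of codes is made up apart from 2004E205, and float() stands in for what a numeric cell effectively stores.

```python
# Codes that happen to look like scientific notation.
# "1E2" and "10E1" are the two examples from the comment above.
codes = ["1E2", "10E1", "2004E205"]

# Parse each code as a float, which is roughly what happens once a
# cell has been interpreted as a number.
as_floats = {code: float(code) for code in codes}

# Two different codes collapse to the same number, so the original
# spelling cannot be recovered from the float alone.
collision = as_floats["1E2"] == as_floats["10E1"]

# Turning the float back into text does not reproduce the code either.
round_trip = {code: repr(value) for code, value in as_floats.items()}

print(as_floats)
print(collision)
print(round_trip)
```

Once a code has been stored as a number, nothing on the Python side can tell which spelling it came from, which is why the fix has to keep the value as text from the start.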

ralf htp · Accepted Answer · 2025-11-20 19:02:13Z

1

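Normally, reading the cells as strings should do the trick.

The closing parenthesis ) right after sheet_name=None is definitely wrong in your code: it ends the read_excel() call early, so the dtype and parse_dates arguments end up outside the call. Also, 2025_data is not a valid Python variable name, because names cannot start with a digit. A corrected call, with the variable renamed to data_2025:

# data_2025 holds the path to the .xlsx file (renamed from 2025_data)
df_25_dict = pd.read_excel(
    data_2025,
    sheet_name=None,  # None reads all sheets into a dict of DataFrames
    dtype={
        'bg_code': str,
        'tranx_date_year': str,
        'journal_number': str,
        'journal_line_number': str,
        'prj_code': str
    },
    parse_dates=['tranx_date', 'entry_date'],
)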

Alternatively, the converters argument of pandas read_excel() gives finer control over data type conversion:

https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

See also: Python pandas: how to specify data types when reading an Excel file?
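
A minimal sketch of the converters approach, assuming the column names from the question. The helper names keep_code_as_text and read_all_sheets are hypothetical and not part of pandas. Raising an error for numeric cells is one possible design choice here, not the only one: it makes a sheet that already stores the code as a number fail loudly instead of silently producing a wrong string.

```python
def keep_code_as_text(value):
    """Hypothetical converter: pass text cells through, reject anything else."""
    if isinstance(value, str):
        return value.strip()
    # A non-text cell means Excel already stored the code as a number
    # (or date), so the original spelling is gone.
    raise TypeError(
        f"cell holds {value!r} ({type(value).__name__}), not text; "
        "the original code cannot be recovered from it"
    )


def read_all_sheets(path):
    """Hypothetical helper: read every sheet, routing bg_code through the converter."""
    # Imported here so the converter above can be tried without pandas installed.
    import pandas as pd

    return pd.read_excel(
        path,
        sheet_name=None,  # None returns a dict of {sheet name: DataFrame}
        converters={"bg_code": keep_code_as_text},
        parse_dates=["tranx_date", "entry_date"],
    )


# The converter on its own, applied to a value that arrives as text.
print(keep_code_as_text("2004E205"))
```

Calling read_all_sheets on the workbook path would then either return the sheets with bg_code as strings or stop at the first cell where the code was already stored as a number.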


Thank you. I'll have a look at it and try.
– Ethan MK, Commented Nov 22 at 6:43
