
How can I handle string values that contain patterns like xxxE205 (e.g., 2004E205), which are used as unique codes in my company? I explicitly read the column as a string in pandas, but values containing xxxExxx are still being interpreted as scientific notation when reading the Excel file, and they end up converted to something like 2004132141205.

How can I ensure that pandas reads these values as strings, without automatically converting them, when loading from an Excel file (not CSV)?

This is what I've tried. The column with the issue is bg_code.

import pandas as pd

#Assigning file path

2025_data = '/content/drive/MyDrive/Data Cleaning/2025-10 Data Checking/2025_data.xlsx'

# Reading All sheets

df_25_dict = pd.read_excel(2025_data, sheet_name=None),
    dtype={
        'bg_code': str,
        'tranx_date_year': str,
        'journal_number': str,
        'journal_line_number': str,
        'prj_code': str
    },
    parse_dates=['tranx_date', 'entry_date'])



# Iterate through the dictionary and print shape and columns for each sheet

for sheet_name, df in df_25_dict.items():
    print(f"Sheet: {sheet_name}")
    print(df.shape)
    print(df.columns)
  • Force Excel to treat the column as text before saving: in Excel, format the bg_code column as Text before saving. This ensures pandas reads it exactly as a string. Commented Nov 20 at 15:14
  • You can also use the converters parameter of pd.read_excel, which tells pandas to convert the column to a string as it is read, overriding the default interpretation. Commented Nov 20 at 15:15
  • use converters as @HasanRaza pointed out Commented Nov 20 at 15:28
  • It sounds like the values are already stored as floating point numbers inside the Excel sheet. If that's the case, you need to fix it there, because there is no good way to reverse-engineer your code from the number inside Python. I found that the best solution is to enter the offending code as ="2004E205", although you can also just start the cell entry with a single quote: '2004E205. Commented Nov 21 at 16:47
  • Options in pandas are generally not a reliable solution. Consider the two codes 1E2 and 10E1. Both are equivalent to the same floating point number, 100.0. Any reverse-engineering code you write will choose one or the other, with a 50% chance of being wrong. Commented Nov 21 at 16:55
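The ambiguity described in the last comment can be checked in plain Python. This is only a small illustration, assuming the codes have already been turned into floats somewhere upstream (for example by Excel itself); the variable names are made up for the example.

```python
# Two different codes that look like scientific notation.
codes = ["1E2", "10E1"]

# Once parsed as numbers, both collapse to the same float.
as_floats = [float(code) for code in codes]
print(as_floats)                      # [100.0, 100.0]
print(as_floats[0] == as_floats[1])   # True

# Converting a float back to text does not restore the original code.
round_tripped = str(float("2004E205"))
print(round_tripped)                  # 2.004e+208
```

So once a cell has been stored as a number, the original text cannot be recovered reliably; the fix has to happen before that conversion.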

1 Answer

Normally, reading all cells as strings should do the trick.

The closing parenthesis ) right after sheet_name=None in your code is definitely wrong, because it ends the read_excel call before dtype and parse_dates are passed.


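# Corrected call: every keyword argument sits inside the read_excel(...) parentheses.
# A Python name cannot start with a digit, so 2025_data is renamed to data_2025 here.
df_25_dict = pd.read_excel(
    data_2025,
    sheet_name=None,
    dtype={
        'bg_code': str,
        'tranx_date_year': str,
        'journal_number': str,
        'journal_line_number': str,
        'prj_code': str
    },
    parse_dates=['tranx_date', 'entry_date'])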

Alternatively, the converters argument of pandas' read_excel() gives finer control over data type conversion:

https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

Python pandas: how to specify data types when reading an Excel file?
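A minimal sketch of the converters approach, in case it helps. The helper names as_code and read_all_sheets are hypothetical and only for illustration; converters itself is a documented read_excel parameter that maps a column name to a function applied to each cell value. This assumes bg_code is stored as text in the workbook. If Excel already stored the cell as a number, the converter only receives that number.

```python
import pandas as pd


def as_code(value):
    """Hypothetical converter: return the cell value as stripped text."""
    return str(value).strip()


def read_all_sheets(path):
    """Read every sheet of the workbook, routing bg_code through as_code."""
    return pd.read_excel(
        path,
        sheet_name=None,                  # returns {sheet name: DataFrame}
        converters={"bg_code": as_code},  # applied per cell, instead of dtype inference
    )
```

Calling read_all_sheets with the workbook path would return the same kind of dictionary as in the question, so the loop over df_25_dict.items() can stay unchanged.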


1 Comment

Thank you. I'll have a look at it & try
