How can I handle string values that contain patterns like xxxE205 (e.g., 2004E205), which are used as unique codes in my company? I explicitly read the column as a string in pandas, but values containing xxxExxx are still being interpreted as scientific notation when reading the Excel file, and they end up converted to something like 2004132141205.
How can I ensure that pandas reads these values as strings, without automatically converting them, when loading from an Excel file (not CSV)?
I've tried the code below. The column I'm facing the issue with is bg_code.
import pandas as pd

# Assigning file path (renamed: a Python identifier cannot start with a digit)
data_2025 = '/content/drive/MyDrive/Data Cleaning/2025-10 Data Checking/2025_data.xlsx'

# Reading all sheets; sheet_name=None returns a dict of {sheet name: DataFrame}
df_25_dict = pd.read_excel(
    data_2025,
    sheet_name=None,
    dtype={
        'bg_code': str,
        'tranx_date_year': str,
        'journal_number': str,
        'journal_line_number': str,
        'prj_code': str,
    },
    parse_dates=['tranx_date', 'entry_date'],
)

# Iterate through the dictionary and print shape and columns for each sheet
for sheet_name, df in df_25_dict.items():
    print(f"Sheet: {sheet_name}")
    print(df.shape)
    print(df.columns)
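The value has to be stored as text in the Excel file itself, for example by entering it as the formula ="2004E205", although you can also just start the cell entry with a single quote: '2004E205.

Once Excel has stored the value as a number, the original text cannot be recovered reliably. Consider 1E2 and 10E1. They are both equivalent to the same floating point number, 100.0. Any reverse-engineering code you write will choose one or the other, with a 50% chance of being wrong.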
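As a rough illustration of why the original text is lost, the sketch below uses only plain Python floats. It does not read any Excel file, and the variable names are made up for this example. It assumes the cell was already stored as a number, which is the situation described above.

```python
# Hypothetical spellings a code could have had before it was stored as a number.
spellings = ["1E2", "10E1", "100"]

# Every spelling parses to the same float, so the number alone
# cannot tell you which text was originally typed.
values = [float(s) for s in spellings]
print(values)  # [100.0, 100.0, 100.0]

# The same thing happens to a code like 2004E205 once it becomes a number:
as_number = float("2004E205")
round_tripped = str(as_number)
print(round_tripped)  # 2.004e+208, not the original "2004E205"
```

Converting the float back to a string gives one canonical spelling, which only matches the original code by coincidence. That is why passing dtype=str to pandas cannot help when the cell itself holds a number.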