I'm using the Google Sheets API to fetch data from a sheet where numbers use a European locale. The input Google Sheet looks like this:
product_price  impressions  clicks  ctr   avg_click_price  total_spent  orders
2296,00        2184         117     5,36  12,63            1478,20      3
However, when I fetch the data using worksheet.get_all_records() and process it in Python, the numbers are misinterpreted. For example, I get:
product_price  impressions  clicks  ctr   avg_click_price  total_spent  orders
229600         2184         117     536   1263             147820       3
Here’s the part of my code for processing the data:
sheet_data = worksheet.get_all_records()
# Processing integer columns
for col in integer_columns:
    if col in df.columns:
        logger.info(f"Processing integer column {col}...")
        df[col] = (
            df[col]
            .astype(str)  # Convert to string
            .str.replace("\u00A0", "")  # Remove non-breaking spaces
            .str.replace(" ", "")  # Remove regular spaces
            .str.extract(r"(\d+,.-)")  # Keep only digits
            .apply(locale.atof)  # Convert to float based on locale
        )
I suspect the issue is related to how numbers with a decimal comma (e.g., 2296,00) are parsed. The Google Sheet uses the pl_PL locale. The locale seems to be ignored, and every value with a decimal part comes out multiplied by 100.
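To make the symptom concrete, here is a small standard-library sketch using the sample row above. It assumes (this is not confirmed) that the decimal comma gets dropped somewhere upstream as if it were a thousands separator; under that assumption the ×100 values are reproduced exactly.

```python
# Cell texts as displayed in the pl_PL sheet (taken from the sample row).
displayed = ["2296,00", "5,36", "12,63", "1478,20"]

# Assumption: the comma is removed as if it were a thousands separator,
# and what remains is converted to an integer.
stripped = [int(text.replace(",", "")) for text in displayed]
print(stripped)  # [229600, 536, 1263, 147820]

# Treating the comma as the decimal separator gives the intended values.
parsed = [float(text.replace(",", ".")) for text in displayed]
print(parsed)  # [2296.0, 5.36, 12.63, 1478.2]
```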
How can I correctly parse and handle float and integer numbers in this format using Python, so the output matches the original values without creating two loops for integer and float numbers?
The .astype(str) bit will probably get values as displayed in cells, which would be text strings with commas instead of numeric data. You should get raw values instead.
Is r"(\d+,.-)" supposed to keep only digits? What does the - do?
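On the regex question, a quick check with the standard re module shows what the pattern actually requires. The parse_pl_number helper below is hypothetical (it is not part of the original code): a minimal sketch of handling integer and decimal-comma strings in one pass, assuming the cells arrive as displayed text without thousands separators other than spaces.

```python
import re

PATTERN = re.compile(r"(\d+,.-)")  # the pattern from the question

# The pattern needs: digits, a comma, any single character, then a literal "-".
print(PATTERN.search("2296,00"))           # None, there is no trailing hyphen
print(PATTERN.search("2296,0-").group(1))  # 2296,0-


def parse_pl_number(text):
    """Hypothetical helper: convert a pl_PL-style number string to int or float.

    Returns None when the text does not look like a number.
    """
    # Drop non-breaking and regular spaces (possible thousands separators).
    cleaned = str(text).replace("\u00A0", "").replace(" ", "")
    if re.fullmatch(r"-?\d+", cleaned):
        return int(cleaned)
    if re.fullmatch(r"-?\d+,\d+", cleaned):
        # Swap the decimal comma for a dot before converting.
        return float(cleaned.replace(",", "."))
    return None


row = ["2296,00", "2184", "117", "5,36", "12,63", "1478,20", "3"]
values = [parse_pl_number(cell) for cell in row]
print(values)  # [2296.0, 2184, 117, 5.36, 12.63, 1478.2, 3]
```

The same function could be applied to every column (for example with pandas' Series.map), which avoids separate loops for integer and float columns. If raw values are fetched from the sheet instead of displayed text, this string parsing may not be needed at all.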