I'm using the Google Sheets API to fetch data from a sheet where numbers use a European locale. The input Google Sheet looks like this:

product_price   impressions   clicks    ctr    avg_click_price   total_spent   orders    
2296,00         2184          117      5,36        12,63             1478,20       3         

However, when I fetch the data using worksheet.get_all_records() and process it in Python, the numbers are misinterpreted. For example, I get:

product_price  impressions   clicks  ctr    avg_click_price  total_spent  orders   
229600         2184          117     536    1263             147820       3      

Here’s the part of my code for processing the data:

sheet_data = worksheet.get_all_records()

# Processing integer columns
for col in integer_columns:
    if col in df.columns:
        logger.info(f"Processing integer column {col}...")
        df[col] = (
            df[col]
            .astype(str)  # Convert to string
            .str.replace("\u00A0", "")  # Remove non-breaking spaces
            .str.replace(" ", "")  # Remove regular spaces
            .str.extract(r"(\d+,.-)")  # Keep only digits
            .apply(locale.atof)  # Convert to float based on locale
        )

I suspect the issue is related to how numbers with a decimal comma (e.g., 2296,00) are parsed. The Google Sheet uses the pl_PL locale. The locale seems to be ignored, and decimal values come out multiplied by 100.

How can I correctly parse and handle float and integer numbers in this format using Python, so the output matches the original values without creating two loops for integer and float numbers?

  • Which 'European locale'? Also, show all code related to this, not just the bit you think is relevant. Commented Nov 30, 2024 at 17:10
  • The .astype(str) bit will probably get values as displayed in cells, which would be text strings with commas instead of numeric data. You should get raw values instead. Commented Nov 30, 2024 at 17:17
  • @Anton How is this r"(\d+,.-)" supposed to keep only digits? What does the - do? Commented Nov 30, 2024 at 23:08

2 Answers

The default ValueRenderOption for Values.get is FORMATTED_VALUE, so the values come back as the strings displayed in the cells. Pass UNFORMATTED_VALUE in gspread to get the raw values instead:

worksheet.get_all_records(value_render_option="UNFORMATTED_VALUE")
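To see the difference without credentials, here is a minimal sketch. `fetch_raw_records` and `FakeWorksheet` are hypothetical names introduced for illustration; the stand-in class only mimics the `get_all_records` call shown above, and its return values are assumed, not taken from a real sheet.

```python
def fetch_raw_records(worksheet):
    """Return rows as dicts holding the underlying cell values, not display strings."""
    return worksheet.get_all_records(value_render_option="UNFORMATTED_VALUE")


class FakeWorksheet:
    """Hypothetical stand-in for a gspread worksheet so the sketch runs offline."""

    def get_all_records(self, value_render_option=None):
        if value_render_option == "UNFORMATTED_VALUE":
            # Assumed shape of raw values: plain numbers, no locale formatting
            return [{"product_price": 2296, "clicks": 117, "total_spent": 1478.2}]
        # Assumed shape of formatted values: strings as displayed in a pl_PL sheet
        return [{"product_price": "2296,00", "clicks": 117, "total_spent": "1478,20"}]


records = fetch_raw_records(FakeWorksheet())
print(records)
```

With a real worksheet, the same `fetch_raw_records(worksheet)` call should hand numbers straight to pandas, so no string cleanup is needed afterwards.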

I used Babel's parse_decimal to convert the numbers to Decimal. Babel also has options for handling currency and similar formats, which you can try as required.

Here is a sample:

from babel.numbers import parse_decimal
import pandas as pd

data = {
    'product_price': ['2296,00'],
    'impressions': [2184],
    'clicks': [117],
    'ctr': ['5,36'],
    'avg_click_price': ['12,63'],
    'total_spent': ['1478,20'],
    'orders': [3]
}

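df = pd.DataFrame(data)


def convert_european_with_babel(df, locale='pl_PL'):
    # Only object (string) columns need parsing; numeric columns are left untouched
    for col in df.columns:
        if df[col].dtype == 'object':
            # parse_decimal returns a decimal.Decimal using the locale's separators
            df[col] = df[col].apply(lambda x: parse_decimal(x, locale=locale))
    return df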

df_cleaned_babel = convert_european_with_babel(df)
print(df_cleaned_babel)

Output

  product_price  impressions  clicks   ctr avg_click_price total_spent  orders
0       2296.00         2184     117  5.36           12.63     1478.20       3
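If adding Babel as a dependency is not an option, the same idea can be sketched with the standard library alone. The helper name `parse_pl_number` is hypothetical, and the sketch assumes the pl_PL convention only: a comma as the decimal separator and regular or non-breaking spaces as the grouping separator. It handles integer and decimal cells in a single pass, so no separate loops are needed.

```python
from decimal import Decimal, InvalidOperation


def parse_pl_number(value):
    """Parse a pl_PL-style cell ('2 296,00', '5,36', 2184) into int or Decimal."""
    if isinstance(value, (int, float, Decimal)):
        return value  # already numeric, nothing to parse
    # Drop grouping separators: non-breaking, narrow non-breaking, and regular spaces
    text = str(value).replace("\u00a0", "").replace("\u202f", "").replace(" ", "")
    text = text.replace(",", ".")  # decimal comma -> decimal point
    try:
        number = Decimal(text)
    except InvalidOperation:
        return value  # leave non-numeric cells untouched
    if not number.is_finite():
        return value  # 'NaN' or 'Infinity' text is not treated as a number
    # Values written without a decimal part stay integers
    return number if "." in text else int(number)


row = {
    "product_price": "2296,00",
    "impressions": 2184,
    "ctr": "5,36",
    "total_spent": "1\u00a0478,20",
}
cleaned = {key: parse_pl_number(val) for key, val in row.items()}
print(cleaned)
```

Returning Decimal rather than float keeps prices exact; wrap the result in `float()` if pandas numeric dtypes are preferred.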
  
