7

I'm working with a dictionary that contains a list of decimal.Decimal values as one of its fields:

import pandas as pd
from decimal import Decimal

data = {
    'Item': ['Apple', 'Banana', 'Orange'],
    'Price': [Decimal('1.25'), Decimal('0.75'), Decimal('2.00')],
    'Quantity': [10, 20, 15]
}

When I convert this dictionary into a Pandas DataFrame, the Price column (which contains Decimal objects) is inferred as object:

df = pd.DataFrame(data)
print(df.dtypes)

# Output:
# Item        object
# Price       object
# Quantity     int64
# dtype: object

I would like the Price column to be of type float64 instead.

I tried using pd.DataFrame.from_records(data, coerce_float=True), but it didn’t change the type of the Decimal values.

I’m aware I can convert the column using .astype(float), but in my actual use case, I may not know the column name in advance.

What’s the best way to ensure that all Decimal values are automatically converted to floats when creating the DataFrame, ideally without hardcoding column names?

2 Answers 2

5

You can identify the columns to use (for example using the first row), then convert to float:

convert = dict.fromkeys(df.columns[df.iloc[0].map(type).eq(Decimal)], float)
# {'Price': float}
out = df.astype(convert)

If you can't rely only on the first row, you could define a threshold of a number of Decimal values (e.g. at least 1, all, 90%) to convert the column:

# convert a column if all values are Decimal
convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).all()], float)

# convert if at least one value is Decimal
convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).any()], float)

# convert if more than 90% values are Decimal
convert = dict.fromkeys(df.columns[df.map(type).eq(Decimal).mean().gt(0.9)],
                        float)
out = df.astype(convert)

Output dtypes:

Item        string[python]
Price              float64
Quantity             Int64
dtype: object
Sign up to request clarification or add additional context in comments.

Comments

1

If you don't mind converting all the decimal values in the DF to float you can do

df = df.apply(pd.to_numeric, errors='ignore')

1 Comment

Note that errors='ignore' is deprecated, you could use df.apply(pd.to_numeric, errors='coerce').combine_first(df), although this might also convert unwanted values.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.