
I have a dataframe

df = pd.DataFrame(data=np.arange(10),columns=['v']).astype(float)

How can I make sure that the numbers in v are whole numbers? I am concerned about rounding/truncation/floating-point representation errors.

  • How will testing for integers allay concerns about floating-point errors? Do the values come from integers, and you are concerned they have changed? Or are they the results of calculations whose mathematical properties are such that exact results would be integers? Commented Mar 13, 2018 at 10:21
  • These values come from integers. However, during processing they are often cast to float64. Commented Mar 13, 2018 at 12:18
  • The only errors that can occur in handling integers in floating point are rounding and overflow errors when converting from one format to another. When converting an integer to floating point, if the precision does not suffice to represent the value exactly, it will be rounded. However, the value it will be rounded to will be another integer, due to the nature of floating point. Therefore, testing whether all values in an array are integers will provide no information about whether any rounding errors have occurred. Commented Mar 13, 2018 at 12:51
  • If the task is to ensure that values converted from integer to floating point do not incur any rounding error, then it suffices if no integer exceeds the precision of the significand of the floating-point format. For example, IEEE 754 basic 64-bit binary has a 53-bit significand, so conversion of any integer up to 2^53 in magnitude will not incur any rounding error. Commented Mar 13, 2018 at 12:54
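The 2^53 threshold from the comment above can be demonstrated directly with Python floats (a small sketch; the specific values are illustrative):

```python
# Integers up to 2^53 convert to float64 exactly; beyond that, conversion
# rounds -- but the rounded result is still an integer-valued float, so an
# is-integer test cannot detect the rounding.
exact = float(2**53)        # representable exactly
rounded = float(2**53 + 1)  # not representable; rounds to 2^53

print(exact == 2**53)        # True: converted without error
print(rounded == 2**53 + 1)  # False: the conversion rounded
print(rounded.is_integer())  # True: yet the result is still a whole number
```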

5 Answers


Comparison with astype(int)

Tentatively convert your column to int and test with np.array_equal:

np.array_equal(df.v, df.v.astype(int))
True
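To see the comparison catch a non-whole value, here is a self-contained sketch using the question's dataframe (note that astype(int) truncates, so 0.5 becomes 0 and the arrays differ):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)
print(np.array_equal(df.v, df.v.astype(int)))  # True: all values are whole

df.loc[0, 'v'] = 0.5  # introduce a fractional value
print(np.array_equal(df.v, df.v.astype(int)))  # False: 0.5 truncates to 0
```

One caveat: astype(int) raises on NaN, so this check requires a column without missing values.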

float.is_integer

You can use this python function in conjunction with an apply:

df.v.apply(float.is_integer).all()
True

Or, using python's all in a generator comprehension, for space efficiency:

all(x.is_integer() for x in df.v)
True
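One behavior worth knowing: NaN and infinity are themselves floats, so float.is_integer simply returns False for them rather than raising (a quick sketch):

```python
import math

print(float('nan').is_integer())  # False: NaN is not a whole number
print(math.inf.is_integer())      # False: neither is infinity
print((2.0).is_integer())         # True
```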

7 Comments

What is the tolerance of allclose compared to is_integer? Are they a call to the same function?
@ErroriSalvo No, the mechanisms are different. With allclose, the tolerance is very small, to account for floating-point inaccuracies. With is_integer, the function actually checks for whole numbers, but the end result is the same.
allclose is incapable of determining that a number is an integer unless the tolerance is set to 0, at which point it becomes a test for equality. Furthermore, as stated in my comment to the question, testing for integer values does not accomplish the OP’s actual goal.
@EricPostpischil okay, I've changed that to array_equal. By the way, this may be an XY problem, but it is still useful to know how to do this with numpy/pandas, so I've gone ahead and answered anyway. I appreciate the criticism (and the downvote).
df.v.apply: not sure if this works; after df.v it is a numpy ndarray, which does not have the method apply. Do you mean apply_along_axis?

Here's a simpler, and probably faster, approach:

(df[col] % 1 == 0).all()

To ignore nulls:

(df[col].fillna(-9999) % 1 == 0).all()
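Why the fillna matters: NaN % 1 is NaN, and NaN == 0 is False, so any null makes the plain check fail. A small sketch with a hypothetical column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v': [1.0, 2.0, np.nan]})
print((df['v'] % 1 == 0).all())                # False: NaN % 1 is NaN, != 0
print((df['v'].fillna(-9999) % 1 == 0).all())  # True: nulls treated as whole
```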

Comments


For completeness, pandas v1.0+ offers the convert_dtypes() utility, which (among other conversions) performs the requested operation for all dataframe columns (or a series) containing only integer numbers.

If you wanted to limit the conversion to a single column only, you could do the following:

>>> df.dtypes          # inspect previous dtypes
v                      float64

>>> df["v"] = df["v"].convert_dtypes()
>>> df.dtypes          # inspect converted dtypes
v                      Int64
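A self-contained version of the snippet above, using the question's dataframe (assumes pandas ≥ 1.0; the resulting Int64 is pandas' nullable integer dtype, so it also tolerates missing values):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.arange(10), columns=['v']).astype(float)
print(df.dtypes['v'])  # float64

df['v'] = df['v'].convert_dtypes()
print(df.dtypes['v'])  # Int64 (nullable integer dtype)
```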

Comments


If you want to check multiple float columns in your dataframe, you can do the following:

col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
df.loc[:, float_to_int_cols] = df.loc[:, float_to_int_cols].astype(int)

Keep in mind that a float column containing all integers will not get selected if it has np.nan values. To cast float columns with missing values to integer, you need to fill or remove the missing values, for example with median imputation:

float_cols = df.select_dtypes(include=['float'])
float_cols = float_cols.fillna(float_cols.median().round()) # median imputation
col_should_be_int = float_cols.applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
df.loc[:, float_to_int_cols] = float_cols[float_to_int_cols].astype(int)
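A quick check of the column-selection step on a toy frame (note that on pandas ≥ 2.1, DataFrame.map is the preferred spelling of applymap, which is deprecated there):

```python
import numpy as np
import pandas as pd

# 'a' is all whole numbers, 'b' has a fraction, 'c' is not float at all
df = pd.DataFrame({'a': [1.0, 2.0], 'b': [1.5, 2.0], 'c': ['x', 'y']})

col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
print(list(col_should_be_int[col_should_be_int].index))  # ['a']
```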

Comments


On 27,331,625 rows this works well (time: 1.3 s):

df['is_float'] = df[field_fact_qty]!=df[field_fact_qty].astype(int)

This approach took 4.9 s:

df[field_fact_qty].apply(lambda x: x.is_integer())
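The speed gap is plausible because the first form is a vectorized comparison while the second calls a Python function per element; the two agree on which values are non-whole (a small sketch without timing claims, using a hypothetical series):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(1000, dtype=float))
vectorized = (s != s.astype(int))              # True where a value is not whole
applied = ~s.apply(lambda x: x.is_integer())   # same flag via per-element calls
print(vectorized.equals(applied))  # True
```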

Comments
