I have a dataframe
df = pd.DataFrame(data=np.arange(10),columns=['v']).astype(float)
How can I make sure that the numbers in v are whole numbers?
I am very concerned about rounding/truncation/floating point representation errors.
astype(int)
Tentatively convert your column to int and test with np.array_equal:
np.array_equal(df.v, df.v.astype(int))
True
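One caveat with this check: pandas' astype(int) raises on NaN or infinite values, so a guard is needed when the column may contain them. A minimal sketch, assuming a hypothetical helper named is_whole_via_astype (it is not part of pandas or NumPy):

```python
import numpy as np
import pandas as pd


def is_whole_via_astype(s: pd.Series) -> bool:
    """Hypothetical helper: True if every value survives a round trip through int."""
    values = s.to_numpy(dtype=float)
    # NaN and +/-inf cannot be represented as int, so they never count as whole.
    if not np.isfinite(values).all():
        return False
    # The int cast truncates toward zero; equality holds only if nothing was cut off.
    return bool(np.array_equal(values, values.astype(np.int64)))


df = pd.DataFrame(data=np.arange(10), columns=["v"]).astype(float)
print(is_whole_via_astype(df["v"]))                   # whole numbers only
print(is_whole_via_astype(pd.Series([1.0, 2.5])))     # 2.5 would be truncated to 2
print(is_whole_via_astype(pd.Series([1.0, np.nan])))  # NaN is rejected by the guard
```

The comparison itself is exact: no tolerance is involved, so a value such as 2.0000000001 fails the check.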
float.is_integer
You can use this Python function in conjunction with an apply:
df.v.apply(float.is_integer).all()
True
Or, using Python's all with a generator expression, for space efficiency:
all(x.is_integer() for x in df.v)
True
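Regarding the floating point concern in the question: is_integer is an exact test, so a value that is mathematically whole but was computed with rounding error will fail it. A small illustration, with sample values that are assumptions chosen for the demo:

```python
import pandas as pd

# (0.1 + 0.2) * 10 is mathematically 3, but the binary result lands slightly above 3.
s = pd.Series([1.0, 2.0, (0.1 + 0.2) * 10])

# Exact test: the third value is not a whole number in floating point.
exact = s.apply(float.is_integer)
print(exact.tolist())

# Vectorized form of the same exact test: the remainder modulo 1 must be zero.
# NaN and infinities yield NaN here, which compares unequal to 0.
vectorized = s % 1 == 0
print(vectorized.tolist())
```

Whether such a value should count as whole depends on how the data was produced; if it should, the values would need rounding first, which is a separate decision from the check itself.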
allclose, the tolerance is very small to account for floating point inaccuracies. With is_integer, the function actually checks for whole numbers. The mechanism is slightly different but the end result is the same.

allclose is incapable of determining that a number is an integer unless the tolerance is set to 0, at which point it becomes a test for equality. Furthermore, as stated in my comment to the question, testing for integer values does not accomplish the OP's actual goal.

df.v.apply: not sure if this works, after df.v it is a numpy ndarray, which does not have the method apply. Do you mean apply_along_axis?

For completeness, Pandas v1.0+ offers the convert_dtypes() utility, that (among 3 other conversions) performs the requested operation for all dataframe-columns (or series) containing only integer numbers.
If you wanted to limit the conversion to a single column only, you could do the following:
>>> df.dtypes # inspect previous dtypes
v float64
>>> df["v"] = df["v"].convert_dtypes()
>>> df.dtypes # inspect converted dtypes
v Int64
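For comparison, here is a sketch of convert_dtypes() applied to a whole dataframe. The column names are made up for illustration. As far as I know, float columns holding only whole numbers (with or without missing values) become Int64, while columns with fractional values remain floating point (float64 or the nullable Float64, depending on the pandas version):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "whole": [0.0, 1.0, 2.0],        # only whole numbers
        "with_nan": [0.0, np.nan, 2.0],  # whole numbers plus a missing value
        "fractional": [0.5, 1.0, 2.0],   # contains a non-whole number
    }
)

# convert_dtypes returns a new dataframe; the original is left unchanged.
converted = df.convert_dtypes()
print(converted.dtypes)
```

Note that the missing value in with_nan is kept as <NA> rather than filled in.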
If you want to check multiple float columns in your dataframe, you can do the following:
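# note: applymap was deprecated in pandas 2.1 in favour of DataFrame.map
col_should_be_int = df.select_dtypes(include=['float']).applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
# assign whole columns; df.loc[:, cols] = ... writes in place and can keep the float dtype
df[float_to_int_cols] = df[float_to_int_cols].astype(int)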
Keep in mind that a float column containing only whole numbers will not get selected if it also has NaN values, because float.is_integer returns False for NaN. To cast float columns with missing values to integer, you need to fill or remove the missing values first, for example with median imputation:
float_cols = df.select_dtypes(include=['float'])
float_cols = float_cols.fillna(float_cols.median().round()) # median imputation
col_should_be_int = float_cols.applymap(float.is_integer).all()
float_to_int_cols = col_should_be_int[col_should_be_int].index
df[float_to_int_cols] = float_cols[float_to_int_cols].astype(int)
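If filling in values is not desirable, one alternative to imputation is pandas' nullable Int64 dtype, which keeps missing entries as <NA>. A sketch under that assumption, where whole_float_cols_to_nullable_int is a hypothetical helper name and the sample columns are invented:

```python
import numpy as np
import pandas as pd


def whole_float_cols_to_nullable_int(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast float columns whose non-missing values are all whole to Int64."""
    out = df.copy()
    for col in out.select_dtypes(include=["float"]).columns:
        present = out[col].dropna()
        # Exact check on the non-missing values only; infinities fail it.
        if (present % 1 == 0).all():
            out[col] = out[col].astype("Int64")  # NaN becomes <NA>
    return out


df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [0.5, 1.0, np.nan]})
result = whole_float_cols_to_nullable_int(df)
print(result.dtypes)
```

This avoids inventing data, at the cost of working with the extension dtype Int64 instead of a plain NumPy int64 column.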