
I am trying to change a column's data type from object to int64 within a DataFrame using .map().

   df['one'] = df['one'].map(convert_to_int_with_error)

Here is my function:

import sys

import numpy as np

def convert_to_int_with_error(x):
    # Treat empty strings, single spaces and None as missing values
    if x not in ['', None, ' ']:
        try:
            return np.int64(x)
        except ValueError as e:
            print(e)
            return None
    else:
        return None

    # Note: this check is never reached, because both branches above return first
    if not type(x) == np.int64():
        print("Not int64")
        sys.exit()

This completes successfully. However, when I check the data type afterwards, the column comes back as float:

print("%s is a %s after converting" % (key, df['one'].dtype))
  • Where exactly did you put the if not type(x) == np.int64(): condition? Are you saying that convert_to_int_with_error never returns None? Commented Dec 30, 2016 at 14:47
  • For numerical containers, None will be regarded as NaN so as to keep its float (numerical) dtype. You need to find a way to handle such missing values/empty strings so that the column ends up with np.int64 dtype. Commented Dec 30, 2016 at 14:53

1 Answer


I think the problem is that your problematic values are converted from None to NaN, so the int column is cast to float; see the docs on missing data.

Instead of map you can use to_numeric with the parameter errors='coerce' to convert problematic values to NaN:

df['one'] = pd.to_numeric(df['one'], errors='coerce')
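
For example, with some made-up data (the column name and values below are just for illustration), the unparseable entries become NaN and the column ends up as float64:

import pandas as pd

df = pd.DataFrame({'one': ['1', '2', '', ' ', None]})
df['one'] = pd.to_numeric(df['one'], errors='coerce')
print(df['one'])
# 0    1.0
# 1    2.0
# 2    NaN
# 3    NaN
# 4    NaN
# Name: one, dtype: float64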

2 Comments

I included the try and except to account for values that could not be converted to int64 properly though?
Unfortunately it is not possible to have dtype int with NaN or None values.
