189

Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not?

I currently use a self-defined dictionary with dtypes as keys and numeric / not numeric as values.

2
  • 28
    You could check dtype.kind in 'biufc'. Commented Nov 11, 2013 at 7:26
  • 1
    The comment above, posted by Jaime, was simpler than the answers below and seems to have worked perfectly. Thanks! Commented May 3, 2018 at 17:36

11 Answers

218

In pandas 0.20.2 or later you can do:

import pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})

is_string_dtype(df['A'])
# True

is_numeric_dtype(df['B'])
# True

4 Comments

It appears that is_numeric_dtype returns True for boolean type as well.
Yes @ManojGovindan, because booleans are integers in Python. You can apply operations such as multiplication to them; a bool is basically an integer that can only take the values 0 or 1.
For Decimal values, is_numeric_dtype returns False.
is_integer_dtype is also useful.
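
The points raised in these comments can be checked directly. The following is a minimal sketch, assuming a reasonably recent pandas; the Series names are made up for illustration.

```python
from decimal import Decimal

import pandas as pd
from pandas.api.types import is_bool_dtype, is_integer_dtype, is_numeric_dtype

flags = pd.Series([True, False, True])               # bool dtype
prices = pd.Series([Decimal("1.5"), Decimal("2")])   # object dtype holding Decimal
counts = pd.Series([1, 2, 3])                        # integer dtype

bool_is_numeric = is_numeric_dtype(flags)       # True: bool counts as numeric
decimal_is_numeric = is_numeric_dtype(prices)   # False: the dtype is object
counts_is_integer = is_integer_dtype(counts)    # True

# To treat booleans as non-numeric, combine the two checks.
strictly_numeric = is_numeric_dtype(flags) and not is_bool_dtype(flags)  # False
```

Combining is_numeric_dtype with is_bool_dtype is one way to exclude booleans when that matters for your use case.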
108

You can use np.issubdtype to check whether the dtype is a subtype of np.number. Examples:

np.issubdtype(arr.dtype, np.number)  # where arr is a numpy array
np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series

This works for NumPy dtypes but fails for pandas-specific types such as pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative than np.issubdtype.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0], 
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
df
Out: 
   A    B   C  D
0  1  1.0  1j  a
1  2  2.0  2j  b
2  3  3.0  3j  c

df.dtypes
Out: 
A         int64
B       float64
C    complex128
D        object
dtype: object

np.issubdtype(df['A'].dtype, np.number)
Out: True

np.issubdtype(df['B'].dtype, np.number)
Out: True

np.issubdtype(df['C'].dtype, np.number)
Out: True

np.issubdtype(df['D'].dtype, np.number)
Out: False

For multiple columns you can use np.vectorize:

is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
is_number(df.dtypes)
Out: array([ True,  True,  True, False], dtype=bool)

And for selection, pandas now has select_dtypes:

df.select_dtypes(include=[np.number])
Out: 
   A    B   C
0  1  1.0  1j
1  2  2.0  2j
2  3  3.0  3j

1 Comment

This does not seem to work reliably with pandas DataFrames, since their columns may have dtypes unknown to NumPy, such as "category". NumPy then throws "TypeError: data type not understood".
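
For frames that mix NumPy dtypes with pandas extension dtypes, checking each column's dtype with is_numeric_dtype avoids passing unknown dtypes to NumPy. This is a sketch under that assumption; df_ext and its column names are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df_ext = pd.DataFrame({
    "cat": pd.Series(["a", "b", "a"], dtype="category"),      # pandas-only dtype
    "nullable_int": pd.Series([1, None, 3], dtype="Int64"),   # pandas nullable integer
    "plain_float": [1.0, 2.0, 3.0],                           # ordinary NumPy float64
})

# is_numeric_dtype accepts a dtype object, so the column data is never touched.
numeric_columns = [
    col for col, dtype in df_ext.dtypes.items() if is_numeric_dtype(dtype)
]
# ['nullable_int', 'plain_float']
```

The categorical column is reported as non-numeric rather than raising an error.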
61

Based on @jaime's comment on the question, you need to check .dtype.kind for the column of interest. For example:

>>> import pandas as pd
>>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
>>> df['numeric'].dtype.kind in 'biufc'
True
>>> df['not_numeric'].dtype.kind in 'biufc'
False

NB The meaning of biufc: b bool, i int (signed), u unsigned int, f float, c complex. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.kind.html#numpy.dtype.kind
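
To see which kind code each dtype maps to, you can list them side by side. A small sketch, with a hypothetical frame named sample whose dtypes are set explicitly so the result does not depend on inference:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "flag": np.array([True, False], dtype=np.bool_),
    "signed": np.array([1, 2], dtype=np.int32),
    "unsigned": np.array([1, 2], dtype=np.uint8),
    "real": np.array([1.0, 2.0], dtype=np.float64),
    "cplx": np.array([1j, 2j], dtype=np.complex128),
    "when": np.array(["2020-01-01", "2020-01-02"], dtype="datetime64[ns]"),
    "text": pd.Series(["a", "b"], dtype=object),
})

# Map each column name to the one-character kind code of its dtype.
kinds = {name: dtype.kind for name, dtype in sample.dtypes.items()}
# {'flag': 'b', 'signed': 'i', 'unsigned': 'u', 'real': 'f',
#  'cplx': 'c', 'when': 'M', 'text': 'O'}

# Keep only the columns whose kind is one of b, i, u, f, c.
numeric = [name for name, kind in kinds.items() if kind in "biufc"]
# ['flag', 'signed', 'unsigned', 'real', 'cplx']
```

Datetime ('M') and object ('O') columns fall outside 'biufc', so they are excluded.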

1 Comment

Here is the list of all dtype kinds [1]. Lowercase u is for unsigned integer; uppercase U is for unicode. [1]: docs.scipy.org/doc/numpy/reference/generated/…
11

DataFrames have the select_dtypes method. This will return a subset of the DataFrame which only includes the "numeric" columns (columns of dtype int64/float64).

df.select_dtypes(include=['int64', 'float64'])

2 Comments

Maybe one could even do include=np.number (import numpy as np)
How does this perform, compared with a read-only check like inspecting the columns dtypes?
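
If you only need the column names, both approaches from these comments can be compared on a small frame. A sketch with hypothetical data; note that the two do not treat booleans the same way.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

frame = pd.DataFrame({
    "n": [1, 2, 3],
    "x": [0.1, 0.2, 0.3],
    "label": pd.Series(["a", "b", "c"], dtype=object),
    "flag": [True, False, True],
})

# Option 1: build the numeric subset, then read its column names.
via_select = frame.select_dtypes(include=np.number).columns.tolist()
# ['n', 'x']

# Option 2: inspect the dtypes only; no subset DataFrame is created.
via_dtypes = [col for col, dtype in frame.dtypes.items() if is_numeric_dtype(dtype)]
# ['n', 'x', 'flag']
```

select_dtypes(include=np.number) leaves out the bool column, while is_numeric_dtype counts it as numeric.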
6

This is a pseudo-internal method that returns only the numeric columns:

In [26]: import numpy as np
    ...: from pandas import DataFrame, Timestamp

In [27]: df = DataFrame(dict(A = np.arange(3), 
                             B = np.random.randn(3), 
                             C = ['foo','bar','bah'], 
                             D = Timestamp('20130101')))

In [28]: df
Out[28]: 
   A         B    C                   D
0  0 -0.667672  foo 2013-01-01 00:00:00
1  1  0.811300  bar 2013-01-01 00:00:00
2  2  2.020402  bah 2013-01-01 00:00:00

In [29]: df.dtypes
Out[29]: 
A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [30]: df._get_numeric_data()
Out[30]: 
   A         B
0  0 -0.667672
1  1  0.811300
2  2  2.020402

2 Comments

Yes, I was trying to figure out how they do that. One would expect an internal IsNumeric function run per column... but I still didn't find it in the code.
You can apply this per column, but it is much easier just to check the dtype. In any event, pandas operations exclude non-numeric data when needed. What are you trying to do?
3

How about just checking the type of one of the values in the column? We've always had something like this:

isinstance(x, (int, float, complex))  # on Python 2, also include long

When I try to check the datatypes for the columns in the dataframe below, I get them as 'object' and not the numerical type I'm expecting:

import pandas as pd
from datetime import datetime, timedelta

df = pd.DataFrame(columns=('time', 'test1', 'test2'))
for i in range(20):
    df.loc[i] = [datetime.now() - timedelta(hours=i*1000),i*10,i*100]
df.dtypes

time     datetime64[ns]
test1            object
test2            object
dtype: object

When I do the following, it seems to give me an accurate result:

isinstance(df['test1'].iloc[-1], (int, float, complex))

returns

True
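
As an alternative to testing a single value, pandas can inspect all values of an object column. This is a sketch, not part of the answer above; the Series named values is a stand-in for a column such as 'test1'.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# An object-dtype column that actually holds Python integers.
values = pd.Series([0, 10, 20], dtype=object)

dtype_name = str(values.dtype)     # 'object'
inferred = infer_dtype(values)     # 'integer'

# infer_objects() tries to convert object data to a more specific dtype.
converted = values.infer_objects()
converted_dtype = str(converted.dtype)   # 'int64'
```

After infer_objects(), the usual dtype-based checks from the other answers apply to the column.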

3

You can check whether a given column contains numeric values by looking at its dtype:

numerical_features = [feature for feature in train_df.columns if train_df[feature].dtypes != 'O']

Note: the "O" (object dtype) must be uppercase.
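
One caveat with this test: any column whose dtype is not object passes it, including datetime columns. A small sketch with made-up data (train_df here is a hypothetical stand-in for the frame above):

```python
from datetime import datetime

import pandas as pd

train_df = pd.DataFrame({
    "amount": [1.5, 2.5],
    "name": pd.Series(["a", "b"], dtype=object),
    "when": [datetime(2020, 1, 1), datetime(2020, 1, 2)],
})

# Same test as above: keep every column whose dtype is not object.
not_object = [col for col in train_df.columns if train_df[col].dtypes != "O"]
# ['amount', 'when']  (the datetime column passes the test too)
```

If the frame may contain datetime, boolean or categorical columns, a check such as is_numeric_dtype or dtype.kind is likely a safer filter.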

2

You can also try:

df_dtypes = np.array(df.dtypes)
# 'u' is included so that unsigned integer columns also count as numeric
df_numericDtypes = [x.kind in 'biufc' for x in df_dtypes]

It returns a list of booleans: True if numeric, False if not.

2

Just to add to all the other answers, one can also use df.info() to see the data type of each column.

1 Comment

Or just df.dtypes
1

Assuming you want to keep your data in the same type, I found the following works similarly to df._get_numeric_data():

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0], 
                   'C': [4.0, 'x2', 6], 'D': [np.nan]*3})

test_dtype_df = df.loc[:, df.apply(lambda s: s.dtype.kind in 'biufc')]
test_dtype_df.shape == df._get_numeric_data().shape
Out[1]: True

However, if you want to test whether a series converts properly, you can use errors='ignore' (deprecated since pandas 2.2):

df_ = df.copy().apply(pd.to_numeric, errors='ignore')
test_nmr_ignore = df_.loc[:, df_.apply(lambda s: s.dtype.kind in 'biufc')]

display(test_nmr_ignore)
test_nmr_ignore.shape == df._get_numeric_data().shape,\
test_nmr_ignore.shape == df_._get_numeric_data().shape,\
test_nmr_ignore.shape
     B   D
0  1.0 NaN
1  2.0 NaN
2  3.0 NaN
Out[2]: (True, True, (3, 2))

Finally, in the case where some data is mixed, you can use errors='coerce' with the pd.to_numeric function, and then drop the columns that are filled completely with np.nan values.

df_ = df.copy().apply(pd.to_numeric, errors='coerce')
test_nmr_coerce = df_.dropna(axis=1, how='all')
display(test_nmr_coerce)
     B    C
0  1.0  4.0
1  2.0  NaN
2  3.0  6.0

For accuracy, you may have to determine which columns consist entirely of np.nan values in the original data. I merged the original np.nan columns back in with the converted data, df_:

nacols = [c for c in df.columns if c not in df.dropna(axis=1, how='all').columns]
display(pd.merge(test_nmr_coerce, 
                 df[nacols], 
                 right_index=True, left_index=True))
     B    C   D
0  1.0  4.0 NaN
1  2.0  NaN NaN
2  3.0  6.0 NaN
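
To find out which values failed to convert, you can compare the coerced result with the original. A minimal sketch for a single column, using the same values as column 'C' above; the variable names are illustrative.

```python
import pandas as pd

mixed = pd.Series([4.0, "x2", 6], dtype=object)

# Unparseable entries become NaN instead of raising.
as_numbers = pd.to_numeric(mixed, errors="coerce")   # 4.0, NaN, 6.0

# Values that were present originally but are NaN now could not be parsed.
failed = as_numbers.isna() & mixed.notna()
n_failed = int(failed.sum())     # 1
fully_numeric = n_failed == 0    # False
```

Requiring n_failed == 0 separates columns that are fully numeric from those that are only partly convertible.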

1

If you want to check for numeric types in Pandas but exclude Booleans and complex numbers, you can use pandas.api.types.is_any_real_numeric_dtype() which was introduced in Pandas 2.0.0 (April 2023).

import pandas as pd
from pandas.api.types import is_any_real_numeric_dtype

df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": [1.0, 2.0, 3.0],
        "C": [1j, 2j, 3j],
        "D": ["a", "b", "c"],
        "E": [True, False, True],
    }
)
is_any_real_numeric_dtype(df["A"])
# True
is_any_real_numeric_dtype(df["B"])
# True
is_any_real_numeric_dtype(df["C"])
# False
is_any_real_numeric_dtype(df["D"])
# False
is_any_real_numeric_dtype(df["E"])
# False
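
On pandas versions older than 2.0.0, a similar check can be approximated by combining older functions from pandas.api.types. This is a sketch only: is_real_numeric is a hypothetical helper, and it is not guaranteed to match is_any_real_numeric_dtype in every edge case.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_complex_dtype, is_numeric_dtype


def is_real_numeric(col):
    """Hypothetical helper: numeric, but neither boolean nor complex."""
    return (
        is_numeric_dtype(col)
        and not is_bool_dtype(col)
        and not is_complex_dtype(col)
    )


df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": [1.0, 2.0, 3.0],
        "C": [1j, 2j, 3j],
        "D": ["a", "b", "c"],
        "E": [True, False, True],
    }
)

results = {name: is_real_numeric(df[name]) for name in df.columns}
# {'A': True, 'B': True, 'C': False, 'D': False, 'E': False}
```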
