Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not? I have a self-defined dictionary with dtypes as keys and numeric/not-numeric as values.
In pandas 0.20.2 you can do:
import pandas as pd
from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})
is_string_dtype(df['A'])
>>> True
is_numeric_dtype(df['B'])
>>> True
Note that is_numeric_dtype returns True for the boolean dtype as well. is_integer_dtype is also useful.

You can use np.issubdtype to check whether the dtype is a subdtype of np.number. Examples:
np.issubdtype(arr.dtype, np.number) # where arr is a numpy array
np.issubdtype(df['X'].dtype, np.number) # where df['X'] is a pandas Series
This works for NumPy's dtypes but fails for pandas-specific types like pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative than np.issubdtype.
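To illustrate the difference, here is a minimal sketch; the exact behavior of np.issubdtype on extension dtypes can vary by NumPy/pandas version, so the call is wrapped defensively:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# A categorical Series backed by integer categories.
cat = pd.Series([1, 2, 3], dtype="category")

# np.issubdtype generally cannot interpret pandas' CategoricalDtype
# as a NumPy dtype and raises TypeError.
try:
    result = np.issubdtype(cat.dtype, np.number)
    print("np.issubdtype returned:", result)
except TypeError:
    print("np.issubdtype raised TypeError on CategoricalDtype")

# The pandas helper handles extension dtypes gracefully: a Categorical
# is not considered numeric, even when its categories are numbers.
print(is_numeric_dtype(cat))  # False
```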
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
df
Out:
A B C D
0 1 1.0 1j a
1 2 2.0 2j b
2 3 3.0 3j c
df.dtypes
Out:
A int64
B float64
C complex128
D object
dtype: object
np.issubdtype(df['A'].dtype, np.number)
Out: True
np.issubdtype(df['B'].dtype, np.number)
Out: True
np.issubdtype(df['C'].dtype, np.number)
Out: True
np.issubdtype(df['D'].dtype, np.number)
Out: False
For multiple columns you can use np.vectorize:
is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
is_number(df.dtypes)
Out: array([ True, True, True, False], dtype=bool)
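If you prefer to stay within pandas, the same per-column check works without np.vectorize, because df.dtypes is itself a Series you can .apply over (a sketch using the example frame above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
                   'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})

# df.dtypes is a Series of dtype objects, so .apply runs column by column.
is_number = df.dtypes.apply(lambda d: np.issubdtype(d, np.number))
print(is_number.tolist())  # [True, True, True, False]
```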
And for selection, pandas now has select_dtypes:
df.select_dtypes(include=[np.number])
Out:
A B C
0 1 1.0 1j
1 2 2.0 2j
2 3 3.0 3j
Based on @jaime's answer in the comments, you need to check .dtype.kind for the column of interest. For example:
>>> import pandas as pd
>>> df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
>>> df['numeric'].dtype.kind in 'biufc'
True
>>> df['not_numeric'].dtype.kind in 'biufc'
False
NB The meaning of biufc: b bool, i int (signed), u unsigned int, f float, c complex. See https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.kind.html#numpy.dtype.kind
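The kind check is easy to wrap in a small helper and apply across a whole frame (a sketch; the helper name is made up, and note that it counts bool as numeric since 'b' is in the set):

```python
import pandas as pd

def is_numeric_kind(series: pd.Series) -> bool:
    """True if the Series' dtype.kind is one of b/i/u/f/c."""
    return series.dtype.kind in 'biufc'

df = pd.DataFrame({'numeric': [1, 2, 3], 'not_numeric': ['A', 'B', 'C']})
print({col: is_numeric_kind(df[col]) for col in df.columns})
# {'numeric': True, 'not_numeric': False}
```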
Note that u is for unsigned integer; uppercase U is for unicode.

DataFrames have the select_dtypes method. This will return a subset of the DataFrame containing only the "numeric" columns (columns of dtype int64/float64):
df.select_dtypes(include=['int64', 'float64'])
You can also pass include=np.number (with import numpy as np) to include all numeric dtypes.

_get_numeric_data() is a pseudo-internal method that returns only the numeric-type data (the example below assumes from pandas import DataFrame, Timestamp and import numpy as np):
In [27]: df = DataFrame(dict(A = np.arange(3),
    ...:                     B = np.random.randn(3),
    ...:                     C = ['foo','bar','bah'],
    ...:                     D = Timestamp('20130101')))
In [28]: df
Out[28]:
A B C D
0 0 -0.667672 foo 2013-01-01 00:00:00
1 1 0.811300 bar 2013-01-01 00:00:00
2 2 2.020402 bah 2013-01-01 00:00:00
In [29]: df.dtypes
Out[29]:
A int64
B float64
C object
D datetime64[ns]
dtype: object
In [30]: df._get_numeric_data()
Out[30]:
A B
0 0 -0.667672
1 1 0.811300
2 2 2.020402
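Since _get_numeric_data is a private method whose behavior may change between pandas versions, the public equivalent is select_dtypes with np.number (a sketch of the same selection):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(3),
                   'B': np.random.randn(3),
                   'C': ['foo', 'bar', 'bah'],
                   'D': pd.Timestamp('20130101')})

# Public API: keep only columns whose dtype is a subtype of np.number.
numeric = df.select_dtypes(include=np.number)
print(numeric.columns.tolist())  # ['A', 'B']
```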
How about just checking the type of one of the values in the column? We've always had something like this (note that long exists only in Python 2; on Python 3 it is just int):
isinstance(x, (int, long, float, complex))
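A quick sketch of this value-based check on Python 3 (without long). Because it inspects a single element rather than the dtype, it works even when a column's dtype is object:

```python
import pandas as pd

# dtype=object keeps plain Python values in both columns.
df = pd.DataFrame({'nums': [1, 2, 3], 'words': ['a', 'b', 'c']}, dtype=object)

# Check the first value of each column rather than the column's dtype.
print(isinstance(df['nums'].iloc[0], (int, float, complex)))   # True
print(isinstance(df['words'].iloc[0], (int, float, complex)))  # False
```

The caveat is that this only describes one element; a mixed object column could give a different answer for another row.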
When I try to check the datatypes for the columns in the dataframe below, I get them as 'object' and not the numerical type I'm expecting:
from datetime import datetime, timedelta
import pandas as pd

df = pd.DataFrame(columns=('time', 'test1', 'test2'))
for i in range(20):
    df.loc[i] = [datetime.now() - timedelta(hours=i*1000), i*10, i*100]
df.dtypes
df.dtypes
time datetime64[ns]
test1 object
test2 object
dtype: object
When I do the following, it seems to give me an accurate result (long is Python 2 only; drop it on Python 3):
isinstance(df['test1'][len(df['test1'])-1], (int, long, float, complex))
returns
True
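Rather than inspecting individual values, the object columns produced by row-by-row .loc assignment can be converted to proper numeric dtypes, e.g. with infer_objects (a sketch; pd.to_numeric with errors='coerce' is another option):

```python
import pandas as pd

df = pd.DataFrame(columns=('test1', 'test2'))
for i in range(5):
    df.loc[i] = [i * 10, i * 100]  # row-wise assignment leaves dtype object

print(df.dtypes.tolist())   # both columns are object

# Let pandas re-infer better dtypes for object columns.
fixed = df.infer_objects()
print(all(fixed[c].dtype.kind in 'biufc' for c in fixed.columns))  # True
```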
Just to add to all the other answers: one can also use df.info() to see what the data type of each column is.
df.dtypes works too.

Assuming you want to keep your data in the same type, I found the following works similarly to df._get_numeric_data():
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0],
                   'C': [4.0, 'x2', 6], 'D': [np.nan]*3})
test_dtype_df = df.loc[:, df.apply(lambda s: s.dtype.kind in 'biufc')]
test_dtype_df.shape == df._get_numeric_data().shape
Out[1]: True
However, if you want to test whether a series converts properly, you can use errors='ignore':
df_ = df.copy().apply(pd.to_numeric, errors='ignore')
test_nmr_ignore = df_.loc[:, df_.apply(lambda s: s.dtype.kind in 'biufc')]
display(test_nmr_ignore)
test_nmr_ignore.shape == df._get_numeric_data().shape,\
test_nmr_ignore.shape == df_._get_numeric_data().shape,\
test_nmr_ignore.shape
B D
0 1.0 NaN
1 2.0 NaN
2 3.0 NaN
Out[2]: (True, True, (3, 2))
Finally, in the case where some data is mixed, you can use errors='coerce' with the pd.to_numeric function, and then drop columns that are filled entirely with np.nan values.
df_ = df.copy().apply(pd.to_numeric, errors='coerce')
test_nmr_coerce = df_.dropna(axis=1, how='all')
display(test_nmr_coerce)
B C
0 1.0 4.0
1 2.0 NaN
2 3.0 6.0
For accuracy, you may have to determine which columns were entirely np.nan in the original data. I merged the original np.nan columns back in with the converted data, df_:
nacols = [c for c in df.columns if c not in df.dropna(axis=1, how='all').columns]
display(pd.merge(test_nmr_coerce,
df[nacols],
right_index=True, left_index=True))
B C D
0 1.0 4.0 NaN
1 2.0 NaN NaN
2 3.0 6.0 NaN
If you want to check for numeric types in Pandas but exclude Booleans and complex numbers, you can use pandas.api.types.is_any_real_numeric_dtype()
which was introduced in Pandas 2.0.0 (April 2023).
import pandas as pd
from pandas.api.types import is_any_real_numeric_dtype
df = pd.DataFrame(
{
"A": [1, 2, 3],
"B": [1.0, 2.0, 3.0],
"C": [1j, 2j, 3j],
"D": ["a", "b", "c"],
"E": [True, False, True],
}
)
is_any_real_numeric_dtype(df["A"])
>>> True
is_any_real_numeric_dtype(df["B"])
>>> True
is_any_real_numeric_dtype(df["C"])
>>> False
is_any_real_numeric_dtype(df["D"])
>>> False
is_any_real_numeric_dtype(df["E"])
>>> False