I want to combine columns in a dataframe depending on whether the data is numeric or not, for example:

import pandas as pd
import numpy as np

x = {'a':[1,2], 'b':['foo','bar'],'c':[np.pi,np.e]}
y = pd.DataFrame.from_dict(x)
y.apply(lambda x: x.sum() if x.dtype in (np.int64,np.float64) else x.min())

This gives the desired output, but it seems like there should be a nicer way to write the last line. Is there a simple way to check whether a column's dtype is a NumPy numeric type, instead of checking whether it is in a hand-written list of NumPy dtypes?

2 Answers

Rather than use apply here, I would check whether each column is numeric with a simple list comprehension, handle the numeric and non-numeric columns separately, and then concat the results back together. This will be more efficient for larger frames.

In [11]: numeric = np.array([dtype in [np.int64, np.float64] for dtype in y.dtypes])

In [12]: numeric
Out[12]: array([True, False, True])

pandas also has an is_numeric_dtype function (in pandas.api.types) that could replace the explicit dtype list.

In [13]: y.iloc[:, numeric].sum()
Out[13]: 
a    3.000000
c    5.859874
dtype: float64

In [14]: y.iloc[:, ~numeric].min()
Out[14]: 
b    bar
dtype: object

Now you can concat these and potentially reindex:

In [15]: pd.concat([y.iloc[:, numeric].sum(), y.iloc[:, ~numeric].min()]).reindex(y.columns)
Out[15]: 
a           3
b         bar
c    5.859874
dtype: object
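
The same split-and-concat idea can also be sketched with the public `select_dtypes` method, which avoids listing dtypes by hand. This is only a minimal sketch on the example frame from the question; the names `numeric_part`, `other_part`, and `combined` are made up for illustration.

```python
import numpy as np
import pandas as pd

y = pd.DataFrame({'a': [1, 2], 'b': ['foo', 'bar'], 'c': [np.pi, np.e]})

# Sum the numeric columns, take the minimum of everything else.
numeric_part = y.select_dtypes(include="number").sum()
other_part = y.select_dtypes(exclude="number").min()

# Stitch the two results together and restore the original column order.
combined = pd.concat([numeric_part, other_part]).reindex(y.columns)
print(combined)
```

Because the result mixes numbers and strings, the combined Series ends up with object dtype, as in the output above.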

2 Comments

df._get_numeric_data()
Thanks, both of you. That private method really does the trick.

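You could test each column's dtype with is_numeric_dtype. (np.isscalar will not work here: apply passes each column as a Series, which is never a scalar, so every column would take the min branch.)

y.apply(lambda col: col.sum() if pd.api.types.is_numeric_dtype(col) else col.min())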
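
For a NumPy-only check closer to what the question asks for, one option is `np.issubdtype` on the column's dtype. The sketch below assumes the columns use plain NumPy dtypes (pandas extension dtypes such as categoricals may not be accepted by `np.issubdtype`), and it also shows what `np.isscalar` returns for a whole column.

```python
import numpy as np
import pandas as pd

y = pd.DataFrame({'a': [1, 2], 'b': ['foo', 'bar'], 'c': [np.pi, np.e]})

# A column handed to apply is a Series, so np.isscalar reports False for it.
column_is_scalar = np.isscalar(y['a'])

# Test the dtype instead: np.number covers the integer and float dtypes.
result = y.apply(
    lambda col: col.sum() if np.issubdtype(col.dtype, np.number) else col.min()
)
print(column_is_scalar)
print(result)
```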
