Getting a function to work on every row of a dataframe (pandas)

Question

I want to pass each cell of a column in a dataframe to a function, which then creates a new cell.

I've looked here and here but these don't address my issue.

I'm using an obscure package, so I'll simplify the method using the base packages to ask the question; hopefully the issue will be clear.

Method:

Load the data

import pandas as pd
import numpy as np  # needed for np.random.randint below
import math

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Pass the values of one column to a variable

lat = df['A']

Create a new column by applying the function to the variable

df['sol'] = df.apply(math.sqrt(lat))

This gives the error

TypeError: cannot convert the series to <type 'float'>

The error I'm getting using the pyeto package is actually

Traceback (most recent call last):

File "<ipython-input-10-b160408e9808>", line 1, in <module>
data['sol_dec'] = data['dayofyear'].apply(pyeto.sol_dec(data['dayofyear']), axis =1)            # Solar declination

File "build\bdist.win-amd64\egg\pyeto\fao.py", line 580, in sol_dec
_check_doy(day_of_year)

File "build\bdist.win-amd64\egg\pyeto\_check.py", line 36, in check_doy
if not 1 <= doy <= 366:

File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
.format(self.__class__.__name__))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I think the issue is the same in both cases: the function will not apply to every cell in the dataframe column, and produces an error.

I want to be able to apply a function to each cell of a dataframe column (i.e. get the square root of each cell in column 'A'), then store the result of this function as a variable (or another column in the dataframe, i.e. have a 'sqrtA' column), then apply a function to that variable (or column), and so on (i.e. have a new column which is 'sqrtA*100').

I can't figure out how to do this, and would really appreciate guidance.

EDIT

@EdChum's answer, df['A'].apply(math.sqrt) or data['dayofyear'].apply(pyeto.sol_dec) (for the package function), helped a lot.

I'm now having issues with another function in the package which takes multiple arguments:

sha = pyeto.sunset_hour_angle(lat, sol_dec)

This function doesn't apply to a data-frame column, and I have lat and sol_dec stored as Series variables, but when I try to create a new column in the dataframe using these like so

data['sha'] = pyeto.sunset_hour_angle(lat, sol_dec)

I get the same error as before...

Attempting to apply the function to multiple columns:

data['sha'] = data[['lat'],['sol_dec']].apply(pyeto.sunset_hour_angle)

gives the error:

Traceback (most recent call last):

File "<ipython-input-28-7b603745af93>", line 1, in <module>
data['sha'] = data[['lat'],['sol_dec']].apply(pyeto.sunset_hour_angle)

File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.py", line 1969, in __getitem__
return self._getitem_column(key)

File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.py", line 1976, in _getitem_column
return self._get_item_cache(key)

File "C:\Users\pflattery\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.py", line 1089, in _get_item_cache
res = cache.get(item)

TypeError: unhashable type: 'list'

EdChum · Accepted Answer · 2016-03-16 10:27:09Z

4

Use np.sqrt, as this understands arrays:

In [86]:
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['sol'] = np.sqrt(df['A'])
df

Out[86]:
     A   B   C   D       sol
0   52  38   4  71  7.211103
1   59   4  36  15  7.681146
2   37  28  33  73  6.082763
3   58  26   4  96  7.615773
4   31  48  47  78  5.567764
5   43  58  45   4  6.557439
6   69  35  27  39  8.306624
..  ..  ..  ..  ..       ...
98  42   6  40  36  6.480741
99  22  44  11  24  4.690416

[100 rows x 5 columns]

To apply a function you can do:

In [87]:
import math
df['A'].apply(math.sqrt)

Out[87]:
0     7.211103
1     7.681146
2     6.082763
3     7.615773
4     5.567764
5     6.557439
6     8.306624
7     7.483315
8     7.071068
9     9.486833
        ...   
95    3.464102
96    6.855655
97    5.385165
98    6.480741
99    4.690416
Name: A, dtype: float64

What you tried was to pass a Series to math.sqrt, but math.sqrt doesn't understand non-scalar values, hence the error. Also, you should avoid using apply when a vectorised method exists, as the vectorised method will be faster. Timings for a 10K row df:

In [90]:
%timeit df['A'].apply(math.sqrt)
%timeit np.sqrt(df['A'])

100 loops, best of 3: 2.15 ms per loop
10000 loops, best of 3: 99.7 µs per loop

Here you can see that the numpy version is ~22x faster.
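The question also asks about storing the result as a column and then building further columns from it. A minimal sketch of that chaining with vectorised operations follows; the column names sqrtA and sqrtA_x100 and the seeded random generator are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

# Vectorised square root of column A, stored as a new column.
df["sqrtA"] = np.sqrt(df["A"])

# The new column is an ordinary Series, so it can feed the next step directly.
df["sqrtA_x100"] = df["sqrtA"] * 100
```

Each assignment works on the whole column at once, so no apply call is needed for either step.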

With respect to what you're trying to do, the following should work:

data['dayofyear'].apply(pyeto.sol_dec)
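To illustrate why this form works where the original call did not, here is a small sketch. The function scalar_only is a hypothetical stand-in for a package function that validates a single value with a chained comparison, as the traceback in the question suggests; its return value is made up for the example and is not pyeto's calculation.

```python
import math
import pandas as pd

def scalar_only(doy):
    """Hypothetical stand-in for a function that validates one scalar."""
    if not 1 <= doy <= 366:
        raise ValueError("day of year must be in 1..366")
    # Illustrative return value only.
    return math.sin(2 * math.pi * doy / 365)

data = pd.DataFrame({"dayofyear": [1, 100, 200, 366]})

# Passing the whole Series puts a Series into a boolean context inside the
# chained comparison, which pandas refuses to evaluate.
try:
    scalar_only(data["dayofyear"])
    failed = False
    message = ""
except ValueError as exc:
    failed = True
    message = str(exc)

# Passing the function itself lets Series.apply call it once per element.
data["result"] = data["dayofyear"].apply(scalar_only)
```

The key difference is that apply receives the function object (no parentheses, no arguments) and supplies each value in turn.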

Edit

To pass multiple columns as args to a method:

data.apply(lambda x: pyeto.sunset_hour_angle(x['lat'],x['sol_dec']), axis=1)
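A self-contained sketch of the same pattern, assuming a hypothetical two-argument scalar function two_arg in place of pyeto.sunset_hour_angle (its body is illustrative only) and made-up input values. It also shows a list-comprehension alternative and the single-list form for selecting several columns, since data[['lat'],['sol_dec']] passes a tuple of lists as the key, which is what raises the unhashable type error.

```python
import math
import pandas as pd

def two_arg(lat, sol_dec):
    """Hypothetical scalar function of two arguments (illustrative body)."""
    cos_sha = -math.tan(lat) * math.tan(sol_dec)
    return math.acos(min(max(cos_sha, -1.0), 1.0))

# Made-up values in radians, purely for illustration.
data = pd.DataFrame({
    "lat": [0.1, 0.5, 0.9],
    "sol_dec": [0.2, -0.1, 0.3],
})

# Row-wise apply: with axis=1 each row arrives as a Series indexed by column name.
data["sha"] = data.apply(lambda row: two_arg(row["lat"], row["sol_dec"]), axis=1)

# Same result by zipping the two columns and calling the function per pair.
data["sha_zip"] = [two_arg(a, b) for a, b in zip(data["lat"], data["sol_dec"])]

# Selecting several columns takes one list of names inside the brackets.
subset = data[["lat", "sol_dec"]]
```

The zip version avoids building a Series for every row, which may matter on large frames; the apply version keeps the column names visible at the call site.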

edited Mar 16, 2016 at 10:27

answered Mar 16, 2016 at 9:40

EdChum


6 Comments

Ilja Everilä Over a year ago

Is the apply(math.sqrt) meant to be applymap, or am I missing something here?

EdChum Over a year ago

applymap operates on every element in a df; apply will apply on each column or row for a df, or each row for a Series

Ilja Everilä Over a year ago

Thanks, been trying to read the docs of both methods and not quite got it yet, because was constantly mixing up Series and DataFrames.

Pad Over a year ago

@EdChum Thanks for this! data['dayofyear'].apply(pyeto.sol_dec) worked. I've hit another issue with the next step, where a custom function from the package takes two arguments - I've edited the post to clarify, the solution you gave doesn't seem to work when the function takes multiple arguments

EdChum Over a year ago

Try data.apply(lambda x: pyeto.sunset_hour_angle(x['lat'],x['sol_dec']), axis=1)
