I want to pass each cell of a column in a dataframe to a function which then creates a new cell

I've looked here and here but these don't address my issue.

I'm using an obscure package, so I'll simplify the method using the base packages to ask the question; hopefully the issue will be clear.

Method:

Load the data

import pandas as pd
import numpy as np
import math

df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))

Pass the values of one column to a variable

lat = df['A']

Create a new column by applying the function to the variable

df['sol'] = df.apply(math.sqrt(lat))

This gives the error

TypeError: cannot convert the series to <type 'float'>

The error I'm getting using the pyeto package is actually

Traceback (most recent call last):

File "<ipython-input-10-b160408e9808>", line 1, in <module>
data['sol_dec'] = data['dayofyear'].apply(pyeto.sol_dec(data['dayofyear']), axis =1)            # Solar declination

File "build\bdist.win-amd64\egg\pyeto\fao.py", line 580, in sol_dec
_check_doy(day_of_year)

File "build\bdist.win-amd64\egg\pyeto\_check.py", line 36, in check_doy
if not 1 <= doy <= 366:

File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
.format(self.__class__.__name__))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I think the issue is the same in both cases, the function will not apply to every cell in the dataframe column, and produces an error.

I want to be able to apply a function to each cell of a dataframe column (i.e. get the square root of each cell in column 'A'), then store the result of this function as a variable (or another column in the dataframe, i.e. have a 'sqrtA' column), then apply a function to that variable (or column), and so on (i.e. have a new column which is 'sqrtA*100').

I can't figure out how to do this, and would really appreciate guidance.

EDIT

@EdChum 's answer df['A'].apply(math.sqrt) or data['dayofyear'].apply(pyeto.sol_dec) (for the package function) helped a lot.

I'm now having issues with another function in the package which takes multiple arguments:

sha = pyeto.sunset_hour_angle(lat, sol_dec)

This function doesn't apply to a data-frame column. I have lat and sol_dec stored as Series variables, but when I try to create a new column in the dataframe using them like so

data['sha'] = pyeto.sunset_hour_angle(lat, sol_dec)

I get the same error as before.

Attempting to apply the function to multiple columns:

data['sha'] = data[['lat'],['sol_dec']].apply(pyeto.sunset_hour_angle)

gives the error:

Traceback (most recent call last):

File "<ipython-input-28-7b603745af93>", line 1, in <module>
data['sha'] = data[['lat'],['sol_dec']].apply(pyeto.sunset_hour_angle)

File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.py", line 1969, in __getitem__
return self._getitem_column(key)

File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.py", line 1976, in _getitem_column
return self._get_item_cache(key)

File "C:\Users\pflattery\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.py", line 1089, in _get_item_cache
res = cache.get(item)

TypeError: unhashable type: 'list'

1 Answer

Use np.sqrt, as this understands arrays:

In [86]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df['sol'] = np.sqrt(df['A'])
df

Out[86]:
     A   B   C   D       sol
0   52  38   4  71  7.211103
1   59   4  36  15  7.681146
2   37  28  33  73  6.082763
3   58  26   4  96  7.615773
4   31  48  47  78  5.567764
5   43  58  45   4  6.557439
6   69  35  27  39  8.306624
..  ..  ..  ..  ..       ...
98  42   6  40  36  6.480741
99  22  44  11  24  4.690416

[100 rows x 5 columns]

To apply a function you can do:

In [87]:
import math
df['A'].apply(math.sqrt)

Out[87]:
0     7.211103
1     7.681146
2     6.082763
3     7.615773
4     5.567764
5     6.557439
6     8.306624
7     7.483315
8     7.071068
9     9.486833
        ...   
95    3.464102
96    6.855655
97    5.385165
98    6.480741
99    4.690416
Name: A, dtype: float64

What you tried was to pass a Series to math.sqrt, but math.sqrt only accepts scalar values, hence the error. Also, you should avoid using apply when a vectorised method exists, as the vectorised method will be faster. Timings for a 10K row df:

In [90]:
%timeit df['A'].apply(math.sqrt)
%timeit np.sqrt(df['A'])

100 loops, best of 3: 2.15 ms per loop
10000 loops, best of 3: 99.7 µs per loop

Here you can see that the numpy version is ~22x faster.
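For reference, here is a minimal sketch that reproduces both errors from the question on a small stand-in Series, without pyeto. The Series `s` and the variable names are assumptions made up for illustration; the chained comparison only mirrors the `1 <= doy <= 366` check shown in the traceback.

```python
import math

import pandas as pd

s = pd.Series([1.0, 4.0, 9.0])  # stand-in data, not from the question

# Passing the whole Series to a scalar-only function fails with TypeError.
try:
    math.sqrt(s)
    scalar_call_error = None
except TypeError as exc:
    scalar_call_error = exc

# A chained comparison needs a single True/False, which a Series cannot
# provide, so pandas raises ValueError ("truth value ... is ambiguous").
try:
    in_range = 1 <= s <= 366
    range_check_error = None
except ValueError as exc:
    range_check_error = exc

# Passing the function itself lets apply call it once per element.
roots = s.apply(math.sqrt)
```

In both failing cases the function is called once with the entire Series; passing the function object to apply makes pandas call it once per value instead.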

With respect to what you're trying to do, the following should work:

data['dayofyear'].apply(pyeto.sol_dec)
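The chaining described in the question (a 'sqrtA' column, then a column derived from it) could look roughly like the sketch below. The small DataFrame and the column name `sqrtA_x100` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 4, 9, 16]})  # small stand-in data

# Step 1: element-wise square root, stored as a new column.
df["sqrtA"] = np.sqrt(df["A"])

# Step 2: build the next column from the one just created.
df["sqrtA_x100"] = df["sqrtA"] * 100
```

Each new column is an ordinary Series, so it can feed the next step in the same way, whether that step is vectorised arithmetic or an apply call with a scalar function.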

Edit

To pass multiple columns as args to a method:

data.apply(lambda x: pyeto.sunset_hour_angle(x['lat'],x['sol_dec']), axis=1)
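A self-contained sketch of the same row-wise pattern, assuming a hypothetical scalar-only function `two_arg_scalar_fn` as a stand-in for the pyeto call (it is not part of pyeto, and the data values are made up). It also shows that several columns are selected with a single list of names, which is what the `unhashable type: 'list'` error in the question comes down to.

```python
import math

import pandas as pd


def two_arg_scalar_fn(a, b):
    """Hypothetical scalar-only function of two arguments (stand-in for pyeto)."""
    return math.sqrt(a * a + b * b)


data = pd.DataFrame({"lat": [3.0, 5.0, 8.0], "sol_dec": [4.0, 12.0, 15.0]})

# Select several columns with ONE list of names: data[['lat', 'sol_dec']].
# data[['lat'], ['sol_dec']] passes a tuple of lists instead and fails.
subset = data[["lat", "sol_dec"]]

# Row-wise apply: each row arrives as a Series, so index it by column name.
data["via_apply"] = subset.apply(
    lambda row: two_arg_scalar_fn(row["lat"], row["sol_dec"]), axis=1
)

# Alternative: iterate over the two columns in lockstep.
data["via_zip"] = [
    two_arg_scalar_fn(a, b) for a, b in zip(data["lat"], data["sol_dec"])
]
```

Both columns hold the same values. The zip version avoids building a Series for every row, which may matter on larger frames.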

6 Comments

Is the apply(math.sqrt) meant to be applymap, or am I missing something here?
applymap operates on every element in a df; apply operates on each column or row of a df, or on each value of a Series.
Thanks, I've been trying to read the docs of both methods and haven't quite got it yet, because I was constantly mixing up Series and DataFrames.
@EdChum Thanks for this! data['dayofyear'].apply(pyeto.sol_dec) worked. I've hit another issue with the next step, where a custom function from the package takes two arguments. I've edited the post to clarify; the solution you gave doesn't seem to work when the function takes multiple arguments.
Try data.apply(lambda x: pyeto.sunset_hour_angle(x['lat'],x['sol_dec']), axis=1)