I want to pass each cell of a dataframe column to a function, which then creates a new cell.
I've looked here and here but these don't address my issue.
I'm using an obscure package, so I'll simplify the method using the base packages to ask the question; hopefully the issue will be clear.
Method:
Load the data
import pandas as pd
import numpy as np
import math
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
Pass the values of one column to a variable
lat = df['A']
Create a new column by applying the function to the variable
df['sol'] = df.apply(math.sqrt(lat))
This gives the error
TypeError: cannot convert the series to <type 'float'>
The error I'm getting using the pyeto package is actually
Traceback (most recent call last):
File "<ipython-input-10-b160408e9808>", line 1, in <module>
data['sol_dec'] = data['dayofyear'].apply(pyeto.sol_dec(data['dayofyear']), axis =1) # Solar declination
File "build\bdist.win-amd64\egg\pyeto\fao.py", line 580, in sol_dec
_check_doy(day_of_year)
File "build\bdist.win-amd64\egg\pyeto\_check.py", line 36, in check_doy
if not 1 <= doy <= 366:
File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think the issue is the same in both cases: the function is not applied to each cell of the dataframe column, and an error is raised instead.
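To illustrate why both errors look related, here is a minimal sketch that reproduces them on a small Series. The function check_doy below is a hypothetical stand-in for a scalar-only range check like the one in the traceback; it is not the pyeto code. In both failing calls the function receives the whole Series at once, whereas passing the function object itself to apply lets pandas call it once per cell.

```python
import math
import pandas as pd

s = pd.Series([1, 4, 9])

def check_doy(doy):
    # Hypothetical stand-in for a scalar-only range check (not pyeto's code)
    if not 1 <= doy <= 366:
        raise ValueError("doy out of range")
    return doy

# Calling the function directly hands it the whole Series, which fails
sqrt_error = None
try:
    math.sqrt(s)
except TypeError as exc:
    sqrt_error = exc  # math.sqrt cannot convert a multi-element Series to float

doy_error = None
try:
    check_doy(s)
except ValueError as exc:
    doy_error = exc  # the chained comparison needs a single True/False

# Passing the function itself (no parentheses) lets apply call it per cell
roots = s.apply(math.sqrt)
checked = s.apply(check_doy)
```

Note the difference between s.apply(math.sqrt(s)), which evaluates math.sqrt(s) first and fails, and s.apply(math.sqrt), which hands the function to pandas.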
I want to be able to apply a function to each cell of a dataframe column (e.g. get the square root of each cell in column 'A'), store the result as a variable or as another column in the dataframe (e.g. a 'sqrtA' column), then apply a function to that variable or column, and so on (e.g. a new column which is 'sqrtA*100').
I can't figure out how to do this, and would really appreciate guidance.
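For concreteness, the chain described above could look roughly like the following sketch. The column names sqrtA, sqrtA_x100 and sqrtA_np are only illustrative, and the small dataframe replaces the random one so the values are easy to check.

```python
import math
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 4, 9, 16]})

# Step 1: apply a scalar function to every cell of column 'A'
df["sqrtA"] = df["A"].apply(math.sqrt)

# Step 2: build the next column from the previous result
df["sqrtA_x100"] = df["sqrtA"] * 100

# Alternative for step 1: NumPy's sqrt accepts a whole Series directly
df["sqrtA_np"] = np.sqrt(df["A"])
```

The apply form works with any function that takes a single value, which is probably what a package function expecting scalars needs; the NumPy form only works when a vectorised equivalent exists.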
EDIT
@EdChum's answer, df['A'].apply(math.sqrt) or data['dayofyear'].apply(pyeto.sol_dec) (for the package function), helped a lot.
I'm now having issues with another function in the package which takes multiple arguments:
sha = pyeto.sunset_hour_angle(lat, sol_dec)
This function doesn't apply to a dataframe column directly. I have lat and sol_dec stored as Series variables, but when I try to create a new column in the dataframe using them like so
data['sha'] = pyeto.sunset_hour_angle(lat, sol_dec)
I get the same error as before.
Attempting to apply the function to multiple columns:
data['sha'] = data[['lat'],['sol_dec']].apply(pyeto.sunset_hour_angle)
gives the error:
Traceback (most recent call last):
File "<ipython-input-28-7b603745af93>", line 1, in <module>
data['sha'] = data[['lat'],['sol_dec']].apply(pyeto.sunset_hour_angle)
File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.py", line 1969, in __getitem__
return self._getitem_column(key)
File "C:\Users\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.py", line 1976, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\pflattery\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\generic.py", line 1089, in _get_item_cache
res = cache.get(item)
TypeError: unhashable type: 'list'
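Two things seem to be going on here. First, data[['lat'],['sol_dec']] passes a tuple of two lists as the key, which is where the unhashable 'list' comes from; selecting two columns is written data[['lat', 'sol_dec']]. Second, even with the right selection, apply on a dataframe passes one whole column (or row) at a time, so a two-argument scalar function has to be called explicitly per row. Below is a minimal sketch of that idea. The function scalar_sha is a hypothetical stand-in that accepts only scalars, and the sample values are made up; pyeto itself is not used.

```python
import math
import pandas as pd

def scalar_sha(latitude, sol_dec):
    # Hypothetical stand-in for a two-argument, scalar-only function
    # (both arguments in radians); not the pyeto implementation
    return math.acos(-math.tan(latitude) * math.tan(sol_dec))

# Made-up sample values in radians
data = pd.DataFrame({
    "lat": [0.0, 0.5, 0.9],
    "sol_dec": [0.0, 0.2, -0.3],
})

# Row-wise apply: axis=1 passes each row, and the lambda unpacks the columns
data["sha"] = data.apply(
    lambda row: scalar_sha(row["lat"], row["sol_dec"]), axis=1
)

# Alternative: iterate over both columns in parallel with zip
data["sha_zip"] = [
    scalar_sha(la, sd) for la, sd in zip(data["lat"], data["sol_dec"])
]
```

With the real package, scalar_sha would presumably be swapped for pyeto.sunset_hour_angle, assuming it takes the latitude and solar declination as two scalar arguments in that order, as the call in the question suggests.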