I am having some issues applying several functions to my dataframe.
I have created sample code to illustrate what I am trying to do. There may be a better way to implement this particular function than the way I am doing it, but I am looking for a general solution, since I use several functions like this one, rather than the most efficient way to do this one specific thing.
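For the general case, one common pandas pattern is to pass the function object itself (not the result of calling it) to DataFrame.apply with axis=1. pandas then calls the function once per row and passes that row as a Series, so the function can read as many columns as it needs. Extra parameters can be forwarded with args=. The minimal sketch below uses a hypothetical toy frame and two hypothetical helpers, dollar_volume and is_large, purely to show several functions being applied this way.

```python
import pandas as pd

# Hypothetical toy frame that reuses the question's column names
df = pd.DataFrame({'Ticker': ['AAPL', 'AAPL', 'TSLA'],
                   'High': [1.5, 1.2, 2.5],
                   'Volume': [150, 100, 250]})


def dollar_volume(row):
    # Hypothetical helper: reads two columns from a single row
    return row['High'] * row['Volume']


def is_large(row, threshold):
    # Hypothetical helper with an extra parameter besides the row
    return row['Volume'] >= threshold


# DataFrame.apply with axis=1 calls each function once per row (row passed as a Series);
# args= forwards additional positional arguments after the row.
df['DollarVolume'] = df.apply(dollar_volume, axis=1)
df['IsLarge'] = df.apply(is_large, axis=1, args=(150,))
print(df)
```

Each result is assigned back to a column explicitly, because apply returns a new Series rather than modifying the frame in place.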
Basically, I have one sample dataframe that looks like this (df1):
    Ticker      Date  High  Volume
0     AAPL  20200501   1.5     150
1     AAPL  20200501   1.2     100
2     AAPL  20200501   1.3     150
3     AAPL  20200502   1.4     130
4     AAPL  20200502   1.2     170
5     AAPL  20200502   1.1     160
6     TSLA  20200501   2.5     250
7     TSLA  20200501   2.2     200
8     TSLA  20200501   2.3     250
9     TSLA  20200502   2.4     230
10    TSLA  20200502   2.2     270
11    TSLA  20200502   2.1     260
And a second sample dataframe that looks like this (df2):
   Ticker      Date  Price  SumVol
0    AAPL  20200508    1.2       0
1    TSLA  20200508    2.2       0
The 'SumVol' column in df2 should be filled with the sum of the 'Volume' values in df1 for the same Ticker, up until the first time the 'Price' value from df2 appears in the 'High' column of df1, counting only the df1 rows whose Date matches the Date in df2.
Desired output:
   Ticker      Date  Price  SumVol
0    AAPL  20200508    1.2     300
1    TSLA  20200508    2.2     500
I am unable to get this output; I am probably doing something wrong in the line of code where I apply the function to the dataframe. I hope someone here can help me out.
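One possible reading of the rule above: for each df2 row, take the df1 rows with the same Ticker and Date in their original order, and sum Volume up to and including the first row whose High equals Price. The sketch below assumes that reading, assumes exact float equality is acceptable for the comparison, and assumes that if the price is never reached the whole day is summed. Because the sample df1 has no rows for 20200508, it uses a hypothetical df2 with Date 20200502; under those assumptions it yields 300 for AAPL and 500 for TSLA. The helper name sum_until_price is hypothetical.

```python
import pandas as pd

# df1 with the same values as in the question
df1 = pd.DataFrame({'Ticker': ['AAPL'] * 6 + ['TSLA'] * 6,
                    'Date': [20200501] * 3 + [20200502] * 3 + [20200501] * 3 + [20200502] * 3,
                    'High': [1.5, 1.2, 1.3, 1.4, 1.2, 1.1, 2.5, 2.2, 2.3, 2.4, 2.2, 2.1],
                    'Volume': [150, 100, 150, 130, 170, 160, 250, 200, 250, 230, 270, 260]})

# Hypothetical df2: Date set to 20200502 because df1 has no rows for 20200508
df2 = pd.DataFrame({'Ticker': ['AAPL', 'TSLA'],
                    'Date': [20200502, 20200502],
                    'Price': [1.2, 2.2],
                    'SumVol': [0, 0]})


def sum_until_price(ticker, date, price):
    """Hypothetical helper: sum Volume up to and including the first row where High == price."""
    # df1 rows for this ticker and date, kept in their original order
    rows = df1[(df1['Ticker'] == ticker) & (df1['Date'] == date)]
    # Positions of the rows whose High equals the price (assumes exact float equality is fine)
    hits = (rows['High'] == price).to_numpy().nonzero()[0]
    if len(hits) == 0:
        # Assumption: if the price is never reached, sum the whole day
        return rows['Volume'].sum()
    # Slice through the first matching row, inclusive
    return rows['Volume'].iloc[:hits[0] + 1].sum()


df2['SumVol'] = df2.apply(lambda r: sum_until_price(r['Ticker'], r['Date'], r['Price']), axis=1)
print(df2)
```

If "up until" is meant to exclude the matching row, changing hits[0] + 1 to hits[0] in the slice would give that variant instead.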
Full sample code including sample dataframes:
import pandas as pd
df1 = pd.DataFrame({'Ticker': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA'],
                    'Date': [20200501, 20200501, 20200501, 20200502, 20200502, 20200502, 20200501, 20200501, 20200501, 20200502, 20200502, 20200502],
                    'High': [1.5, 1.2, 1.3, 1.4, 1.2, 1.1, 2.5, 2.2, 2.3, 2.4, 2.2, 2.1],
                    'Volume': [150, 100, 150, 130, 170, 160, 250, 200, 250, 230, 270, 260]})
print(df1)

df2 = pd.DataFrame({'Ticker': ['AAPL', 'TSLA'],
                    'Date': [20200501, 20200502],
                    'Price': [1.4, 2.2],
                    'SumVol': [0, 0]})
print(df2)
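
def VolSum(ticker, date, price):
    # Select the df1 rows for this ticker and date whose High is below the given price
    mask = (df1['Ticker'] == ticker) & (df1['Date'] == date) & (df1['High'] < price)
    # .sum() with parentheses returns the total; .sum alone would return the method object
    return df1.loc[mask, 'Volume'].sum()

# DataFrame.apply(axis=1) calls the lambda once per df2 row; apply has no inplace
# argument, so the result is assigned back to the column explicitly.
df2['SumVol'] = df2.apply(lambda row: VolSum(row['Ticker'], row['Date'], row['Price']), axis=1)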
print(df2)
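If a row-by-row apply becomes slow on larger frames, the "High below Price" rule used in VolSum can also be expressed without a Python-level loop: merge df2 with df1 on Ticker and Date, keep the rows where High is below Price, group and sum Volume, then look the totals back up for each df2 row. This is a sketch that assumes the rule really is High < Price, as in VolSum; with the sample df2 from the code (AAPL on 20200501 at 1.4, TSLA on 20200502 at 2.2) it gives 250 and 260.

```python
import pandas as pd

df1 = pd.DataFrame({'Ticker': ['AAPL'] * 6 + ['TSLA'] * 6,
                    'Date': [20200501] * 3 + [20200502] * 3 + [20200501] * 3 + [20200502] * 3,
                    'High': [1.5, 1.2, 1.3, 1.4, 1.2, 1.1, 2.5, 2.2, 2.3, 2.4, 2.2, 2.1],
                    'Volume': [150, 100, 150, 130, 170, 160, 250, 200, 250, 230, 270, 260]})
df2 = pd.DataFrame({'Ticker': ['AAPL', 'TSLA'],
                    'Date': [20200501, 20200502],
                    'Price': [1.4, 2.2],
                    'SumVol': [0, 0]})

key = ['Ticker', 'Date', 'Price']
# Pair every df2 row with all df1 rows that share its Ticker and Date
merged = df2[key].merge(df1, on=['Ticker', 'Date'], how='left')
# Keep only the df1 rows whose High is below the df2 Price (same rule as VolSum);
# rows with no df1 match have NaN High and are dropped here
below = merged[merged['High'] < merged['Price']]
# Total Volume per (Ticker, Date, Price)
sums = below.groupby(key)['Volume'].sum()

# Look each df2 row's total up by its key; keys with no qualifying rows get 0
lookup = pd.MultiIndex.from_frame(df2[key])
df2['SumVol'] = sums.reindex(lookup, fill_value=0).to_numpy()
print(df2)
```

The reindex step keeps df2's row order and fills in 0 where no df1 row qualifies, which mirrors what VolSum returns for an empty selection.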