I am having some issues applying several functions to my dataframe.

I have created some sample code to illustrate what I am trying to do. There might be a better way to do this specific task, but I am using several functions like this one, so I am looking for a general solution to my problem rather than just the most efficient way to do this one thing.

Basically, I have one sample dataframe that looks like this (df1):

   Ticker      Date  High  Volume
0    AAPL  20200501   1.5     150
1    AAPL  20200501   1.2     100
2    AAPL  20200501   1.3     150
3    AAPL  20200502   1.4     130
4    AAPL  20200502   1.2     170
5    AAPL  20200502   1.1     160
6    TSLA  20200501   2.5     250
7    TSLA  20200501   2.2     200
8    TSLA  20200501   2.3     250
9    TSLA  20200502   2.4     230
10   TSLA  20200502   2.2     270
11   TSLA  20200502   2.1     260

and one sample dataframe that looks like this (df2):

  Ticker      Date  Price  SumVol
0   AAPL  20200508    1.2       0
1   TSLA  20200508    2.2       0

The values in the 'SumVol' column of df2 should be filled with the sum of the values in the 'Volume' column of df1, up until the first time the 'Price' value from df2 is seen in the 'High' column of df1, for rows where the date in df1 matches the date in df2.

desired output:

    Ticker      Date  Price  SumVol
0   AAPL  20200508    1.2    300
1   TSLA  20200508    2.2    500

For some reason I am unable to get this output, probably because I am doing something wrong in the line of code where I apply the function to the dataframe. I hope that someone here can help me out.

Full sample code including sample dataframes:

import pandas as pd

df1 = pd.DataFrame({'Ticker': ['AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'AAPL', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA', 'TSLA'],
                'Date': [20200501, 20200501, 20200501, 20200502, 20200502, 20200502, 20200501, 20200501, 20200501, 20200502, 20200502, 20200502],
               'High': [1.5, 1.2, 1.3, 1.4, 1.2, 1.1, 2.5, 2.2, 2.3, 2.4, 2.2, 2.1],
                'Volume': [150, 100, 150, 130, 170, 160, 250, 200, 250, 230, 270, 260]})
print(df1)

df2 = pd.DataFrame({'Ticker': ['AAPL', 'TSLA'],
               'Date': [20200501, 20200502],
                'Price': [1.4, 2.2],
                'SumVol': [0,0]})

print(df2)

def VolSum(ticker, date, price):
    df11 = pd.DataFrame(df1)
    df11 = df11[df11['Ticker'] == ticker]
    df11 = df11[df11['Date'] == date]
    df11 = df11[df11['High'] < price]

    df11 = pd.DataFrame(df11)
    return df11.Volume.sum

df2['SumVol'].apply(VolSum(df2['Ticker'], df2['Date'], df2['Price']), inplace=True).reset_index(drop=True, inplace=True)
print(df2)

1 Answer

The first reason your code fails is that your function ends with return df11.Volume.sum (without parentheses), so it returns the sum method itself, not the result of calling it.
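As a minimal sketch of that difference, here is a small made-up Series (the name volume and its values are only for illustration and are not taken from the question):

```python
import pandas as pd

# Hypothetical data, used only to illustrate the point
volume = pd.Series([100, 150])

method = volume.sum    # no parentheses: this is the method object itself
total = volume.sum()   # with parentheses: the method is actually called

print(callable(method))  # True, it is still just a function
print(total)             # 250
```

Assigning the first form to a column would store the method object rather than a number.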

The second reason is the way you call apply. You can apply a function to each row of a DataFrame, but you must call apply on the DataFrame itself (not on a single column) and pass axis=1. In that case:

  • the function to be applied should take one parameter, the current row,
  • its result can be assigned to the desired column.

The third reason is that df2 contains values, e.g. dates, that are not present in df1, so you are unlikely to find any matching rows.
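One possible way to check this before summing anything is a left merge with indicator=True. The sketch below is not part of the original code: it uses a reduced df1 holding only the key columns, the df2 dates as printed in the question (20200508), and the names keys, check and unmatched are made up for the example.

```python
import pandas as pd

# Reduced versions of the question's frames: only the key columns matter here
df1 = pd.DataFrame({'Ticker': ['AAPL', 'AAPL', 'TSLA', 'TSLA'],
                    'Date': [20200501, 20200502, 20200501, 20200502]})
df2 = pd.DataFrame({'Ticker': ['AAPL', 'TSLA'],
                    'Date': [20200508, 20200508]})

# Unique (Ticker, Date) pairs that actually occur in df1
keys = df1[['Ticker', 'Date']].drop_duplicates()

# indicator=True adds a '_merge' column saying where each row was found
check = df2.merge(keys, on=['Ticker', 'Date'], how='left', indicator=True)

# 'left_only' marks df2 rows with no counterpart in df1
unmatched = check[check['_merge'] == 'left_only']
print(unmatched[['Ticker', 'Date']])
```

With these dates both df2 rows come back as unmatched, which is why any filter on Date would leave nothing to sum.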

How to get the expected result - Method 1

First, df2 must contain values that are likely to be matched with df1. I defined df2 as:

  Ticker      Date  Price  SumVol
0   AAPL  20200501    1.4       0
1   TSLA  20200502    2.3       0

Then I changed your function to:

def VolSum(row):
    df11 = pd.DataFrame(df1)
    df11 = df11[df11['Ticker'] == row.Ticker]
    df11 = df11[df11['Date'] == row.Date]
    df11 = df11[df11['High'] < row.Price]
    return df11.Volume.sum()

And finally I generated the result as:

df2['SumVol'] = df2.apply(VolSum, axis=1)

The result is:

  Ticker      Date  Price  SumVol
0   AAPL  20200501    1.4     250
1   TSLA  20200502    2.3     530

How to get the expected result - Method 2

But a more concise and elegant method is to define the summing function as:

def VolSum2(row):
    return df1.query('Ticker == @row.Ticker and '
        'Date == @row.Date and High < @row.Price').Volume.sum()

And apply it just the same way:

df2['SumVol'] = df2.apply(VolSum2, axis=1)

The result is of course the same.
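To confirm this, the following self-contained sketch rebuilds df1 from the question and df2 as redefined in Method 1, applies both functions, and compares the outputs. The names res1 and res2 are made up for the comparison.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Ticker': ['AAPL'] * 6 + ['TSLA'] * 6,
    'Date': [20200501, 20200501, 20200501, 20200502, 20200502, 20200502] * 2,
    'High': [1.5, 1.2, 1.3, 1.4, 1.2, 1.1, 2.5, 2.2, 2.3, 2.4, 2.2, 2.1],
    'Volume': [150, 100, 150, 130, 170, 160, 250, 200, 250, 230, 270, 260]})

# df2 as redefined in Method 1
df2 = pd.DataFrame({'Ticker': ['AAPL', 'TSLA'],
                    'Date': [20200501, 20200502],
                    'Price': [1.4, 2.3]})

def VolSum(row):
    # Method 1: filter step by step
    df11 = pd.DataFrame(df1)
    df11 = df11[df11['Ticker'] == row.Ticker]
    df11 = df11[df11['Date'] == row.Date]
    df11 = df11[df11['High'] < row.Price]
    return df11.Volume.sum()

def VolSum2(row):
    # Method 2: the same three conditions in a single query
    return df1.query('Ticker == @row.Ticker and '
        'Date == @row.Date and High < @row.Price').Volume.sum()

res1 = df2.apply(VolSum, axis=1)
res2 = df2.apply(VolSum2, axis=1)

print(res1.tolist())  # [250, 530]
print(res2.tolist())  # [250, 530]
```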

1 Comment

Thank you for your answer, very well explained and it works like a charm. Sorry for the mistakes in the sample df though, I'll update my original post for future review.