I would like to create a method inside a class that takes a variable and a function as input arguments and returns a new value. In the example below, the arbitrary function can be max, min, mean, and so on:

import pandas as pd
df = pd.DataFrame( {'col1': [1, 2], 'col2': [4, 6]})
df.max(axis=1), df.min(axis=1), df.mean(axis=1)  # sample of methods that I would like to pass

I would like to do something similar through a method inside a class. My attempt so far, which does not work:

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func):
        return self.df.func()

ob1 = our_class(df)
ob1.arb_func(max(axis=1))

Any suggestions are appreciated.

PS: This is a toy problem. My goal is to take a data frame and run an arbitrary number of statistical analyses on it later. I do not want to hardcode the analyses, so that they can change later if needed.

  • In arb_func the func parameter is unused. You could call it whatever, it will have nothing to do with self.df.func(). What are you really trying to achieve here? What is the purpose of this class? Commented Jan 27, 2022 at 20:31
  • What did you plan func should be? A reference to an actual function? The name of the function as a string? Commented Jan 27, 2022 at 20:37
  • Why do you not just use ob1.df.max(axis=1)? Commented Jan 27, 2022 at 20:37
  • @mkrieger1 I would guess the OP is trying to get more proficient with generic code writing. Commented Jan 27, 2022 at 20:39
  • I added a note to my original post. Commented Jan 27, 2022 at 20:45

2 Answers

You could try this:

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func):
        return func(self.df)

You could then use it like this:

ob1 = our_class(df)
ob1.arb_func(lambda x: x.max(axis=1))
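
As a minimal sketch that goes beyond the code above, the lambda can also be replaced by operator.methodcaller from the standard library, which builds a callable that invokes a named method with fixed arguments. The class is repeated only so that the snippet runs on its own, and the variable names row_max_lambda and row_max_caller are illustrative:

```python
from operator import methodcaller

import pandas as pd


class our_class():
    def __init__(self, df):
        self.df = df

    def arb_func(self, func):
        # func is any callable that accepts the data frame
        return func(self.df)


df = pd.DataFrame({'col1': [1, 2], 'col2': [4, 6]})
ob1 = our_class(df)

row_max_lambda = ob1.arb_func(lambda x: x.max(axis=1))
# methodcaller("max", axis=1)(obj) calls obj.max(axis=1)
row_max_caller = ob1.arb_func(methodcaller("max", axis=1))
```

Both calls should give the same row-wise maximum; which spelling to prefer is mostly a matter of taste.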

3 Comments

Looks good and is actually a good example of Python dispatching in action. The question remains to all of us: why would the OP want to create an (otherwise) unnecessary layer that just brings additional complexity.
@deponovo Thanks for your good comments so far. I would like (later) to be able to receive all the functions that I need to apply on my data in the form of a list and be able to apply them on my data. I was not sure where to start so my toy question was my first attempt to do it. Probably there are way better approaches that I am not familiar with.
@jjramsey Thank you. This was a very nice solution.

New suggestion

As long as the function you pass takes a dataframe as its first argument, the problem becomes as simple as this (as already noted by @jjramsey):

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func):
        return func(self.df)

Virtually any method of pd.DataFrame, i.e. a method taking self as its first parameter, such as pd.DataFrame.max, is directly compatible with this use. In this version, however, you would have to pass a partial function every time you need extra configuration in the form of positional or keyword arguments. In your case that is axis=1. A small modification to the above implementation accounts for such situations:

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func, *args, **kwargs):
        return func(self.df, *args, **kwargs)
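
To make the difference concrete, here is a small sketch, assuming the data frame from the question, that calls the same unbound method once through functools.partial and once through the forwarded keyword arguments. The variable names via_partial, via_kwargs and row_mean are only illustrative:

```python
from functools import partial

import pandas as pd


class our_class():
    def __init__(self, df):
        self.df = df

    def arb_func(self, func, *args, **kwargs):
        # extra arguments are forwarded to func after the data frame
        return func(self.df, *args, **kwargs)


df = pd.DataFrame({'col1': [1, 2], 'col2': [4, 6]})
ob1 = our_class(df)

# Without forwarding, the configuration has to be baked in beforehand
via_partial = ob1.arb_func(partial(pd.DataFrame.max, axis=1))
# With forwarding, the unbound method and its arguments are passed directly
via_kwargs = ob1.arb_func(pd.DataFrame.max, axis=1)
row_mean = ob1.arb_func(pd.DataFrame.mean, axis=1)
```

The two max calls should return identical results; the second form simply avoids the extra import.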

This implementation is now generic enough that you can pass your own functions as well, as long as the first parameter is the dataframe. For instance, suppose you want to count how many apples you have with your own count_apples function:

def count_apples(df, apples_column_name):
    return df[apples_column_name].eq('apple').sum()

It can then be used like this:

df = pd.DataFrame({"fruits_in_store": ["apple", "apple", "pear", "banana", "papaya"]})
ob1 = our_class(df)  # re-create the object so that it holds the new data frame
ob1.arb_func(count_apples, "fruits_in_store")  # the column name is passed positionally as `apples_column_name`
ob1.arb_func(count_apples, apples_column_name="fruits_in_store")  # or you can be explicit
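
For the goal stated in the question (running a collection of analyses that is chosen later), one possible sketch loops over several callables. The helper name apply_all and the choice of a dictionary that maps labels to functions are assumptions made for illustration, not something taken from the posts above:

```python
import pandas as pd


class our_class():
    def __init__(self, df):
        self.df = df

    def arb_func(self, func, *args, **kwargs):
        return func(self.df, *args, **kwargs)

    def apply_all(self, funcs, *args, **kwargs):
        # Hypothetical helper: funcs maps a label to a callable; every
        # callable receives the same extra arguments.
        return {label: self.arb_func(f, *args, **kwargs)
                for label, f in funcs.items()}


df = pd.DataFrame({'col1': [1, 2], 'col2': [4, 6]})
ob1 = our_class(df)

analyses = {
    "max": pd.DataFrame.max,
    "min": pd.DataFrame.min,
    "mean": pd.DataFrame.mean,
}
results = ob1.apply_all(analyses, axis=1)
```

Because the analyses live in an ordinary dictionary, they can be added or removed later without touching the class.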

Original answer

I assume the OP is trying to build a generic coding interface for educational purposes.

Here is a suggestion (which, in my opinion, makes the usage far more complex than necessary, as other users have already noted in their comments):

from functools import partial
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func: str, **kwargs):
        # use the stored data frame rather than the global `df`
        return partial(getattr(pd.DataFrame, func), **kwargs)(self.df)

ob1 = our_class(df)
print(ob1.arb_func("max", axis=1))
0    1
1    2
2    3
dtype: int64


print(ob1.arb_func("max", axis=0))
a    3
dtype: int64
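
A slightly shorter variant of the string-based approach, sketched here as an alternative rather than as the code from this answer, looks the method up on the stored data frame itself, so no partial is needed. The variable names col_max and row_max are illustrative:

```python
import pandas as pd


class our_class():
    def __init__(self, df):
        self.df = df

    def arb_func(self, func: str, *args, **kwargs):
        # getattr returns the bound method, e.g. self.df.max;
        # an unknown name raises AttributeError
        method = getattr(self.df, func)
        return method(*args, **kwargs)


df = pd.DataFrame({"a": [1, 2, 3]})
ob1 = our_class(df)

col_max = ob1.arb_func("max", axis=0)
row_max = ob1.arb_func("max", axis=1)
```

Note that this accepts any attribute name, so in real code it may be worth restricting func to a known set of method names.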

2 Comments

@deponovo, thanks for the new method. I do not seem to be able to get it working with import pandas as pd, df = pd.DataFrame( {'col1': [1, 2], 'col2': [4, 6]}), ob1 = our_class(df), and ob1.arb_func("max", axis=1).
@user101464 I guess you are now trying the New suggestion. For that version to work, you have to pass a reference to a function rather than a function name, for instance: ob1.arb_func(pd.DataFrame.max, axis=1).
