18

I want to create a module for my own projects and call its functions as methods on a DataFrame. For example, I'd like to write:

import numpy as np
import pandas as pd
from mymodule import *
df = pd.DataFrame(np.random.randn(4,4))
df.mymethod()

The problem is that I can't call df.mymethod(), since as far as I can tell I can only define methods on classes I've created myself. A workaround is to make mymethod a plain function, myfunc, that takes a pandas.DataFrame as its argument:

myfunc(df)

I don't really want to do this. Is there any way to implement the first version?

4
  • Why don't you want to make it a function? Otherwise you'll have to subclass or patch the data frame. Commented Apr 19, 2017 at 19:04
  • Depending on what the function does, you may be able to use apply, for example df.apply(myfunc). I realize this doesn't create a new method, but perhaps it gets you what you need; at the very least you can do method chaining this way: df.apply(myfunc).apply(myotherfunc)... Commented Apr 19, 2017 at 19:10
  • What about just using the apply method? How complex is your method? Commented Apr 19, 2017 at 19:10
  • As noted in an answer below, the pandas documentation provides a "way to extend pandas objects without subclassing them" using the decorator pandas.api.extensions.register_dataframe_accessor(). There is a long list of extensions on the pandas ecosystem page. Commented Nov 16, 2021 at 14:07
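
The apply suggestion from the comments can be sketched as follows. This is only an illustration: center and scale are hypothetical column-wise functions, not part of pandas or of the question's module, and apply calls each of them once per column.

```python
import pandas as pd


def center(col):
    """Subtract the column mean (hypothetical example function)."""
    return col - col.mean()


def scale(col):
    """Divide by the largest absolute value (hypothetical example function)."""
    return col / col.abs().max()


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})

# Each apply returns a new DataFrame, so the calls can be chained.
result = df.apply(center).apply(scale)
print(result)
```

This keeps the functions ordinary and only chains them; it does not add a new method to the DataFrame.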

4 Answers

41

A nice solution can be found in the ffn package. What the authors do:

from pandas.core.base import PandasObject
def your_fun(df):
    ...
PandasObject.your_fun = your_fun

After that, your function "your_fun" becomes a method of the pandas.DataFrame object, and you can do something like

df.your_fun()

This method will work with both DataFrame and Series objects, because both inherit from PandasObject.
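
A minimal, runnable sketch of the same idea, assuming a hypothetical helper named n_missing (it is not part of pandas or ffn). It shows that the patched function becomes callable on both DataFrame and Series:

```python
import pandas as pd
from pandas.core.base import PandasObject


def n_missing(obj):
    """Count missing values in a DataFrame or a Series (hypothetical helper)."""
    return int(obj.isna().to_numpy().sum())


# Attach the function to the shared base class, as in the snippet above.
PandasObject.n_missing = n_missing

df = pd.DataFrame({"a": [1.0, None], "b": [None, None]})

print(df.n_missing())       # called on a DataFrame
print(df["a"].n_missing())  # called on a Series (a single column)
```

Because the attribute lives on the base class, the function has to cope with whichever pandas object it receives as its first argument.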


4 Comments

Does this technique or way of coding have a name? I am trying to understand how and why it works, and I'm not sure I grasp it.
@monkeyintern It is called "monkey-patching" in the outdated docs pandas.pydata.org/pandas-docs/version/0.15/… ; however, I found a general, not pandas-specific, way to add methods here: medium.com/@mgarod/…
After experimenting, this seems to add the method to all pandas objects, including Series (columns), which may not be what you want: "self" (here "df") is then not a DataFrame but a Series, so you would have to stop users from calling the method where it doesn't belong. The pandas API now lets you extend objects in other ways: pandas.pydata.org/docs/development/extending.html. Take a look at pandichef's answer.
Note that this can also be done with an anonymous function (e.g. pd.Series.vc = lambda x: x.value_counts(dropna=False))
13

This topic is well documented as of Nov 2019: Extending pandas

Note that the most obvious technique, Ivan Mishalkin's monkey patching, was actually removed from the official documentation at some point... probably for good reason.

Monkey patching works fine for small projects, but it has a serious drawback in a large-scale project: IDEs like PyCharm can't introspect the patched-in methods. So if you right-click and choose "Go to declaration", PyCharm simply says "cannot find declaration to go to". It gets old fast if you're an IDE junkie.

I confirmed that PyCharm CAN introspect both the "custom accessors" and "subclassing" methods discussed in the official documentation.
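
For reference, a minimal sketch of the "custom accessor" approach. The decorator pandas.api.extensions.register_dataframe_accessor is the documented pandas API; the accessor name "toolkit", the class ToolkitAccessor, and the method col_range are hypothetical names chosen for illustration.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("toolkit")
class ToolkitAccessor:
    """Hypothetical accessor, reachable as df.toolkit on every DataFrame."""

    def __init__(self, pandas_obj):
        # pandas passes the DataFrame the accessor is being used on.
        self._obj = pandas_obj

    def col_range(self):
        """Return max minus min for every column."""
        return self._obj.max() - self._obj.min()


df = pd.DataFrame({"a": [1, 4], "b": [10, 30]})
print(df.toolkit.col_range())
```

The custom methods live in their own namespace (df.toolkit.col_range() rather than df.col_range()), which avoids name clashes with pandas' own attributes and leaves Series untouched.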

1 Comment

This is now the best answer!
12

If you really need to add a method to a pandas.DataFrame you can inherit from it. Something like:

mymodule:

import pandas as pd

class MyDataFrame(pd.DataFrame):
    def mymethod(self):
        """Do my stuff"""

Use mymodule:

import numpy as np
from mymodule import *
df = MyDataFrame(np.random.randn(4,4))
df.mymethod()

To preserve your custom dataframe class:

pandas routinely returns new dataframes when performing operations on dataframes. So to preserve your dataframe class, you need to have pandas return your class when performing operations on an instance of your class. That can be done by providing a _constructor property like:

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

    def mymethod(self):
        """Do my stuff"""

Test Code:

import pandas as pd

class MyDataFrame(pd.DataFrame):

    @property
    def _constructor(self):
        return MyDataFrame

df = MyDataFrame([1])
print(type(df))
df = df.rename(columns={})
print(type(df))

Test Results:

<class '__main__.MyDataFrame'>
<class '__main__.MyDataFrame'>
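
To see what _constructor changes, here is a small comparison sketch. PlainSub, KeepingSub, and n_cells are hypothetical names used only for this illustration; the expectation is that the subclass without _constructor falls back to a plain pandas.DataFrame after an operation, while the one with it keeps its type and its custom method.

```python
import pandas as pd


class PlainSub(pd.DataFrame):
    """Subclass without _constructor (hypothetical)."""


class KeepingSub(pd.DataFrame):
    """Subclass that asks pandas to return KeepingSub from operations."""

    @property
    def _constructor(self):
        return KeepingSub

    def n_cells(self):
        """Number of cells in the frame (hypothetical custom method)."""
        return self.shape[0] * self.shape[1]


plain = PlainSub({"a": [1, 2, 3]})
keeping = KeepingSub({"a": [1, 2, 3]})

# head() builds a new object; which class it uses depends on _constructor.
print(type(plain.head(2)).__name__)
print(type(keeping.head(2)).__name__)
print(keeping.head(2).n_cells())
```

Note that this only covers operations that return a DataFrame; operations that return a Series (such as selecting one column) are not affected by _constructor.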

2 Comments

Plus one for effort. But won't this be difficult, because pandas will just return a DataFrame in most cases? You have to do some additional trickery to override every pd.DataFrame method that returns a pd.DataFrame. Otherwise, this is a one-use method and you are back to a pd.DataFrame... most likely.
@piRSquared, you are correct as usual. But there appears to be an easy workaround.
2

I have used Ivan Mishalkin's handy solution extensively in our in-house Python library. At some point I thought it would be better to use his solution in the form of a decorator. The only restriction is that the first argument of the decorated function must be a DataFrame:

from copy import deepcopy
from functools import wraps
import pandas as pd
from pandas.core.base import PandasObject

def as_method(func):
    """
    This decorator makes a function also available as a method.
    The first passed argument must be a DataFrame.
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*deepcopy(args), **deepcopy(kwargs))

    setattr(PandasObject, wrapper.__name__, wrapper)

    return wrapper


@as_method
def augment_x(DF, x):
    """We will be able to see this docstring if we run ??augment_x"""
    DF[f"column_{x}"] = x

    return DF

Example:

df = pd.DataFrame({"A": [1, 2]})
df
   A
0  1
1  2

df.augment_x(10)
   A  column_10
0  1         10
1  2         10

As you can see, the original DataFrame is not changed, because the wrapper deep-copies its arguments before calling the function. It behaves as if there were an inplace=False argument:

df
   A
0  1
1  2

You can still use the augment_x as a simple function:

augment_x(df, 2)
   A  column_2
0  1         2
1  2         2
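
A possible variation, sketched under assumptions rather than taken from the answer above: the hypothetical decorator as_dataframe_method attaches the function to pd.DataFrame only (so Series objects do not receive it) and copies just the DataFrame argument with df.copy() instead of deep-copying every argument. The function add_constant is likewise a made-up example.

```python
from functools import wraps

import pandas as pd


def as_dataframe_method(func):
    """Hypothetical variant: register func as a DataFrame-only method."""

    @wraps(func)
    def wrapper(df, *args, **kwargs):
        # Work on a copy so the caller's DataFrame is left untouched.
        return func(df.copy(), *args, **kwargs)

    setattr(pd.DataFrame, func.__name__, wrapper)
    return wrapper


@as_dataframe_method
def add_constant(df, x):
    """Add a column named column_<x> filled with x (hypothetical example)."""
    df[f"column_{x}"] = x
    return df


df = pd.DataFrame({"A": [1, 2]})
out = df.add_constant(10)

print(list(out.columns))  # columns of the returned copy
print(list(df.columns))   # columns of the original
```

The trade-off is that mutable arguments other than the DataFrame are no longer protected from modification, which the deepcopy version guards against.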
