This is a general Python question. Is it possible to assign one class object to different variables and then perform a different set of operations on each variable? I'm trying to reduce code, but maybe this isn't how it works. For example, I'm trying to do something like this:

Edit: here is an abstract of the class and methods:

import pandas as pd


class Class:
    def __init__(self, df):
        self.df = df

    def query(self, query):
        self.df = self.df.query(query)
        return self

    def forward_fill(self, filter):
        # Treat zeros in the matching columns as missing, then forward-fill along each row.
        self.df.update(self.df.filter(like=filter).mask(lambda x: x == 0).ffill(axis=1))
        return self

    def diff(self, cols=None, axis=1):
        # Difference every column not listed in cols, then re-attach the cols columns.
        diff = self.df[self.df.columns[~self.df.columns.isin(cols)]].diff(axis=axis)
        self.df = diff.join(self.df[self.df.columns.difference(diff.columns)])
        return self

    def melt(self, cols, var=None, value=None):
        return pd.melt(self.df, id_vars=cols, var_name=var, value_name=value)

I'm trying to use it like this:

df = pd.read_csv('data.csv')

df = Class(df)
df = df.query(query).forward_fill(include)

df_1 = df.diff(cols).melt(cols)

df_2 = df.melt(cols)

df_1 and df_2 should have different values, but df_2 comes out identical to df_1. The issue goes away if I use the class like this:

df_1 = pd.read_csv('data.csv')
df_2 = pd.read_csv('data.csv')

df_1 = Class(df_1)
df_2 = Class(df_2)

df_1 = df_1.query(query).forward_fill(include)
df_2 = df_2.query(query).forward_fill(include)

df_1 = df_1.diff(cols).melt(cols)

df_2 = df_2.melt(cols)

This results in extra code. Is there a better way to do this, where one object can be used differently through different variables, or do I have to create separate objects if I want two variables to go through separate operations and return different values?

  • Can you show us what oper_1 and oper_2 are doing? Commented Jun 29, 2020 at 18:22
  • They're pandas dataframe manipulations chained together Commented Jun 29, 2020 at 18:27
  • Show a definition of Class that lets us reproduce the error. You also need to include both the expected and observed output. Commented Jun 29, 2020 at 18:28
  • 2 + 5 and 3 + 4 produce the same output as well, but that doesn't mean either one is wrong. Commented Jun 29, 2020 at 18:28
  • @chepner added class details Commented Jun 29, 2020 at 19:22

1 Answer

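The return self statement in the diff method returns a reference to the same object, and by the time it returns, the method has already overwritten self.df. The query and forward_fill methods work the same way. So every variable that points to that object sees the modified df; the original df is gone.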

Here:

df = pd.read_csv('data.csv')

df = Class(df)
df = df.query(query).forward_fill(include)

df_1 = df.diff(cols).melt(cols)

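df.df already holds the differenced values that df_1 was built from, because diff overwrote self.df before melt ran. The melt method itself only reshapes whatever frame it is given: the cols columns stay as identifiers and the remaining columns are unpivoted into variable/value rows. As a result, a later df_2 = df.melt(cols) melts the already differenced frame and gives the same result as df_1.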
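To see the shared-state effect in isolation, here is a minimal sketch that uses a plain list in place of the DataFrame. The Pipeline class and its data are made up for illustration; only the return self pattern mirrors the code in the question.

```python
class Pipeline:
    """Hypothetical stand-in for Class; a list plays the role of the DataFrame."""

    def __init__(self, values):
        self.values = values

    def diff(self):
        # Overwrites the stored data, then hands back the very same object.
        self.values = [b - a for a, b in zip(self.values, self.values[1:])]
        return self

    def melt(self):
        # Only reads the stored data and returns a new list.
        return list(enumerate(self.values))


p = Pipeline([1, 4, 9, 16])
same = p.diff()        # p itself now holds the differences
out_1 = same.melt()
out_2 = p.melt()       # melts the already differenced data

print(same is p)       # True
print(out_1 == out_2)  # True
```

Both names refer to one object, so there is only one copy of the data to melt.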

If you want to keep working with one object, you should not assign to self.df in your class methods, because that changes the df stored on the instance. Instead, assign the result to a local variable df and then return Class(df).

For example:

def diff(self, cols=None, axis=1):
    diff = self.df[self.df.columns[~self.df.columns.isin(cols)]].diff(axis=axis)
    df = diff.join(self.df[self.df.columns.difference(diff.columns)])
    return Class(df)
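Put together, a rough sketch of this approach could look like the following. The class name Frame, the sample data, and the simplified diff (using drop instead of the isin mask) are my own assumptions for illustration, not the code from the question.

```python
import pandas as pd


class Frame:
    """Hypothetical non-mutating variant of Class: each step returns a new wrapper."""

    def __init__(self, df):
        self.df = df

    def diff(self, cols=(), axis=1):
        cols = list(cols)
        # Difference every column except the identifier columns in cols.
        diffed = self.df.drop(columns=cols).diff(axis=axis)
        # Re-attach the untouched identifier columns (joined on the index).
        return Frame(diffed.join(self.df[cols]))

    def melt(self, cols, var=None, value="value"):
        return pd.melt(self.df, id_vars=cols, var_name=var, value_name=value)


# Made-up sample data, not taken from the question.
base = Frame(pd.DataFrame({"id": ["a", "b"], "x": [1, 2], "y": [4, 8]}))

df_1 = base.diff(["id"]).melt(["id"])  # melted differences
df_2 = base.melt(["id"])               # melted original values
```

Because diff no longer touches self.df, base can be read once and reused for both chains, and df_1 and df_2 differ as expected.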

Best regards
