Python: Create custom pd.dataframe class

Question

I would like to put some standard tasks for a pandas DataFrame, such as initializing it with data and processing that data, into a class. I am currently performing the following sample steps:

import pandas as pd
import urllib.request


def __get_data():
    URL = r'https://en.wikipedia.org/wiki/List_of_sovereign_states_' \
          r'and_dependent_territories_by_continent_(data_file)#Data_file'
    HTML_STRING = urllib.request.urlopen(URL)
    return pd.read_html(HTML_STRING)[2]


def __prepare_data(df):
    df.iloc[:,-1] = df.iloc[:,-1].str.upper()
    return df


MyDataFrame = pd.DataFrame()
MyDataFrame = __get_data()
MyDataFrame = __prepare_data(MyDataFrame)

I'd like something like this:

class MyDataFrame(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(MyDataFrame, self).__init__(*args, **kwargs)
        self = self.__get_data()
        self.__prepare_data()

    def __get_data(self):
        URL = r'https://en.wikipedia.org/wiki/List_of_sovereign_states_' \
              r'and_dependent_territories_by_continent_(data_file)#Data_file'
        HTML_STRING = urllib.request.urlopen(URL)
        return pd.read_html(HTML_STRING)[2]

    def __prepare_data(self):
        self.iloc[:, -1] = self.iloc[:, -1].str.upper()

Unfortunately, I do not understand the pandas documentation in this context.

Roelant · Accepted Answer · 2020-03-12 10:48:57Z

While I think this is ill-advised, this modification works:

class MyDataFrame(pd.DataFrame):
    def __init__(self, *args, **kwargs):
        super(MyDataFrame, self).__init__(*args, **kwargs)
        self.data = self.__get_data()
        self.__prepare_data()

    def __get_data(self):
        URL = r'https://en.wikipedia.org/wiki/List_of_sovereign_states_' \
              r'and_dependent_territories_by_continent_(data_file)#Data_file'
        HTML_STRING = urllib.request.urlopen(URL)
        return pd.read_html(HTML_STRING)[2]

    def __prepare_data(self):
        self.data.iloc[:, -1] = self.data.iloc[:, -1].str.upper()

d = MyDataFrame()

print(d.data)

Output:

     CC  a-2  a-3  #      Name
0    AS  AF   AFG  4.0    AFGHANISTAN, ISLAMIC REPUBLIC OF
1    EU  AL   ALB  8.0    ALBANIA, REPUBLIC OF
2    AN  AQ   ATA  10.0   ANTARCTICA (THE TERRITORY SOUTH OF 60 DEG S)
3    AF  DZ   DZA  12.0   ALGERIA, PEOPLES DEMOCRATIC REPUBLIC OF
4    OC  AS   ASM  16.0   AMERICAN SAMOA
...  ... ...  ...  ...    ...
257  AF  ZM   ZMB  894.0  ZAMBIA, REPUBLIC OF
258  AS  XD   NaN  NaN    UNITED NATIONS NEUTRAL ZONE
259  AS  XE   NaN  NaN    IRAQ-SAUDI ARABIA NEUTRAL ZONE
260  AS  XS   NaN  NaN    SPRATLY ISLANDS
261  OC  XX   NaN  NaN    DISPUTED TERRITORY
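
If the object itself should hold the table (instead of an empty frame carrying a second frame in .data), one possible sketch is to build the subclass from data that has already been fetched, and to override _constructor, which the pandas subclassing documentation describes as the way to keep the subclass type across operations. The names fetch_table and from_source are hypothetical, and fetch_table returns a two-row illustrative frame so the example runs offline; a real version would presumably call pd.read_html as in the question.

```python
import pandas as pd


def fetch_table() -> pd.DataFrame:
    """Hypothetical offline stand-in for the pd.read_html(...) call in the question."""
    return pd.DataFrame(
        {
            "a-2": ["AF", "AL"],
            "Name": ["Afghanistan, Islamic Republic of", "Albania, Republic of"],
        }
    )


class MyDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # Tells pandas to return MyDataFrame (not a plain DataFrame)
        # from operations such as head() or slicing.
        return MyDataFrame

    @classmethod
    def from_source(cls, fetch=fetch_table):
        """Fetch the table, wrap it in the subclass, then upper-case the last column."""
        df = cls(fetch())
        last = df.columns[-1]
        df[last] = df[last].str.upper()
        return df


d = MyDataFrame.from_source()
print(d)
```

Keeping __init__ untouched means the class can still be constructed the way pandas constructs it internally, and the fetching step stays in an explicit, separately named entry point.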

edited Mar 12, 2020 at 10:48

Roelant

answered Mar 12, 2020 at 10:46

Josh Friedlander

4 Comments

Roelant Over a year ago

Why do you think it is ill-advised? :)

Josh Friedlander Over a year ago

There's no reason at all to subclass the dataframe here (there rarely is). But OP is a newbie, I don't want to be too critical.

28H4 Over a year ago

@JoshFriedlander I took to heart your advice that a subclass is rarely necessary for dataframes and implemented an alternative solution using @pd.api.extensions.register_dataframe_accessor() (see the pandas docs). Would this be a better way to add user-specific methods to a dataframe, like get_data() and prepare_data() from my code example? What would be a good way to implement something like this?
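
For reference, the accessor approach mentioned in this comment could look roughly like the sketch below. pd.api.extensions.register_dataframe_accessor is the real pandas decorator; the accessor name "prep", the class PrepAccessor, and the method upper_last_column are hypothetical names chosen for illustration, and the sample frame is illustrative data only.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("prep")  # "prep" is a made-up namespace
class PrepAccessor:
    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes in the DataFrame the accessor is attached to.
        self._obj = pandas_obj

    def upper_last_column(self) -> pd.DataFrame:
        """Return a copy of the frame with its last column upper-cased."""
        out = self._obj.copy()
        last = out.columns[-1]
        out[last] = out[last].str.upper()
        return out


sample = pd.DataFrame({"a-2": ["AF", "AL"], "Name": ["Afghanistan", "Albania"]})
prepared = sample.prep.upper_last_column()
print(prepared)
```

An accessor fits the preparation step, which operates on an existing frame. Fetching the data has no frame to start from, so it would still live outside the accessor.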

Josh Friedlander Over a year ago

The way I see it, fetching the data is not a function of the dataframe. Nor is preparing the data. If you need to use an object, just create a DataProvider class that has a data attribute. It fetches and cleans the data and stores the output df as self.data.
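
A minimal sketch of the DataProvider idea from this comment, under a few assumptions: the fetch function is passed in so the class can be tried without network access, fetch_sample is a hypothetical stand-in that returns illustrative rows, and _prepare mirrors the upper-casing step from the question. In real use the fetch argument would presumably be a function wrapping the pd.read_html call shown in the question.

```python
from typing import Callable

import pandas as pd


def fetch_sample() -> pd.DataFrame:
    """Hypothetical offline stand-in for fetching the table from the web."""
    return pd.DataFrame({"a-2": ["AF", "AL"], "Name": ["Afghanistan", "Albania"]})


class DataProvider:
    """Plain class that owns a DataFrame instead of inheriting from one."""

    def __init__(self, fetch: Callable[[], pd.DataFrame] = fetch_sample):
        # Fetch, clean, and store the result as an ordinary DataFrame.
        self.data = self._prepare(fetch())

    @staticmethod
    def _prepare(df: pd.DataFrame) -> pd.DataFrame:
        """Upper-case the last column, leaving the input frame unchanged."""
        out = df.copy()
        last = out.columns[-1]
        out[last] = out[last].str.upper()
        return out


provider = DataProvider()
print(provider.data)
```

Because self.data is a regular pd.DataFrame, every pandas method keeps working on it without any subclassing concerns.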
