I have read this documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

It shows that you can use syntax like df.loc[df['shield'] > 6, ['max_speed']].

Digging through the source on GitHub, I found out the following:

Suppose you have a pandas.core.frame.DataFrame object, i.e. a DataFrame called df.

The type of df.loc is pandas.core.indexing._LocIndexer.
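Put differently, df.loc[...] is two ordinary Python operations. First, an attribute lookup (df.loc) returns an indexer object. Second, a subscription on that object calls its __getitem__ with everything inside the brackets as a single key, which is a tuple when there is a comma. The sketch below imitates this with two hypothetical classes, Frame and _Indexer. It does not use pandas.

```python
class _Indexer:
    """Hypothetical stand-in for an indexer object such as _LocIndexer."""

    def __init__(self, obj):
        self.obj = obj  # the object this indexer was obtained from

    def __getitem__(self, key):
        # Return the raw key so we can see exactly what Python passed in.
        return key


class Frame:
    """Hypothetical stand-in for a DataFrame."""

    @property
    def loc(self):
        # Step 1: plain attribute access builds and returns an indexer object.
        return _Indexer(self)


f = Frame()
print(type(f.loc).__name__)                  # _Indexer
# Step 2: the brackets call _Indexer.__getitem__ with one tuple key.
print(f.loc[[True, False], ["max_speed"]])   # ([True, False], ['max_speed'])
# The same call written out explicitly:
print(f.loc.__getitem__(([True, False], ["max_speed"])))
```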

However, I could not figure out the answers to these questions:

  1. How do you make a Python function/class accept syntax like the above?

  2. Where in the source code of pandas.core.frame.DataFrame is the property self.loc defined?

  • Look at this file on GitHub: github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py and this one: github.com/pandas-dev/pandas/blob/… Commented Jul 29, 2019 at 12:33
  • @Poojan Thanks, but where does get_loc become self.loc? Sorry for being so stupid, but I don't get it :/ Commented Jul 29, 2019 at 12:45
  • @Poojan I looked at the get_loc function, but I cannot find any @property decorator or any other syntax defining self.loc. I git cloned the pandas repo and used git grep -n self.loc and git grep -A1 @property | grep loc, but I can't find it. Could you help me with this? Commented Jul 29, 2019 at 14:17
  • I am unable to find the exact location in the code where .loc is implemented, but it is implemented somewhere in this file: github.com/pandas-dev/pandas/blob/master/pandas/core/…. The implementation of loc is fairly complex. You can search the indexing.py file for a more in-depth explanation. Write an answer if you find a good explanation for this. Commented Jul 29, 2019 at 15:29
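One reason the searches in these comments can come up empty is that a property does not have to be written as a decorated def inside the class body. Python also lets you attach one after the class is created by calling setattr with the built-in property(). In that case, grepping for self.loc or for @property next to loc finds nothing. The sketch below shows that pattern. Frame, _Indexer, INDEXERS and _attach_indexers are hypothetical names, not pandas code. As far as I can tell, some older pandas releases registered loc in a similar loop, while newer ones define it as an ordinary @property on a mixin class in pandas/core/indexing.py. Treat this as an assumption and check the source of your installed version.

```python
import functools


class _Indexer:
    """Hypothetical indexer; remembers its name and the object it indexes."""

    def __init__(self, name, obj):
        self.name = name
        self.obj = obj

    def __getitem__(self, key):
        return (self.name, key)


class Frame:
    """Hypothetical DataFrame stand-in: note there is no 'loc' in the body."""


# Hypothetical registry mapping attribute names to indexer classes.
INDEXERS = [("loc", _Indexer), ("iloc", _Indexer)]


def _attach_indexers(cls):
    for name, indexer_cls in INDEXERS:
        # functools.partial fixes the name argument; property() then calls
        # the partial with the instance, i.e. indexer_cls(name, instance).
        getter = functools.partial(indexer_cls, name)
        setattr(cls, name, property(getter))


_attach_indexers(Frame)

f = Frame()
print(f.loc["x"])                                  # ('loc', 'x')
print(f.iloc[0, 1])                                # ('iloc', (0, 1))
print(isinstance(Frame.__dict__["loc"], property))  # True
```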

2 Answers

  1. In general, you make a class accept that syntax by implementing __getitem__, which is an example of operator overloading. It allows objects of that class to be indexed with []. For example:

    class get_item_example(object):
        def __getitem__(self, key):
            # Called for gi[...]; key is whatever was inside the brackets.
            print(key)
    

    Try it out:

    >>> gi = get_item_example()
    >>> gi['a']
    a
    >>> gi[['a','b','c']]
    ['a', 'b', 'c']
    >>> gi['a','b','c']
    ('a', 'b', 'c')
    

    In the case of df.loc[df['shield'] > 6, ['max_speed']], Python first evaluates df.loc, which returns a _LocIndexer object, and then calls that object's __getitem__. The key passed to __getitem__ is a tuple containing the boolean Series returned by df['shield'] > 6 and the single-item list ['max_speed'].

  2. In the pandas source, pandas.core.indexing._LocIndexer inherits its implementation of __getitem__ from pandas.core.indexing._LocationIndexer. The implementation is here: https://github.com/pandas-dev/pandas/blob/61362be9ea4d69b33ae421f1f98b8db50be611a2/pandas/core/indexing.py#L1374


If you want a property that supports indexing (df.loc[] as opposed to df[]), you can create a separate helper class that processes the index. The property creates an instance of that class and returns it whenever it is accessed.

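class Helper:
    def __init__(self, parent):
        self.parent = parent  # the object being indexed

    def __getitem__(self, idx):
        # do your indexing here; as an example, look idx up in the parent
        return self.parent.data[idx]

class Parent:
    def __init__(self, data):
        self.data = data  # example storage; replace with your own

    @property
    def fancy_indexer(self):
        # every access builds a Helper bound to this object
        return Helper(self)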
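A self-contained, runnable version of this helper pattern, extended on the assumption that you also want assignment through the indexer (as with df.loc[...] = value). Python routes obj.fancy_indexer[key] = value to the helper's __setitem__: it reads the property first and then applies the item assignment to the object the property returned. Parent, Helper, fancy_indexer and the dict-based _store are illustrative names only.

```python
class Helper:
    """Returned by Parent.fancy_indexer; handles both reading and writing."""

    def __init__(self, parent):
        self.parent = parent  # the object being indexed

    def __getitem__(self, key):
        # parent.fancy_indexer[key]
        return self.parent._store[key]

    def __setitem__(self, key, value):
        # parent.fancy_indexer[key] = value writes through to the parent.
        self.parent._store[key] = value


class Parent:
    def __init__(self):
        self._store = {}  # hypothetical backing storage

    @property
    def fancy_indexer(self):
        # No property setter is needed: the assignment targets the returned
        # Helper, not the fancy_indexer attribute itself.
        return Helper(self)


p = Parent()
p.fancy_indexer["a"] = 1
print(p.fancy_indexer["a"])  # 1
print(p._store)              # {'a': 1}
```

Because a fresh Helper is created on every access, it holds no state of its own. All data lives on the parent, so reads and writes made through different Helper instances stay consistent.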
