
I have a pandas DataFrame with MultiIndex columns (shown below), indexed by a DatetimeIndex.

>>> type(feed_tail)
<class 'pandas.core.frame.DataFrame'>

>>> feed_tail.index
DatetimeIndex(['2022-11-11', '2022-11-14', '2022-11-15', '2022-11-16',
           '2022-11-17', '2022-11-18', '2022-11-21', '2022-11-22',
           '2022-11-23', '2022-11-24'],
          dtype='datetime64[ns]', name='Date', freq=None)


>>> feed_tail.columns
MultiIndex([(       'Close', 'BALKRISIND.NS'),
        (       'Close',        'KSB.NS'),
        (       'SMA13', 'BALKRISIND.NS'),
        (       'SMA13',        'KSB.NS'),
        ('ClosegtSMA13', 'BALKRISIND.NS'),
        ('ClosegtSMA13',        'KSB.NS'),
        (     'MTDPerf', 'BALKRISIND.NS'),
        (     'MTDPerf',        'KSB.NS')],
       names=['Attributes', 'Symbols'])

>>> feed_tail
Attributes         Close                  SMA13           ClosegtSMA13              MTDPerf
Symbols    BALKRISIND.NS   KSB.NS BALKRISIND.NS   KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-11       1889.45  1834.40       1933.03  1959.00         False  False         -3.73 -11.86
2022-11-14       1875.55  1848.60       1927.28  1944.42         False  False         -4.44 -11.18
2022-11-15       1963.20  1954.15       1928.51  1938.12          True   True          0.02  -6.11
2022-11-16       1956.30  1969.75       1929.43  1933.65          True   True         -0.33  -5.36
2022-11-17       1978.35  1959.55       1932.08  1927.51          True   True          0.79  -5.85
2022-11-18       1972.75  1917.90       1932.85  1914.94          True   True          0.51  -7.85
2022-11-21       1945.80  1874.70       1932.80  1902.38          True  False         -0.86  -9.93
2022-11-22       1950.30  1882.85       1932.60  1892.80          True  False         -0.63  -9.54
2022-11-23       1946.60  1930.90       1936.52  1893.97          True   True         -0.82  -7.23
2022-11-24       1975.40  1925.80       1941.11  1901.10          True   True          0.64  -7.47
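
To make this reproducible, here is a minimal sketch that rebuilds a reduced version of the frame from three of the dates above. Building the columns from tuple keys is only one assumed way to get the MultiIndex; the real frame may well have been built differently.

```python
import pandas as pd

# Three of the dates shown above; values are copied from the printed frame.
dates = pd.DatetimeIndex(["2022-11-14", "2022-11-15", "2022-11-21"], name="Date")

# Tuple keys become a two-level MultiIndex on the columns.
feed_tail = pd.DataFrame(
    {
        ("Close", "BALKRISIND.NS"): [1875.55, 1963.20, 1945.80],
        ("Close", "KSB.NS"): [1848.60, 1954.15, 1874.70],
        ("SMA13", "BALKRISIND.NS"): [1927.28, 1928.51, 1932.80],
        ("SMA13", "KSB.NS"): [1944.42, 1938.12, 1902.38],
        ("ClosegtSMA13", "BALKRISIND.NS"): [False, True, True],
        ("ClosegtSMA13", "KSB.NS"): [False, True, False],
        ("MTDPerf", "BALKRISIND.NS"): [-4.44, 0.02, -0.86],
        ("MTDPerf", "KSB.NS"): [-11.18, -6.11, -9.93],
    },
    index=dates,
)
feed_tail.columns = feed_tail.columns.set_names(["Attributes", "Symbols"])

# Selecting a top-level attribute returns a plain per-symbol DataFrame.
flags = feed_tail["Close"] > feed_tail["SMA13"]
print(feed_tail)
```

Selecting one attribute, such as `feed_tail["Close"]`, drops the outer column level and leaves one column per symbol, which is what makes the element-wise comparison possible.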

I am trying to filter this DataFrame into another DataFrame for each date in the index, in sequence, keeping only the symbols for which ClosegtSMA13 is True, but I seem to be misunderstanding the data model. The goal is to iterate over the datetime index in order, get for each date a DataFrame of the symbols whose ClosegtSMA13 is True (that is, whose Close is greater than SMA13), and then process that filtered DataFrame further within the loop.

Any help towards unravelling this further is sincerely appreciated.

Thank you

Updates:

Following @jezrael's suggestion to use a mask. This performs an 'OR' across symbols: it keeps each whole row (with all of its symbols) whenever at least one symbol has Close greater than SMA13, so symbols that fail the test on that date are still included.

>>> feed_tail
Attributes         Close                  SMA13           ClosegtSMA13              MTDPerf
Symbols    BALKRISIND.NS   KSB.NS BALKRISIND.NS   KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-11       1889.45  1834.40       1933.03  1959.00         False  False         -3.73 -11.86
2022-11-14       1875.55  1848.60       1927.28  1944.42         False  False         -4.44 -11.18
2022-11-15       1963.20  1954.15       1928.51  1938.12          True   True          0.02  -6.11
2022-11-16       1956.30  1969.75       1929.43  1933.65          True   True         -0.33  -5.36
2022-11-17       1978.35  1959.55       1932.08  1927.51          True   True          0.79  -5.85
2022-11-18       1972.75  1917.90       1932.85  1914.94          True   True          0.51  -7.85
2022-11-21       1945.80  1874.70       1932.80  1902.38          True  False         -0.86  -9.93
2022-11-22       1950.30  1882.85       1932.60  1892.80          True  False         -0.63  -9.54
2022-11-23       1946.60  1930.90       1936.52  1893.97          True   True         -0.82  -7.23
2022-11-24       1975.40  1925.80       1941.11  1901.10          True   True          0.64  -7.47
>>> mask = feed_tail['Close'].gt(feed_tail['SMA13']).any(axis=1)
>>> df = feed_tail[mask]
>>> df
Attributes         Close                  SMA13           ClosegtSMA13              MTDPerf
Symbols    BALKRISIND.NS   KSB.NS BALKRISIND.NS   KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-15       1963.20  1954.15       1928.51  1938.12          True   True          0.02  -6.11
2022-11-16       1956.30  1969.75       1929.43  1933.65          True   True         -0.33  -5.36
2022-11-17       1978.35  1959.55       1932.08  1927.51          True   True          0.79  -5.85
2022-11-18       1972.75  1917.90       1932.85  1914.94          True   True          0.51  -7.85
2022-11-21       1945.80  1874.70       1932.80  1902.38          True  False         -0.86  -9.93
2022-11-22       1950.30  1882.85       1932.60  1892.80          True  False         -0.63  -9.54
2022-11-23       1946.60  1930.90       1936.52  1893.97          True   True         -0.82  -7.23
2022-11-24       1975.40  1925.80       1941.11  1901.10          True   True          0.64  -7.47
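
The difference between requiring any symbol and requiring all symbols to pass can be sketched as follows. This assumes a reduced frame with three of the dates above; the names `above`, `any_mask` and `all_mask` are illustrative and not part of the original code.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2022-11-14", "2022-11-15", "2022-11-21"], name="Date")
feed_tail = pd.DataFrame(
    {
        ("Close", "BALKRISIND.NS"): [1875.55, 1963.20, 1945.80],
        ("Close", "KSB.NS"): [1848.60, 1954.15, 1874.70],
        ("SMA13", "BALKRISIND.NS"): [1927.28, 1928.51, 1932.80],
        ("SMA13", "KSB.NS"): [1944.42, 1938.12, 1902.38],
    },
    index=dates,
)

# One boolean per (date, symbol): is Close above SMA13?
above = feed_tail["Close"].gt(feed_tail["SMA13"])

# Row-wise reduction over the symbols.
any_mask = above.any(axis=1)  # at least one symbol passes on that date
all_mask = above.all(axis=1)  # every symbol passes on that date

rows_any = feed_tail[any_mask]
rows_all = feed_tail[all_mask]
print(rows_any)
print(rows_all)
```

Either way the mask selects whole rows, so it cannot drop an individual symbol on a date where another symbol passes.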

The overall goal involves a much larger DataFrame of the same shape, where I intend to get the top 'MTDPerf' items for each day. The mask above helps, but I would like to keep only the symbols whose Close is greater than SMA13 before ranking them by MTDPerf.

>>> for dt in feed_tail.index:
...     feed_tail['MTDPerf'].loc[dt].head(10).sort_values(ascending=False)
...

Trying to filter before looking at MTDPerf:

>>> for dt in feed_tail.index:
...     d=feed_tail[feed_tail['Close'].loc[dt] > feed_tail['SMA13'].loc[dt]]
...     d
...
<stdin>:2: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "lib/python3.9/site-packages/pandas/core/frame.py", line 3796, in __getitem__
    return self._getitem_bool_array(key)
  File "lib/python3.9/site-packages/pandas/core/frame.py", line 3849, in _getitem_bool_array
    key = check_bool_indexer(self.index, key)
  File "lib/python3.9/site-packages/pandas/core/indexing.py", line 2548, in check_bool_indexer
    raise IndexingError(
pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
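
The error appears to come from an index mismatch: the comparison of two `.loc[dt]` rows is a boolean Series indexed by Symbols, while `feed_tail[...]` expects a boolean Series aligned with the Date index. A minimal sketch of one way around this, assuming a reduced frame and using the per-symbol mask to select columns rather than rows (`row_mask`, `passing`, `col_mask` and `day_frame` are illustrative names):

```python
import pandas as pd

dates = pd.DatetimeIndex(["2022-11-14", "2022-11-15", "2022-11-21"], name="Date")
feed_tail = pd.DataFrame(
    {
        ("Close", "BALKRISIND.NS"): [1875.55, 1963.20, 1945.80],
        ("Close", "KSB.NS"): [1848.60, 1954.15, 1874.70],
        ("SMA13", "BALKRISIND.NS"): [1927.28, 1928.51, 1932.80],
        ("SMA13", "KSB.NS"): [1944.42, 1938.12, 1902.38],
        ("MTDPerf", "BALKRISIND.NS"): [-4.44, 0.02, -0.86],
        ("MTDPerf", "KSB.NS"): [-11.18, -6.11, -9.93],
    },
    index=dates,
)
feed_tail.columns = feed_tail.columns.set_names(["Attributes", "Symbols"])

dt = feed_tail.index[2]  # 2022-11-21

# Indexed by symbol, not by date, so it cannot filter the rows of feed_tail.
row_mask = feed_tail["Close"].loc[dt] > feed_tail["SMA13"].loc[dt]

# Symbols that pass on this date.
passing = row_mask[row_mask].index

# Boolean array over the columns: keep every attribute of the passing symbols.
col_mask = feed_tail.columns.get_level_values("Symbols").isin(passing)

# One-row DataFrame for this date, restricted to the passing symbols.
day_frame = feed_tail.loc[[dt], col_mask]
print(day_frame)
```

Passing `[dt]` as a list keeps the result a DataFrame instead of collapsing it to a Series.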

Solution (the approach used so far; many thanks to @jezrael's answer):

mmtd = feed_tail.where(feed_tail['Close'] > feed_tail['SMA13']).where(feed_tail['MTDPerf'] > 0)

for dt in mmtd.index:
    dt_str = dt.strftime("%Y-%m-%d")
    a = mmtd.loc[dt_str, ['Close', 'MTDPerf']]
    a = a[a.notna()]
    unstacked_a = a.unstack(0)
    if not unstacked_a.empty:
        unstacked_a = unstacked_a.sort_values(by=(['MTDPerf']), ascending=False)
        print(dt_str, unstacked_a)

     

1 Answer


Use DataFrame.loc to select the MTDPerf row for each date, then filter that Series with a boolean mask:

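for dt in feed_tail.index:
    # MTDPerf values of all symbols for this date
    mmtd = feed_tail.loc[dt, 'MTDPerf']
    # keep only the symbols whose Close is above SMA13 on this date
    d = mmtd[feed_tail.loc[dt, 'Close'] > feed_tail.loc[dt, 'SMA13']]
    print(d)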

Series([], Name: 2022-11-11 00:00:00, dtype: object)
Series([], Name: 2022-11-14 00:00:00, dtype: object)
Symbols
BALKRISIND.NS    0.02
KSB.NS          -6.11
Name: 2022-11-15 00:00:00, dtype: object
Symbols
BALKRISIND.NS   -0.33
KSB.NS          -5.36
Name: 2022-11-16 00:00:00, dtype: object
Symbols
BALKRISIND.NS    0.79
KSB.NS          -5.85
Name: 2022-11-17 00:00:00, dtype: object
Symbols
BALKRISIND.NS    0.51
KSB.NS          -7.85
Name: 2022-11-18 00:00:00, dtype: object
Symbols
BALKRISIND.NS   -0.86
Name: 2022-11-21 00:00:00, dtype: object
Symbols
BALKRISIND.NS   -0.63
Name: 2022-11-22 00:00:00, dtype: object
Symbols
BALKRISIND.NS   -0.82
KSB.NS          -7.23
Name: 2022-11-23 00:00:00, dtype: object
Symbols
BALKRISIND.NS    0.64
KSB.NS          -7.47
Name: 2022-11-24 00:00:00, dtype: object
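
To go from the filtered Series to the top MTDPerf items per day, one option is `Series.nlargest`. The sketch below assumes a reduced frame, and `top_n` and `top_by_day` are illustrative names. It selects the attribute first (`feed_tail['MTDPerf']`), so each row stays float rather than the object dtype seen in the output above; `nlargest` needs a numeric dtype.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2022-11-14", "2022-11-15", "2022-11-21"], name="Date")
feed_tail = pd.DataFrame(
    {
        ("Close", "BALKRISIND.NS"): [1875.55, 1963.20, 1945.80],
        ("Close", "KSB.NS"): [1848.60, 1954.15, 1874.70],
        ("SMA13", "BALKRISIND.NS"): [1927.28, 1928.51, 1932.80],
        ("SMA13", "KSB.NS"): [1944.42, 1938.12, 1902.38],
        ("MTDPerf", "BALKRISIND.NS"): [-4.44, 0.02, -0.86],
        ("MTDPerf", "KSB.NS"): [-11.18, -6.11, -9.93],
    },
    index=dates,
)

perf = feed_tail["MTDPerf"]                          # float columns, one per symbol
above = feed_tail["Close"].gt(feed_tail["SMA13"])    # boolean, same shape as perf

top_n = 1  # hypothetical cut-off; the question mentions 10
top_by_day = {}
for dt in feed_tail.index:
    # Symbols whose Close is above SMA13 on this date.
    eligible = perf.loc[dt][above.loc[dt]]
    if eligible.empty:
        continue  # no symbol passed the filter on this date
    top_by_day[dt] = eligible.nlargest(top_n)

for dt, top in top_by_day.items():
    print(dt.date(), top.to_dict())
```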

Alternatively, use DataFrame.where, which replaces the values that do not match with NaN:

df = feed_tail['MTDPerf'].where(feed_tail['Close'] > feed_tail['SMA13'])
print (df)
Symbols     BALKRISIND.NS  KSB.NS
2022-11-11            NaN     NaN
2022-11-14            NaN     NaN
2022-11-15           0.02   -6.11
2022-11-16          -0.33   -5.36
2022-11-17           0.79   -5.85
2022-11-18           0.51   -7.85
2022-11-21          -0.86     NaN
2022-11-22          -0.63     NaN
2022-11-23          -0.82   -7.23
2022-11-24           0.64   -7.47

And after reshaping with stack (the NaN entries are dropped in the output below):

s = feed_tail['MTDPerf'].where(feed_tail['Close'] > feed_tail['SMA13']).stack()
print (s)
            Symbols      
2022-11-15  BALKRISIND.NS    0.02
            KSB.NS          -6.11
2022-11-16  BALKRISIND.NS   -0.33
            KSB.NS          -5.36
2022-11-17  BALKRISIND.NS    0.79
            KSB.NS          -5.85
2022-11-18  BALKRISIND.NS    0.51
            KSB.NS          -7.85
2022-11-21  BALKRISIND.NS   -0.86
2022-11-22  BALKRISIND.NS   -0.63
2022-11-23  BALKRISIND.NS   -0.82
            KSB.NS          -7.23
2022-11-24  BALKRISIND.NS    0.64
            KSB.NS          -7.47
dtype: float64
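
The stacked Series can also give the top items per date without an explicit loop. This is a minimal sketch assuming a reduced frame; the sort-then-`groupby(...).head(n)` idiom is one possible choice, not necessarily the fastest, and `top_n` and `top` are illustrative names. The explicit `dropna()` is there because newer pandas versions may keep the NaN rows when stacking.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2022-11-14", "2022-11-15", "2022-11-21"], name="Date")
feed_tail = pd.DataFrame(
    {
        ("Close", "BALKRISIND.NS"): [1875.55, 1963.20, 1945.80],
        ("Close", "KSB.NS"): [1848.60, 1954.15, 1874.70],
        ("SMA13", "BALKRISIND.NS"): [1927.28, 1928.51, 1932.80],
        ("SMA13", "KSB.NS"): [1944.42, 1938.12, 1902.38],
        ("MTDPerf", "BALKRISIND.NS"): [-4.44, 0.02, -0.86],
        ("MTDPerf", "KSB.NS"): [-11.18, -6.11, -9.93],
    },
    index=dates,
)

# Blank out MTDPerf where Close is not above SMA13, then go long.
perf = feed_tail["MTDPerf"].where(feed_tail["Close"] > feed_tail["SMA13"])
s = perf.stack().dropna()

top_n = 1  # hypothetical cut-off; the question mentions 10
# Sort all values descending, then take the first top_n entries of each date.
top = s.sort_values(ascending=False).groupby(level=0).head(top_n).sort_index()
print(top)
```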

12 Comments

@NikhilMulley - which days should be in the output from the sample data? Is it necessary to filter only by the SMA13gtClose mask?
@NikhilMulley - I don't understand "but I would like to filter by making sure they have their 'Close gt SMA13' before checking" - what does it mean?
@NikhilMulley - Does it mean the SMA13gtClose mask must be applied to MTDPerf, and symbols where it is False must be removed from the top 10? Or something else?
@NikhilMulley - thank you for the code, the answer was edited.
@NikhilMulley - Or chain the masks with | for bitwise OR: mmtd = feed_tail.where((feed_tail['Close'] > feed_tail['SMA13']) | (feed_tail['MTDPerf'] > 0))