I have a pandas DataFrame with MultiIndex columns (levels Attributes and Symbols) and a DatetimeIndex named Date, as shown below.
>>> type(feed_tail)
<class 'pandas.core.frame.DataFrame'>
>>> feed_tail.index
DatetimeIndex(['2022-11-11', '2022-11-14', '2022-11-15', '2022-11-16',
'2022-11-17', '2022-11-18', '2022-11-21', '2022-11-22',
'2022-11-23', '2022-11-24'],
dtype='datetime64[ns]', name='Date', freq=None)
>>> feed_tail.columns
MultiIndex([( 'Close', 'BALKRISIND.NS'),
( 'Close', 'KSB.NS'),
( 'SMA13', 'BALKRISIND.NS'),
( 'SMA13', 'KSB.NS'),
('ClosegtSMA13', 'BALKRISIND.NS'),
('ClosegtSMA13', 'KSB.NS'),
( 'MTDPerf', 'BALKRISIND.NS'),
( 'MTDPerf', 'KSB.NS')],
names=['Attributes', 'Symbols'])
>>> feed_tail
Attributes Close SMA13 ClosegtSMA13 MTDPerf
Symbols BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-11 1889.45 1834.40 1933.03 1959.00 False False -3.73 -11.86
2022-11-14 1875.55 1848.60 1927.28 1944.42 False False -4.44 -11.18
2022-11-15 1963.20 1954.15 1928.51 1938.12 True True 0.02 -6.11
2022-11-16 1956.30 1969.75 1929.43 1933.65 True True -0.33 -5.36
2022-11-17 1978.35 1959.55 1932.08 1927.51 True True 0.79 -5.85
2022-11-18 1972.75 1917.90 1932.85 1914.94 True True 0.51 -7.85
2022-11-21 1945.80 1874.70 1932.80 1902.38 True False -0.86 -9.93
2022-11-22 1950.30 1882.85 1932.60 1892.80 True False -0.63 -9.54
2022-11-23 1946.60 1930.90 1936.52 1893.97 True True -0.82 -7.23
2022-11-24 1975.40 1925.80 1941.11 1901.10 True True 0.64 -7.47
I am trying to filter this DataFrame into another DataFrame for each date in the index, in order, keeping only the symbols where the ClosegtSMA13 column is True, but I seem to be misunderstanding the data model. The goal is to iterate over the DatetimeIndex in sequence, select the Symbols whose ClosegtSMA13 is True (equivalently, whose Close is greater than SMA13) on that date, and then process the filtered result further inside the loop.
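For reference, here is a minimal sketch of how this column layout can be navigated, assuming a hypothetical two-date sample (the name `sample` and its values are stand-ins, not the real feed_tail). Selecting a top-level attribute gives a Date x Symbols frame, one date of that frame is a Series indexed by symbol, and `xs` pulls all attributes for a single symbol.

```python
import pandas as pd

# Hypothetical sample with the same column levels as feed_tail.
dates = pd.DatetimeIndex(["2022-11-18", "2022-11-21"], name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "ClosegtSMA13"], ["BALKRISIND.NS", "KSB.NS"]],
    names=["Attributes", "Symbols"],
)
sample = pd.DataFrame(
    [[1972.75, 1917.90, True, True],
     [1945.80, 1874.70, True, False]],
    index=dates,
    columns=columns,
)

# Selecting one attribute drops that level: rows are dates, columns are symbols.
flags_frame = sample["ClosegtSMA13"].astype(bool)

# For each date, collect the symbols whose flag is True.
symbols_above = {}
for dt in sample.index:
    flags = flags_frame.loc[dt]          # boolean Series indexed by Symbols
    symbols_above[dt] = flags[flags].index.tolist()

# All attributes for a single symbol, via a cross-section on the Symbols level.
ksb = sample.xs("KSB.NS", axis=1, level="Symbols")
```

The per-date lists in `symbols_above` can then be used to pick columns from any other attribute for that date.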
Any help towards unravelling this further is sincerely appreciated.
Thank you
Updates:
Following @jezrael's suggestion, I used a mask. With any(axis=1) it performs an 'OR' across symbols: it keeps every row (date) on which at least one symbol has Close greater than SMA13, though each kept row still contains all symbols.
>>> mask = feed_tail['Close'].gt(feed_tail['SMA13']).any(axis=1)
>>> df = feed_tail[mask]
>>> df
Attributes Close SMA13 ClosegtSMA13 MTDPerf
Symbols BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS BALKRISIND.NS KSB.NS
Date
2022-11-15 1963.20 1954.15 1928.51 1938.12 True True 0.02 -6.11
2022-11-16 1956.30 1969.75 1929.43 1933.65 True True -0.33 -5.36
2022-11-17 1978.35 1959.55 1932.08 1927.51 True True 0.79 -5.85
2022-11-18 1972.75 1917.90 1932.85 1914.94 True True 0.51 -7.85
2022-11-21 1945.80 1874.70 1932.80 1902.38 True False -0.86 -9.93
2022-11-22 1950.30 1882.85 1932.60 1892.80 True False -0.63 -9.54
2022-11-23 1946.60 1930.90 1936.52 1893.97 True True -0.82 -7.23
2022-11-24 1975.40 1925.80 1941.11 1901.10 True True 0.64 -7.47
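To illustrate what the mask does, here is a small sketch on a hypothetical three-date sample (the names `sample`, `any_mask` and `all_mask` are illustrative only). `any(axis=1)` keeps a date when at least one symbol closes above its SMA13, while `all(axis=1)` would require every symbol to do so. Either way whole rows are kept, so a symbol that fails the test on a kept date is still present.

```python
import pandas as pd

# Hypothetical sample reusing three rows of values from the question.
dates = pd.DatetimeIndex(["2022-11-14", "2022-11-18", "2022-11-21"], name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "SMA13"], ["BALKRISIND.NS", "KSB.NS"]],
    names=["Attributes", "Symbols"],
)
sample = pd.DataFrame(
    [[1875.55, 1848.60, 1927.28, 1944.42],
     [1972.75, 1917.90, 1932.85, 1914.94],
     [1945.80, 1874.70, 1932.80, 1902.38]],
    index=dates,
    columns=columns,
)

# Date x Symbols frame of booleans: is Close above SMA13?
above = sample["Close"].gt(sample["SMA13"])

any_mask = above.any(axis=1)   # True if at least one symbol qualifies that day
all_mask = above.all(axis=1)   # True only if every symbol qualifies that day

kept_any = sample[any_mask]    # rows for 2022-11-18 and 2022-11-21
kept_all = sample[all_mask]    # row for 2022-11-18 only
```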
The overall goal involves a much larger DataFrame of this shape, from which I want the top 'MTDPerf' symbols for each day. The loop below seems to help, but I would like to first keep only the symbols whose Close is greater than SMA13 before looking at their MTDPerf values.
>>> for dt in feed_tail.index:
...     feed_tail['MTDPerf'].loc[dt].head(10).sort_values(ascending=False)
...
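One thing worth checking in the loop above: `head(10)` followed by `sort_values` takes the first ten symbols in column order and then sorts them, which is not the same as the ten best performers. A small sketch of the difference, assuming a hypothetical one-day row of MTDPerf values (symbol names and numbers are made up) and using 2 instead of 10 to keep it short:

```python
import pandas as pd

# Hypothetical MTDPerf values for one date, indexed by symbol.
perf = pd.Series({"AAA.NS": -3.0, "BBB.NS": 1.5, "CCC.NS": 4.2, "DDD.NS": 0.3})

# First two symbols in column order, then sorted among themselves.
first_then_sort = perf.head(2).sort_values(ascending=False)

# Sorted over all symbols, then the two largest.
sort_then_first = perf.sort_values(ascending=False).head(2)
```

Here `first_then_sort` contains BBB.NS and AAA.NS, while `sort_then_first` contains CCC.NS and BBB.NS.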
My attempt to filter before looking at MTDPerf fails:
>>> for dt in feed_tail.index:
...     d = feed_tail[feed_tail['Close'].loc[dt] > feed_tail['SMA13'].loc[dt]]
...     d
...
<stdin>:2: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "lib/python3.9/site-packages/pandas/core/frame.py", line 3796, in __getitem__
    return self._getitem_bool_array(key)
  File "lib/python3.9/site-packages/pandas/core/frame.py", line 3849, in _getitem_bool_array
    key = check_bool_indexer(self.index, key)
  File "lib/python3.9/site-packages/pandas/core/indexing.py", line 2548, in check_bool_indexer
    raise IndexingError(
pandas.errors.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
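The error appears to come from index alignment: `feed_tail['Close'].loc[dt] > feed_tail['SMA13'].loc[dt]` is a boolean Series indexed by Symbols, whereas `feed_tail[...]` expects a boolean indexer aligned with the Date index. A sketch of one way around it, assuming a hypothetical two-date sample (the names `sample` and `top_by_date` are illustrative): apply the per-date boolean Series to that date's MTDPerf row, which shares the same Symbols index.

```python
import pandas as pd

# Hypothetical sample reusing two rows of values from the question.
dates = pd.DatetimeIndex(["2022-11-18", "2022-11-21"], name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "SMA13", "MTDPerf"], ["BALKRISIND.NS", "KSB.NS"]],
    names=["Attributes", "Symbols"],
)
sample = pd.DataFrame(
    [[1972.75, 1917.90, 1932.85, 1914.94, 0.51, -7.85],
     [1945.80, 1874.70, 1932.80, 1902.38, -0.86, -9.93]],
    index=dates,
    columns=columns,
)

top_by_date = {}
for dt in sample.index:
    # Boolean Series indexed by Symbols for this one date.
    above = sample["Close"].loc[dt] > sample["SMA13"].loc[dt]
    # MTDPerf for the same date, also indexed by Symbols, so the mask aligns.
    perf = sample["MTDPerf"].loc[dt]
    # Filter first, then rank and keep at most ten symbols.
    top_by_date[dt] = perf[above].sort_values(ascending=False).head(10)
```

Each entry of `top_by_date` is a Series of MTDPerf values for the symbols that passed the Close > SMA13 test on that date, largest first.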
Solution (or approach used so far), with many thanks to @jezrael's leading answer:
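# Blank out (NaN) entries unless Close > SMA13 and MTDPerf > 0.
mmtd = feed_tail.where(feed_tail['Close'] > feed_tail['SMA13']).where(feed_tail['MTDPerf'] > 0)
for dt in mmtd.index:
    dt_str = dt.strftime("%Y-%m-%d")
    # Close and MTDPerf values for this date.
    a = mmtd.loc[dt_str, ['Close', 'MTDPerf']]
    # Drop the entries that were blanked out by the filters above.
    a = a[a.notna()]
    # Move the Attributes level into columns, leaving one row per symbol.
    unstacked_a = a.unstack(0)
    if not unstacked_a.empty:
        unstacked_a = unstacked_a.sort_values(by=['MTDPerf'], ascending=False)
        print(dt_str, unstacked_a)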