I have a DataFrame of the following form:
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user',columns='item',values='rating')
miniR
item    1    2     3    4
user
1     1.0  2.0   NaN  4.0
2     5.0  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN
I can get the (user, item) index pairs of the non-null entries:
miniR.stack().index
MultiIndex([(1, 1),
            (1, 2),
            (1, 4),
            (2, 1),
            (2, 3),
            (2, 4),
            (3, 2),
            (3, 3)],
           names=['user', 'item'])
If I index with a single tuple, it is read as (row label, column label) and returns a scalar value:
miniR.loc[(1,2)]
2.0
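As far as I can tell, `.loc` simply unpacks a single tuple into a row label and a column label, so the three lookups below should be equivalent. This is a minimal sketch on the same frame; the variable names `a`, `b`, and `c` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

a = miniR.loc[(1, 2)]  # tuple is unpacked as (row label, column label)
b = miniR.loc[1, 2]    # the same key written without explicit parentheses
c = miniR.at[1, 2]     # scalar accessor for a single cell
print(a, b, c)
```

So a single tuple never addresses more than one cell, which is presumably why passing several tuples does not behave like a list of cells.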
I can also overwrite that single value:
miniR.loc[(1,2)] = np.nan
miniR
item    1    2     3    4
user
1     1.0  NaN   NaN  4.0
2     5.0  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN
However, if I index with two tuples, the first is treated as row labels and the second as column labels, so I get the sub-frame where they intersect instead of two scalar values.
miniR.loc[(1,2), (2,1)]
item   2    1
user
1    NaN  1.0
2    NaN  5.0
And if I pass more than two tuples, I get an error:
print(miniR.loc[(1,2), (2,1), (1,4)])
IndexingError: Too many indexers
I'd like to set a specific list of (user, item) cells in my DataFrame to null. I can loop over the list of index tuples to do this, but is there a vectorized way to index the DataFrame with a list of index tuples and overwrite those cells in one line? Something like
miniR.loc[(1,2), (2,1), (1,4)] = np.nan
which would ideally return
item    1    2     3    4
user
1     1.0  NaN   NaN  NaN
2     NaN  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN
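One direction that might produce this result is to translate the labels into integer positions and let NumPy do the pairwise assignment. This is only a sketch: it assumes every label in the hypothetical `pairs` list actually exists in the frame (`get_indexer` returns -1 for a missing label, which would silently hit the last row or column), and it builds a new frame rather than modifying `miniR` in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

pairs = [(1, 2), (2, 1), (1, 4)]  # (user, item) labels to blank out
users, items = zip(*pairs)

# Translate labels into integer positions along each axis.
rows = miniR.index.get_indexer(list(users))
cols = miniR.columns.get_indexer(list(items))

# Work on a copy of the values so the original frame is left untouched.
values = miniR.to_numpy(copy=True)
values[rows, cols] = np.nan  # NumPy pairs rows[i] with cols[i]

result = pd.DataFrame(values, index=miniR.index, columns=miniR.columns)
print(result)
```

The key difference from `.loc` is that NumPy integer-array indexing pairs the two position arrays element by element instead of taking their cross product.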
I thought something like the following would work, but it returns an empty DataFrame:
miniR.loc[miniR.index.isin([(1,2), (2,1), (1,4)])]
Empty DataFrame
Columns: [1, 2, 3, 4]
Index: []
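My best guess at why this comes back empty: `miniR.index` only holds the user labels 1, 2, 3, so none of them equals a tuple and `isin` is False for every row. A variation that might work is to run `isin` against the full set of (user, item) combinations and reshape the result into a boolean mask. This is a sketch, and the names `pairs`, `full`, and `mask` are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

pairs = [(1, 2), (2, 1), (1, 4)]  # (user, item) labels to blank out

# Every (user, item) combination, in row-major order to match miniR's layout.
full = pd.MultiIndex.from_product([miniR.index, miniR.columns])

# isin on a MultiIndex compares whole tuples; reshape to the frame's shape.
mask = pd.DataFrame(full.isin(pairs).reshape(miniR.shape),
                    index=miniR.index, columns=miniR.columns)

result = miniR.mask(mask)  # cells where mask is True become NaN
print(result)
```

`DataFrame.mask` returns a new frame, so `miniR` itself stays unchanged unless the result is assigned back to it.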
I can't seem to find much documentation on this topic either.