
I have a DataFrame of the following form:

import numpy as np
import pandas as pd

df = pd.DataFrame({'user':[1,1,1,1,2,2,2,2,3,3,3,3],
                   'item':[1,2,3,4,1,2,3,4,1,2,3,4],
                   'rating':[1,2,np.nan,4,5,np.nan,7,8,np.nan,9,11,np.nan]})

miniR = df.pivot_table(index='user',columns='item',values='rating')
miniR

item    1    2     3    4
user                     
1     1.0  2.0   NaN  4.0
2     5.0  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN

I can get the (user, item) index pairs of the non-null cells:

miniR.stack().index
MultiIndex([(1, 1),
            (1, 2),
            (1, 4),
            (2, 1),
            (2, 3),
            (2, 4),
            (3, 2),
            (3, 3)],
           names=['user', 'item'])

If I index with a single tuple, I get a scalar:

miniR.loc[(1,2)]
2.0

I can also overwrite that single cell:

miniR.loc[(1,2)] = np.nan
miniR

item    1    2     3    4
user                     
1     1.0  NaN   NaN  4.0
2     5.0  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN

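However, if I index with two tuples, .loc treats the first tuple as a list of row labels and the second as a list of column labels, and returns every combination of them (the overlapping rows and columns) instead of the two individual cells: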

miniR.loc[(1,2), (2,1)]
item    2    1
user          
1     NaN  1.0
2     NaN  5.0
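To read several cells pairwise instead of as a cross product, one option is to translate the labels into integer positions with Index.get_indexer and then use NumPy fancy indexing, which pairs rows[i] with cols[i]. This is a minimal sketch that rebuilds miniR from df (so it reads the unmodified values) and assumes every user and item label exists: get_indexer returns -1 for a missing label, which NumPy would silently treat as "last row/column", hence the guard.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

pairs = [(1, 2), (2, 1), (1, 4)]          # (user, item) cells to look up

# Translate row and column labels into integer positions (-1 = not found).
rows = miniR.index.get_indexer([u for u, _ in pairs])
cols = miniR.columns.get_indexer([i for _, i in pairs])
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some (user, item) labels are not in miniR")

# NumPy pairs rows[k] with cols[k], giving one value per tuple.
values = miniR.to_numpy()[rows, cols]
print(values)   # [2. 5. 4.]
```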

If I pass three or more tuples, I get an error, since .loc on a DataFrame accepts at most one indexer per axis (rows, then columns):

print(miniR.loc[(1,2), (2,1), (1,4)])
IndexingError: Too many indexers

I'd like to set a specific list of (user, item) cells in my DataFrame to null. I can loop over my list of index tuples to do this, but is there a vectorized way to index the DataFrame with a list of tuples and overwrite those cells in one line? Something like:

miniR.loc[(1,2), (2,1), (1,4)] = np.nan

which would ideally leave miniR as:

item    1    2     3    4
user                     
1     1.0  NaN   NaN  NaN
2     NaN  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN
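One vectorized way to get this result (a sketch, not the only approach) is to convert the labels to integer positions with Index.get_indexer and assign through NumPy fancy indexing, which sets each (rows[k], cols[k]) cell in a single operation. The sketch copies the array and rebuilds the frame instead of writing into miniR.values, because whether .values is a writable view depends on the pandas version and its Copy-on-Write setting. It assumes every label in pairs exists in miniR.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

pairs = [(1, 2), (2, 1), (1, 4)]          # (user, item) cells to blank out

# Label -> integer position for rows and columns (-1 = label not found).
rows = miniR.index.get_indexer([u for u, _ in pairs])
cols = miniR.columns.get_indexer([i for _, i in pairs])
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some (user, item) labels are not in miniR")

# Copy the float array, blank all target cells at once, rebuild with the same labels.
arr = miniR.to_numpy(copy=True)
arr[rows, cols] = np.nan
miniR = pd.DataFrame(arr, index=miniR.index, columns=miniR.columns)
print(miniR)
```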

I thought something like the following would work, but it returns an empty DataFrame:

miniR.loc[miniR.index.isin([(1,2), (2,1), (1,4)])]

Empty DataFrame
Columns: [1, 2, 3, 4]
Index: []
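The isin attempt comes back empty because miniR.index holds only the user labels (1, 2, 3), so each integer is compared against the tuples and nothing matches. (user, item) tuples can only match an index that carries both levels. A sketch that keeps the isin idea: build a MultiIndex with one (user, item) entry per cell, test membership there, unstack the result into a boolean frame shaped like miniR, and pass it to DataFrame.mask, which puts NaN wherever the condition is True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

pairs = [(1, 2), (2, 1), (1, 4)]          # (user, item) cells to blank out

# One (user, item) entry for every cell of miniR; level names come from the axes.
cells = pd.MultiIndex.from_product([miniR.index, miniR.columns])

# On a MultiIndex, isin compares whole (user, item) tuples.
hit = pd.Series(cells.isin(pairs), index=cells)

# Reshape the flat boolean Series into a user x item grid (aligned by label).
to_blank = hit.unstack()

# mask() replaces values where the condition is True with NaN by default.
miniR = miniR.mask(to_blank)
print(miniR)
```

Because the condition frame is aligned to miniR by label, this does not depend on the row or column order of either object.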

I can't seem to find much documentation on this topic either.

1 Answer


The simplest way to do this would be:

# each tuple unpacks into (row label, column label) for .loc
for tup in [(1,2), (2,1), (1,4)]:
    miniR.loc[tup] = np.nan

item    1    2     3    4
user
1     1.0  NaN   NaN  NaN
2     NaN  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN

3 Comments

I was trying to avoid using a for loop for this. It just seems weird that you can't filter this type of dataframe in the way you normally would using something like .isin.
I honestly haven't found a way to do it otherwise. Maybe someone else knows.
Bump because I was also surprised I haven't found a way to do this without loops
