Indexing Pandas DataFrame Using List of Tuples

Question

I have a dataframe in the following form

import numpy as np
import pandas as pd

df = pd.DataFrame({'user':[1,1,1,1,2,2,2,2,3,3,3,3],
                   'item':[1,2,3,4,1,2,3,4,1,2,3,4],
                   'rating':[1,2,np.nan,4,5,np.nan,7,8,np.nan,9,11,np.nan]})

miniR = df.pivot_table(index='user',columns='item',values='rating')

miniR

item    1    2     3    4
user                     
1     1.0  2.0   NaN  4.0
2     5.0  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN

I can get the (user, item) index pairs of the non-null ratings:

miniR.stack().index
MultiIndex([(1, 1),
            (1, 2),
            (1, 4),
            (2, 1),
            (2, 3),
            (2, 4),
            (3, 2),
            (3, 3)],
           names=['user', 'item'])
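For reference, the same non-null pairs can be pulled out as plain Python tuples. This is a minimal sketch; `observed` and `pairs` are illustrative names. The explicit `dropna()` is a safeguard, since newer pandas releases are moving to a `stack()` implementation that may keep NaN cells (older ones drop them by default, and pandas 2.x may emit a FutureWarning for this call).

```python
import numpy as np
import pandas as pd

# Rebuild the question's pivot table.
df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

# stack() turns the 3x4 frame into a Series indexed by (user, item);
# dropna() keeps only observed ratings regardless of how stack() treats NaN.
observed = miniR.stack().dropna()

# Iterating a MultiIndex yields tuples, so list() gives the (user, item) pairs.
pairs = list(observed.index)
print(pairs)
```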

If I index with a single tuple, it returns a scalar value:

miniR.loc[(1,2)]
2.0
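Why this returns a scalar: inside the square brackets the parentheses are only tuple syntax, so `miniR.loc[(1,2)]` passes exactly the same key as `miniR.loc[1, 2]`, namely row label 1 (user) and column label 2 (item). A small sketch that checks this, with `.at` (pandas' single-cell accessor) shown for comparison; `a`, `b` and `c` are just illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

# Both spellings pass the same tuple key (1, 2): row label 1, column label 2.
a = miniR.loc[(1, 2)]
b = miniR.loc[1, 2]

# .at is pandas' accessor for exactly one (row label, column label) cell.
c = miniR.at[1, 2]
print(a, b, c)  # 2.0 2.0 2.0
```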

I can also overwrite that single value with something else, e.g. NaN:

miniR.loc[(1,2)] = np.nan
miniR

item    1    2     3    4
user                     
1     1.0  NaN   NaN  4.0
2     5.0  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN

However, if I index with two tuples (run on the original miniR), it returns every combination of the rows in the first tuple and the columns in the second tuple, instead of the two scalar values:

miniR.loc[(1,2), (2,1)]
item    2    1
user          
1     2.0  1.0
2     NaN  5.0
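What happens with two tuples: `.loc[rows, cols]` reads each position independently, and a tuple in either position is list-like, so it is treated as a list of labels. The key therefore selects the cross product of users (1, 2) and items (2, 1), a 2x2 block, rather than the two cells (1, 2) and (2, 1). A sketch, assuming the original `miniR` and using illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

# Each position of the (rows, columns) key is read as a list of labels,
# so two tuples select a 2x2 block (their cross product), not two cells.
block_from_tuples = miniR.loc[(1, 2), (2, 1)]
block_from_lists = miniR.loc[[1, 2], [2, 1]]

print(block_from_tuples.equals(block_from_lists))  # True
print(block_from_lists)
```

Because each axis is indexed separately, `.loc` on its own has no notion of "pair the i-th row label with the i-th column label"; that pairing has to come from somewhere else (integer positions, a boolean mask, or a loop).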

And if I pass more than two tuples, I get an error:

print(miniR.loc[(1,2), (2,1), (1,4)])
IndexingError: Too many indexers
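The error follows from the same rule: a DataFrame has two axes, so `.loc` accepts at most a (rows, columns) pair, and three comma-separated tuples form a three-element key. A sketch that reproduces it; `key` and `error_name` are illustrative names, and catching `Exception` rather than importing `IndexingError` avoids depending on where a given pandas version exposes that class:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

# The same key that miniR.loc[(1,2), (2,1), (1,4)] receives: three entries
# for an object that only has two axes.
key = ((1, 2), (2, 1), (1, 4))
error_name = None
try:
    miniR.loc[key]
except Exception as exc:  # pandas raises its IndexingError class here
    error_name = type(exc).__name__
    print(error_name, "-", exc)  # IndexingError - Too many indexers
```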

I'd like to set a specific list of (user, item) positions in my DataFrame to null. I can loop over my list of index tuples to do this, but is there a vectorized way to index my DataFrame with a list of index tuples and overwrite those cells in one line? Something like

miniR.loc[(1,2), (2,1), (1,4)] = np.nan

which would ideally return

item    1    2     3    4
user                     
1     1.0  NaN   NaN  NaN
2     NaN  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN
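One vectorized option, offered as a sketch rather than a dedicated pandas API: translate the row and column labels to integer positions with `Index.get_indexer`, then use NumPy's paired ("fancy") indexing, which touches exactly one cell per (row, column) pair. It assumes `miniR` has unique row and column labels and a float dtype that can hold NaN; `targets`, `users`, `items`, `row_pos`, `col_pos` and `values` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

targets = [(1, 2), (2, 1), (1, 4)]  # (user, item) cells to blank out

# Split the pairs into a row-label list and a column-label list.
users = [u for u, _ in targets]
items = [i for _, i in targets]

# Translate labels to integer positions; get_indexer returns -1 for
# labels it cannot find, so fail loudly instead of silently mis-indexing.
row_pos = miniR.index.get_indexer(users)
col_pos = miniR.columns.get_indexer(items)
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError("some (user, item) labels are not in miniR")

# Paired fancy indexing on a NumPy copy sets one cell per tuple (not the
# cross product); the result is then wrapped back into a DataFrame.
values = miniR.to_numpy(copy=True)
values[row_pos, col_pos] = np.nan
miniR = pd.DataFrame(values, index=miniR.index, columns=miniR.columns)
print(miniR)
```

Working on an explicit copy and rebuilding the DataFrame avoids relying on whether writes into `.values` propagate back to the frame, which can differ across pandas versions and copy-on-write settings.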

I thought something like the following would work, but it returns an empty DataFrame:

miniR.loc[miniR.index.isin([(1,2), (2,1), (1,4)])]

Empty DataFrame
Columns: [1, 2, 3, 4]
Index: []
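Why the `.isin` attempt comes back empty: `miniR.index` holds only the user labels 1, 2 and 3, so none of its entries equals a (user, item) tuple and every row is filtered out. `.isin` does work on a MultiIndex of tuples, though, so one way to keep the `.isin` style is to build a MultiIndex with one entry per cell, test it against the tuples, reshape the result into a boolean mask and apply `DataFrame.mask`. This is a sketch; it relies on `MultiIndex.from_product` listing cells row by row, which matches NumPy's default reshape order, and `targets`, `cells` and `hit` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'item': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   'rating': [1, 2, np.nan, 4, 5, np.nan, 7, 8, np.nan, 9, 11, np.nan]})
miniR = df.pivot_table(index='user', columns='item', values='rating')

targets = [(1, 2), (2, 1), (1, 4)]

# The row index only contains users, so no entry matches a (user, item) tuple.
print(miniR.index.isin(targets))  # [False False False]

# One (user, item) entry per cell, ordered row by row.
cells = pd.MultiIndex.from_product([miniR.index, miniR.columns])

# MultiIndex.isin compares whole tuples; reshape the flat result to 3x4.
hit = cells.isin(targets).reshape(miniR.shape)

# DataFrame.mask replaces values with NaN wherever the condition is True.
miniR = miniR.mask(hit)
print(miniR)
```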

I can't seem to find much documentation on this topic either.

NYC Coder · Accepted Answer · 2020-06-04 21:58:01Z

The simplest way to do this would be:

# Each tuple is (user, item), i.e. (row label, column label)
for tup in [(1,2), (2,1), (1,4)]:
    miniR.loc[tup] = np.nan

item    1    2     3    4
user
1     1.0  NaN   NaN  NaN
2     NaN  NaN   7.0  8.0
3     NaN  9.0  11.0  NaN

3 Comments

JRL Over a year ago

I was trying to avoid using a for loop for this. It just seems weird that you can't filter this type of dataframe in the way you normally would using something like .isin.

NYC Coder Over a year ago

I honestly haven't found a way to do it otherwise. Maybe someone else knows.

Esteban Over a year ago

Bump because I was also surprised I haven't found a way to do this without loops
