python & pandas - Drop rows where column values are index values in another DataFrame

Question

The original DataFrame (df1) looks like:

   NoUsager  Sens  NoAdresse  Fait  Weekday  NoDemande    Periods
0  000001    +     000079     1     Dim      42191000972  Soir
1  001875    +     005018     1     Dim      42191001052  Matin
2  001651    +     005018     1     Dim      42191001051  Matin
3  001486    +     000405     1     Dim      42191001250  Matin
4  002021    +     005712     1     Dim      42191000013  Matin
5  001975    +     005712     1     Dim      42191000012  Matin
6  001304    +     001408     1     Dim      42191000371  Matin
7  001355    +     005021     1     Dim      42191000622  Matin
8  002274    +     006570     1     Dim      42191001053  Matin
9  000040    +     004681     1     Dim      42191002507  Soir

I used crosstab to generate a new DataFrame (df2) with index = [NoDemande, NoUsager, Periods] and columns = ['Sens']:

                       Sens  + - 
NoDemande  NoUsager Periods
42191000622 001355  Matin    1 2 
42191000959 001877  Matin    1 2 
42191001325 000627  Soir     1 2 
42191001412 000363  Matin    1 2 
42191001424 000443  Soir     1 2 
42191001426 001308  Soir     1 2 
42191002507 000040  Soir     2 0 
42193000171 000257  Soir     1 2 
42193000172 002398  Soir     1 2

I want to drop all rows from df1 whose values in the NoUsager and NoDemande columns match the NoUsager and NoDemande index values in df2. The result should be a new DataFrame, df3, with the same format as df1 but without rows 7 and 9.

I tried:

df3 = df1.loc[~df1['NoDemande','NoUsager'].isin([df2.NoDemande,df2.NoUsager])]

But it returned: KeyError: ('NoDemande', 'NoUsager')

How can I solve this problem?

Any help will be appreciated!

piRSquared · Accepted Answer · 2016-09-08 13:36:58Z

cols = ['NoDemande','NoUsager']
mask = df1[cols].isin(df2.reset_index()[cols].to_dict('list'))
df3 = df1[~mask.all(axis=1)]

You were doing three things incorrectly.

df1['NoDemande','NoUsager'] needs to be df1[['NoDemande','NoUsager']]. Selecting several columns takes a list of names; a bare pair is looked up as a single key, hence the KeyError.
df2 has index levels named 'NoDemande' and 'NoUsager', not columns with those names. You must reset the index to turn them back into columns.
When using isin for this purpose, convert df2.reset_index()[['NoDemande','NoUsager']] into a dictionary of lists with to_dict('list').
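The three fixes can be tried end to end with a small sketch. The frames below are a hypothetical four-row subset of the question's data, and df2 is built by hand as a stand-in for the crosstab result.

```python
import pandas as pd

# Hypothetical subset of the question's df1; IDs are strings so the
# leading zeros survive.
df1 = pd.DataFrame({
    "NoUsager": ["000001", "001875", "001355", "000040"],
    "Sens": ["+", "+", "+", "+"],
    "NoDemande": ["42191000972", "42191001052", "42191000622", "42191002507"],
    "Periods": ["Soir", "Matin", "Matin", "Soir"],
})

# Stand-in for the crosstab result: the keys live in a three-level index.
df2 = pd.DataFrame(
    {"+": [1, 2], "-": [2, 0]},
    index=pd.MultiIndex.from_tuples(
        [("42191000622", "001355", "Matin"), ("42191002507", "000040", "Soir")],
        names=["NoDemande", "NoUsager", "Periods"],
    ),
)

cols = ["NoDemande", "NoUsager"]
# reset_index() turns the index levels into columns; to_dict("list") yields
# {column name: list of values}, which isin checks column by column.
lookup = df2.reset_index()[cols].to_dict("list")
mask = df1[cols].isin(lookup)
# Keep only the rows where it is NOT the case that both columns matched.
df3 = df1[~mask.all(axis=1)]
print(df3)
```

With this sample, the last two rows of df1 are dropped and the first two are kept, with their original row labels.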

4 Comments

Ami Tavory Over a year ago

Nice answer. However, is there any reason not to do df1[['NoUsager', 'NoDemande']].isin(df2.reset_index()[['NoUsager', 'NoDemande']]).all(axis=1)? What does the to_dict give here?

piRSquared Over a year ago

@AmiTavory Yes. Against my intuition, what you propose doesn't work. I'll try to write up something explaining why.
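A likely explanation, going by the documented behavior of DataFrame.isin: when the argument is itself a DataFrame, both the index and the column labels must match, so each cell is compared only with the cell carrying the same row label. With a dict, each column is tested for membership by value. The two tiny frames below are hypothetical and exist only to show the difference.

```python
import pandas as pd

left = pd.DataFrame({"NoDemande": ["a", "b", "c"], "NoUsager": ["x", "y", "z"]})
# Same key pair ("c", "z") as left's row 2, but stored under row label 0.
right = pd.DataFrame({"NoDemande": ["c"], "NoUsager": ["z"]})

# DataFrame argument: labels are aligned first, so left's row 2 has no
# partner in right (which only has row label 0) and comes out False.
aligned = left.isin(right).all(axis=1)

# Dict argument: each column is tested for membership, whatever the row labels.
by_value = left.isin(right.to_dict("list")).all(axis=1)

print(aligned.tolist())
print(by_value.tolist())
```

After reset_index(), df2's rows are labelled 0, 1, 2, ... in crosstab order, which has no relation to the row labels of df1, so the aligned comparison rarely finds anything.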

ch36r5s Over a year ago

@piRSquared Thanks for this nice answer. So, it's impossible to compare columns and index?

piRSquared Over a year ago

@ch36r5s I wouldn't say that, just not the way you tried it. There are many ways to compare an index to columns.
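One such way, offered as a sketch and not as the answerer's method, is to compare whole (NoDemande, NoUsager) pairs with MultiIndex.isin. This also sidesteps a caveat of the column-by-column isin: a row whose NoDemande appears in one df2 row and whose NoUsager appears in a different df2 row would still count as a match there. The sample frames are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    "NoUsager": ["001355", "000040", "001355"],
    "NoDemande": ["42191000622", "42191002507", "42191002507"],
})
df2 = pd.DataFrame(
    {"+": [1, 2], "-": [2, 0]},
    index=pd.MultiIndex.from_tuples(
        [("42191000622", "001355", "Matin"), ("42191002507", "000040", "Soir")],
        names=["NoDemande", "NoUsager", "Periods"],
    ),
)

cols = ["NoDemande", "NoUsager"]
# Build (NoDemande, NoUsager) pairs on both sides, in the same level order.
df1_keys = pd.MultiIndex.from_frame(df1[cols])
df2_keys = df2.index.droplevel("Periods")

# MultiIndex.isin tests whole tuples, so row 2 (a mixed pair) is kept.
df3 = df1[~df1_keys.isin(df2_keys)]

# For contrast: the column-by-column test flags all three rows,
# including the mixed pair in row 2.
colwise = df1[cols].isin(df2.reset_index()[cols].to_dict("list")).all(axis=1)
print(df3)
```

Whether the pairwise or the column-wise behavior is wanted depends on the data; for the rows shown in the question both give the same result.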
