1

Consider the following.

import pandas as pd
d=pd.DataFrame([[1,'a'],[1,'b'],[2,'c'],[2,'a'],[3,'c'],[4,'a'],[4,'c']],columns=['A','B'])

I want those values in A which have EXACTLY 'c' ('c' and only 'c') associated with them. There is only one such value. It is 3. I wrote the following query, but it is not returning the right result.

d[ d.B.isin(['c'])  & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A.to_frame()) ].A.to_frame()

My idea is to find all values in A that have 'c' associated with them and then remove from them those values that also have something other than 'c' associated with them. But what the code returns is just the values that have 'c' associated with them. Can someone help me with this? Thank you.

6
  • d.loc[d['B']=='c','A'] will return all values in column A that have a column B value equal to 'c' Commented Oct 30, 2019 at 9:32
  • My question is different. I want to know which values in A have EXACTLY 'c' associated with them in column B. Your query returns the values in A that have 'c' associated with them. These values in A may also have values in B that are other than 'c' associated with them. Commented Oct 30, 2019 at 9:47
  • I don't know why my question was marked as duplicate. I am not asking how to look for rows with a given value in a certain column. I already know that. I want to know how to find values in column A that have a certain value in another column associated with them, and no other values in that column associated with them. Do you see the difference, jezrael? Commented Oct 30, 2019 at 9:48
  • 1
    I state it in the question "I want those values in A which have EXACTLY 'c' ('c' and only 'c') associated with them. There is only one such value. It is 3. I wrote the following query, but it is not returning the right result." Commented Oct 30, 2019 at 10:03
  • Yes. What is the logic behind this part? I see that if any values in A have other than 'c' associated with them, they are treated as duplicates and removed, but I can't correlate it with the syntax you have used. Also, how do I reach you? Your web page says that one should feel free to write to you but how are you to be reached? I won't bother you, but there might be instances when I might need to reach you. Commented Oct 30, 2019 at 10:10

1 Answer 1

1

The simplest idea is filter values with c and not allow duplicates in A column:

s1 = d.loc[d.B.eq('c') & ~d.A.duplicated(keep=False), 'A']
print (s1)
4    3
Name: A, dtype: int64

Your solution works by removing .to_frame(), but better is to use loc for select by mask - evaluation order matters:

s2 = d.loc[ d.B.isin(['c'])  & ~d.A.isin(d.loc[d.B.isin(set(d.B.unique())-{'c'}), 'A']), 'A']
print (s2)
4    3
Name: A, dtype: int64

s2 = d[ d.B.isin(['c'])  & ~d.A.isin(d[d.B.isin(set(d.B.unique())-{'c'})].A) ].A
print (s2)
4    3
Name: A, dtype: int64
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.