This is the data frame I want to search; I need the indexes of the rows that match. Matching must be exact: 'A' and 'AB' are completely different strings.

df2 = pd.DataFrame(np.array(['A','B','AC','AD','NAN','XX','BC','SLK','AC','AD','NAN','XU','BB','FG','XZ','XY','AD','NAN','NF','XY','AB','AC','AD','NAN','XY','LK','AC','AC','AD','NAN','KH','BC','GF','BC','AD']).reshape(5,7),columns=['a','b','c','d','e','f','g'])


    a   b   c   d   e   f   g
0   A   B   AC  AD  NAN XX  BC
1   SLK AC  AD  NAN XU  BB  FG
2   XZ  XY  AD  NAN NF  XY  AB
3   AC  AD  NAN XY  LK  AC  AC
4   AD  NAN KH  BC  GF  BC  AD

The strings I am searching for come from this smaller data frame. The two values in each row must be combined with AND: a row of df2 matches only if it contains both values, and I want the indexes of the matching df2 rows.

df = pd.DataFrame(np.array(['A','B','C','D','AA','AB','AC','AD','NAN','BB','BC','AD']).reshape(6,2),columns=['a1','b1'])


    a1  b1
0   A   B  # both present in row 0 of df2
1   C   D  # not present in any row of df2
2   AA  AB # not present together in any row of df2
3   AC  AD # both present in rows 0, 1 and 3 of df2
4   NAN BB # both present in row 1 of df2
5   BC  AD # both present in rows 0 and 4 of df2

AND part

Desired output [0,1,3,4]

import pandas as pd
import numpy as np


index1 = df.index # Finds the number of row in df
terms=[]
React=[]
for i in range(len(index1)): #for loop to search each row of df dataframe
  terms=df.iloc[i] # Get i row
  terms[i]=terms.values.tolist() # converts to a list
  print(terms[i]) # to check
  for term in terms[i]: # search df2 for each string in the current row
    print(term)
    results = pd.DataFrame()
    if results.empty:
      results = df2.isin( [ term ] )
    else:
      results |= df2.isin( [ term ] ) 
  results['count'] = results.sum(axis=1)
  print(results['count'])
  print(results[results['count']==len(terms[i])].index.tolist())
  React=results[results['count']==len(terms[i])].index.tolist()
  React

I am getting TypeError: unhashable type: 'list' on the line results = df2.isin( [ term ] )
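The error is probably caused by the line terms[i]=terms.values.tolist(): it stores a list inside the row Series instead of converting the row, so term can later be a list, and a list nested inside the argument of isin cannot be hashed. Two further problems would remain even without the error: results is reset on every pass of the inner loop, and counting matching cells overcounts rows where a value repeats (row 3 of df2 contains AC three times). Below is a minimal sketch of one way to do the AND search, assuming exact string matches and that the union of all matching rows is wanted. The names mask and matched are illustrative, not from the original code.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array(['A','B','AC','AD','NAN','XX','BC','SLK','AC','AD','NAN','XU',
              'BB','FG','XZ','XY','AD','NAN','NF','XY','AB','AC','AD','NAN',
              'XY','LK','AC','AC','AD','NAN','KH','BC','GF','BC','AD']).reshape(5, 7),
    columns=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df = pd.DataFrame(
    np.array(['A','B','C','D','AA','AB','AC','AD','NAN','BB','BC','AD']).reshape(6, 2),
    columns=['a1', 'b1'])

matched = set()
for _, row in df.iterrows():
    terms = row.tolist()                        # plain list of strings, e.g. ['A', 'B']
    mask = pd.Series(True, index=df2.index)     # start with every df2 row as a candidate
    for term in terms:
        # keep only rows that contain this term in at least one column
        mask &= df2.isin([term]).any(axis=1)
    matched.update(df2.index[mask.to_numpy()].tolist())

React = sorted(matched)
print(React)  # [0, 1, 3, 4]
```

Reducing each term to one boolean per row with any(axis=1) before combining avoids the repeated-value problem, because a row counts once per term no matter how often the term appears in it.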

The OR part should be easy, but it has to exclude the AND matches that are already accounted for in the first section.

React2=df2.index[df2.isin([X]).any(axis=1)].tolist()
React2
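A possible sketch of the OR part follows. It assumes that "OR" means a df2 row contains at least one value of a pair, and that "excluding the AND parts" means dropping the rows that contain both values of that same pair. Both readings are assumptions, and the names has and or_only are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array(['A','B','AC','AD','NAN','XX','BC','SLK','AC','AD','NAN','XU',
              'BB','FG','XZ','XY','AD','NAN','NF','XY','AB','AC','AD','NAN',
              'XY','LK','AC','AC','AD','NAN','KH','BC','GF','BC','AD']).reshape(5, 7),
    columns=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df = pd.DataFrame(
    np.array(['A','B','C','D','AA','AB','AC','AD','NAN','BB','BC','AD']).reshape(6, 2),
    columns=['a1', 'b1'])

or_only = []
for _, row in df.iterrows():
    # one boolean column per search term: does the df2 row contain it anywhere?
    has = pd.DataFrame({term: df2.isin([term]).any(axis=1) for term in row.tolist()})
    and_mask = has.all(axis=1)                  # rows containing every term of the pair
    or_mask = has.any(axis=1) & ~and_mask       # at least one term, but not all of them
    or_only.append(df2.index[or_mask.to_numpy()].tolist())

print(or_only)  # [[], [], [2], [2, 4], [0, 2, 3, 4], [1, 2, 3]]
```

If the AND rows should instead be excluded globally (any row matched by any pair), the per-pair and_mask would be replaced by a mask built from the combined AND result.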
2 Comments

  • Please post the desired output for your 'df2'. Commented Aug 10, 2020 at 1:32
  • @r-beginners Thank you so much for your comment. I added the desired output after your comment. Commented Aug 10, 2020 at 2:35

1 Answer

This is not exactly the output format you asked for, but I computed the indexes under the AND condition. The resulting list holds, for each row of df, the indexes of the df2 rows that contain both of its values. Does this meet the intent of your question?

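output = []
for i in range(len(df)):          # each search pair in df
    tmp = []
    for k in range(len(df2)):     # each candidate row in df2
        # Boolean masks over row k: cells equal to a1 / b1 of pair i
        d = df2.loc[k].isin(df.loc[i,['a1']])
        f = df2.loc[k].isin(df.loc[i,['b1']])
        # AND condition: both values must occur at least once in the row
        if d.any() and f.any():
            tmp.append(k)
    output.append(tmp)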

output
[[0], [], [], [0, 1, 3], [1], [0, 4]]
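The per-pair lists above can be flattened into the single list the question asks for. The sketch below is an alternative formulation, not the code from this answer: it treats each df2 row as a set and uses a subset test for the AND condition, assuming exact string matches. The names row_sets and flat are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(
    np.array(['A','B','AC','AD','NAN','XX','BC','SLK','AC','AD','NAN','XU',
              'BB','FG','XZ','XY','AD','NAN','NF','XY','AB','AC','AD','NAN',
              'XY','LK','AC','AC','AD','NAN','KH','BC','GF','BC','AD']).reshape(5, 7),
    columns=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
df = pd.DataFrame(
    np.array(['A','B','C','D','AA','AB','AC','AD','NAN','BB','BC','AD']).reshape(6, 2),
    columns=['a1', 'b1'])

# One set of cell values per df2 row, paired with that row's index label
row_sets = list(zip(df2.index.tolist(), (set(r) for r in df2.to_numpy().tolist())))

# For each pair in df, keep the df2 rows whose value set contains the whole pair
output = [[k for k, cells in row_sets if set(pair) <= cells]
          for pair in df.to_numpy().tolist()]

# Union of all per-pair matches, sorted, as a flat list
flat = sorted({k for hits in output for k in hits})

print(output)  # [[0], [], [], [0, 1, 3], [1], [0, 4]]
print(flat)    # [0, 1, 3, 4]
```

Building the row sets once means each pair is checked with a cheap subset comparison instead of repeated isin calls, which may matter on larger data.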

1 Comment

Perfect, this is working, but I need some time to test it with my original data. Please allow me some time; 12 hours should be enough for me to test this. Thank you so much.
