I have researched this, but found no answer to the question below.

How can I do a boolean comparison for a list of substrings in a list of strings?

Below is the code:

import pandas as pd

string = {'strings_1': ['AEAB', 'AC', 'AI'],
          'strings_2': ['BB', 'BA', 'AG'],
          'strings_3': ['AABD', 'DD', 'PP'],
          'strings_4': ['AV', 'AB', 'BV']}

df_string = pd.DataFrame(data = string)

substring_list = ['AA', 'AE']

for row in df_string.itertuples(index = False):
    combine_row_str = [row[0], row[1], row[2]]

    #below is the main operation
    print(all(substring in row_str for substring in substring_list for row_str in combine_row_str))

The output I get is:

False
False
False

The output I want is:

True
False
False

2 Answers

Here's one way using pd.DataFrame.sum, which concatenates the strings in each row when called with axis=1, and a list comprehension:

df = pd.DataFrame(data=string)

lst = ['AA', 'AE']

df['test'] = [all(val in i for val in lst) for i in df.sum(axis=1)]

print(df)

  strings_1 strings_2 strings_3 strings_4   test
0      AEAB        BB      AABD        AV   True
1        AC        BA        DD        AB  False
2        AI        AG        PP        BV  False

Since you are using pandas, you can call apply row-wise and use str.contains with a regex to check whether the strings match. The first step is to find whether any of the values contain a string from substring_list:

df_string.apply(lambda x: x.str.contains('|'.join(substring_list)), axis=1)

This returns:

   strings_1  strings_2  strings_3  strings_4
0       True      False       True      False
1      False      False      False      False
2      False      False      False      False

What is not clear, though, is whether you want True when both substrings are present within a row or when either of them is. If either one is enough, you can simply add any() after the contains() call:

df_string.apply(lambda x: x.str.contains('|'.join(substring_list)).any(), axis=1)

This returns:

0     True
1    False
2    False
dtype: bool
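One caveat worth sketching: str.contains treats its pattern as a regular expression by default, so joining raw substrings with '|' only behaves as expected while they contain no regex metacharacters. The snippet below is a minimal sketch that escapes each substring with re.escape first; the names pattern and any_match are illustrative and not part of the original answer.

```python
import re
import pandas as pd

df_string = pd.DataFrame({
    'strings_1': ['AEAB', 'AC', 'AI'],
    'strings_2': ['BB', 'BA', 'AG'],
    'strings_3': ['AABD', 'DD', 'PP'],
    'strings_4': ['AV', 'AB', 'BV'],
})
substring_list = ['AA', 'AE']

# Escape each substring so characters such as '.' or '+' are matched literally.
pattern = '|'.join(re.escape(s) for s in substring_list)

# Cell-by-cell check, then reduce each row: True if any cell contains any substring.
any_match = df_string.apply(lambda col: col.str.contains(pattern)).any(axis=1)
print(any_match.tolist())
```

For the sample substrings the escaping changes nothing, so the result is the same as in the output above.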

For the second case, jpp provides a one-line solution that concatenates the row elements into one string, but please note that it fails for corner cases: if a row contains, say, "BBA" and "ABB" and you try to match "AA", the concatenated string "BBAABB" will still match "AA", which is wrong. I would like to propose a solution with apply and a helper function, so that the code is more readable:

def areAllPresent(vals, patterns):
  result = []
  for pat in patterns:
    result.append(any([pat in val for val in vals]))
  return all(result)

df_string.apply(lambda x: areAllPresent(x.values, substring_list), axis=1)

With your sample dataframe it returns the same result, but it also works correctly when both substrings must be present:

0     True
1    False
2    False
dtype: bool
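The corner case mentioned above can be checked without pandas. This is a small sketch using a hypothetical row ['BBA', 'ABB']; are_all_present is a condensed, snake_case rewrite of the helper above, not a different algorithm.

```python
def are_all_present(vals, patterns):
    """Return True if every pattern occurs inside at least one single value."""
    return all(any(pat in val for val in vals) for pat in patterns)

row = ['BBA', 'ABB']   # hypothetical row, no single cell contains 'AA'
patterns = ['AA']

# Concatenating the cells creates 'BBAABB', where 'AA' spans the cell boundary.
concatenated_hit = all(pat in ''.join(row) for pat in patterns)
per_cell_hit = are_all_present(row, patterns)

print(concatenated_hit)  # True  (false positive)
print(per_cell_hit)      # False (no cell contains 'AA')

# First row of the sample dataframe: 'AE' is in 'AEAB' and 'AA' is in 'AABD'.
print(are_all_present(['AEAB', 'BB', 'AABD', 'AV'], ['AA', 'AE']))  # True
```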

11 Comments

Hey, thank you for the answer. What should I do if I want both 'AA' and 'AE' to be contained in each position? Meaning, a Boolean check of whether row 0, column 0 contains both the 'AA' and 'AE' substrings, a Boolean check of whether row 0, column 1 contains both 'AA' and 'AE', and so on.
I tried this, but it doesn't work: df_string.apply(lambda x: x.str.contains((?=substring_list), axis=1)
You can do that with a regular expression that chains several lookahead groups, as follows:

expr = '(?=.*' + ')(?=.*'.join(substring_list) + ')'
df_string.apply(lambda x: x.str.contains(expr), axis=1)

In your case the regular expression is (?=.*AA)(?=.*AE). If you find my answer useful, please mark it as accepted, thanks :)
Thank you. Why is it not (?=.*AA.*)(?=.*AE.*)?
(?=.*AA.*)(?=.*AE.*) does the same thing; the trailing .* is redundant because (?=...) is a positive lookahead. A lookahead checks that its group matches but does not consume any characters, so the next group is tested from the same starting position. Essentially you are reproducing an AND operator inside a regular expression. Have a look here: Regular Expressions: Is there an AND operator?
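To see the lookahead "AND" on individual strings, here is a minimal sketch using only the standard re module. The test strings 'AAE' and 'XAEXAA' are made up for illustration, and re.escape is an added precaution that is not in the comment above.

```python
import re

substring_list = ['AA', 'AE']

# One positive lookahead per substring; all of them are tested from the same position.
expr = ''.join('(?=.*' + re.escape(s) + ')' for s in substring_list)
print(expr)  # (?=.*AA)(?=.*AE)

cells = ['AEAB', 'AABD', 'AAE', 'XAEXAA']
results = {cell: bool(re.search(expr, cell)) for cell in cells}
print(results)
# 'AEAB' has only 'AE' and 'AABD' has only 'AA', so both are False;
# 'AAE' and 'XAEXAA' contain both substrings, so both are True.
```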