1

I am trying to identify if a column Blaze[Info] contains within the text a string from a list (and create a new Boolean column with that information).

The DataFrame looks like:

       Word          Info
0      Aam           Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...
1      aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)
2      aard-wolf     Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)

When I state the term directly I get the answer I want:

Blaze['Noun'] = np.where((Blaze['Info'].str.contains('n.')),True,False) Blaze['Verb'] = np.where((Blaze['Info'].str.contains('v.')),True,False)

       Word          Info                                                Noun   Verb
0      Aam           Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...   True   False
1      aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)      True   False
2      aard-wolf     Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)       True   False

but this is not scalable as I have 100+ features to search for.

When I iterate through the list abbreviation :

abbreviation=['n'., 'v.']
col_name=['Noun','Verb']

for i in range(len(abbreviation)):
    Blaze[col_name[i]] = np.where((Blaze['Info'].str.contains(abbreviation[i])), True, False)

I am returned DataFrame full of 'FALSE' entries:

       Word          Info                                                Noun   Verb
0      Aam           Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...   False  False
1      aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)      False  False
2      aard-wolf     Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)       False  False

I can see several answers for doing something similar but grouping the answer in a single row: Check if each row in a pandas series contains a string from a list using apply?

Scalable solution for str.contains with list of strings in pandas

but I don't think these solve the above.

Is anyone able to explain how I am going wrong?

1
  • Your code works for me -- the only thing you need to do is pass regex=False a s. is a regex character meaning the second row would be True since there is a z in it. There is nothing wrong with the way you are doing the loop, but I think my way may be slightly more pythonic. Commented Dec 10, 2020 at 23:18

1 Answer 1

1

You can loop through the lists simultaneously with zip. Make sure to pass regex=False to str.contains as . is a regex character.

abbreviation=['n.', 'v.']
col_name=['Noun','Verb']
for a, col in zip(abbreviation, col_name):
    Blaze[col] = np.where(Blaze['Info'].str.contains(a, regex=False),True,False)
Blaze
Out[1]: 
        Word                                               Info  Noun   Verb
0        Aam  Aam, n. Etym: [D. aam, fr. LL. ama; cf. L. ham...  True  False
1  aard-vark     Aard"-vark`, n. Etym: [D., earth-pig.] (Zoöl.)  True  False
2  aard-wolf      Aard"-wolf`, n. Etym: [D, earth-wolf] (Zoöl.)  True  False

If required, str.contains also has a case parameter, so you can specify case=False to search case-insensitively.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.