
In Python, how do I check whether a string is an element of a list of strings?

The example data I am working with is:

import pandas as pd
testData = pd.DataFrame({'value': ['abc', 'cde', 'fgh']})

Why, then, does the following code return False?

testData['value'][0] in testData['value']
  • Sorry, the data will be stored as a Series containing individual strings in your sample df, but is your real df data really a list of strings for each row? That is fundamentally different. Commented Oct 28, 2016 at 14:11
  • @EdChum's answer is a good one. To fix your original error, you simply need to check the values of testData['value'], so your last line becomes testData['value'][0] in testData['value'].values and you will get True. Commented Oct 28, 2016 at 14:14
  • @EdChum, I guess the example data is a more accurate description of my problem. The fundamental difference you mentioned might be the thing I overlooked. Commented Oct 28, 2016 at 14:16
  • Actually, I can't explain testData['value'][0] in testData['value']: when the scalar value is on the left-hand side, it is somehow able to evaluate the Series into a single boolean, which is weird. Commented Oct 28, 2016 at 14:19
  • I found the answer to your last question Commented Oct 28, 2016 at 14:28
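Following the suggestion in the comments, here is a minimal sketch contrasting the plain `in` test on the Series with the same test against its values. It assumes a reasonably recent pandas; the names `s` and `target` are just illustrative local variables.

```python
import pandas as pd

testData = pd.DataFrame({'value': ['abc', 'cde', 'fgh']})
s = testData['value']
target = s[0]  # 'abc'

# `in` on a Series checks the index labels (0, 1, 2), not the stored strings
in_series = target in s

# `.values` is a NumPy array, so `in` compares against the elements
in_values = target in s.values

# `.tolist()` gives a plain Python list of strings, as in the question's title
in_list = target in s.tolist()

print(in_series, in_values, in_list)
```

The first check prints False and the other two print True, which matches the behaviour described in the comments.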

1 Answer


You can use the vectorised str.contains to test whether a string is contained in each row. Note that this is a substring test, and the pattern is treated as a regular expression by default:

In [262]:
testData['value'].str.contains(testData['value'][0])

Out[262]:
0     True
1    False
2    False
Name: value, dtype: bool
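Because str.contains interprets its pattern as a regular expression by default, search strings containing characters such as `.` can match more than intended. The sketch below adds a hypothetical extra row `'a.c'` to the example data purely to show the difference that `regex=False` makes.

```python
import pandas as pd

# 'a.c' is an extra, made-up row added only for this illustration
testData = pd.DataFrame({'value': ['abc', 'cde', 'fgh', 'a.c']})
s = testData['value']

# Default: the pattern is a regex, so '.' matches any single character
regex_hits = s.str.contains('a.c').tolist()

# regex=False: the pattern is matched as a literal substring
literal_hits = s.str.contains('a.c', regex=False).tolist()

print(regex_hits)    # [True, False, False, True]
print(literal_hits)  # [False, False, False, True]
```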

If you want to know whether it's present in any row, use any():

In [264]:
testData['value'].str.contains(testData['value'][0]).any()

Out[264]:
True
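Since str.contains matches substrings, `.any()` on its result answers "does any row contain this text?", which is not the same as "is this string an element of the column?". A minimal sketch of the exact-element alternatives, using `==` and `isin` on the same example data:

```python
import pandas as pd

testData = pd.DataFrame({'value': ['abc', 'cde', 'fgh']})
s = testData['value']

# Substring test: 'ab' is contained in 'abc', so this is True
substring_any = s.str.contains('ab').any()

# Exact-element tests: no row is exactly 'ab', so both are False
exact_any = (s == 'ab').any()
exact_isin = s.isin(['ab']).any()

print(substring_any, exact_any, exact_isin)
```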

OK, to address your last question:

In [270]:
testData['value'][0] in testData['value']

Out[270]:
False

This is because pd.Series.__contains__ is implemented as:

def __contains__(self, key):
    """True if the key is in the info axis"""
    return key in self._info_axis
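Python's `x in obj` operator calls `obj.__contains__(x)`, so whatever that method inspects decides the answer. The toy class below is entirely hypothetical (it is not part of pandas) and only mimics the idea of checking labels rather than values:

```python
class KeyedColumn:
    """Hypothetical dict-like container: `in` checks labels, like a Series."""

    def __init__(self, labels, values):
        self._labels = list(labels)
        self._values = list(values)

    def __contains__(self, key):
        # `x in obj` calls obj.__contains__(x); this looks only at the
        # labels, mirroring `key in self._info_axis` in the pandas code above
        return key in self._labels


col = KeyedColumn(labels=[0, 1, 2], values=['abc', 'cde', 'fgh'])
print('abc' in col)  # False: 'abc' is a value, not a label
print(0 in col)      # True: 0 is a label
```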

If we look at what _info_axis actually is:

In [269]:
testData['value']._info_axis

Out[269]:
RangeIndex(start=0, stop=3, step=1)

So when we do 'abc' in testData['value'], we are really testing whether 'abc' is in the index, which is why it returns False.
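With the default RangeIndex the labels are the integers 0, 1 and 2, so an integer label tests True even though no row holds that number. A minimal sketch on the original example data:

```python
import pandas as pd

testData = pd.DataFrame({'value': ['abc', 'cde', 'fgh']})
s = testData['value']

zero_in = 0 in s              # True: 0 is an index label
five_in = 5 in s              # False: there is no label 5
abc_in = 'abc' in s           # False: 'abc' is a value, not a label
abc_in_index = 'abc' in s.index  # the same check, written explicitly

print(zero_in, five_in, abc_in, abc_in_index)
```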

Example:

In [271]:
testData=pd.DataFrame({'value':['abc','cde','fgh']}, index=[0, 'turkey',2])
testData

Out[271]:
       value
0        abc
turkey   cde
2        fgh

In [272]:
'turkey' in testData['value']

Out[272]:
True

It now returns True because we are testing whether 'turkey' is present in the index.
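One way to remember this is that a Series behaves like a dict for membership: `in` looks at the keys (index labels), not the values. The sketch below compares the Series from the example with the dict returned by `to_dict()`; testing values has to be requested explicitly in both cases.

```python
import pandas as pd

testData = pd.DataFrame({'value': ['abc', 'cde', 'fgh']},
                        index=[0, 'turkey', 2])
s = testData['value']
d = s.to_dict()  # {0: 'abc', 'turkey': 'cde', 2: 'fgh'}

# Both the Series and the dict test membership against keys/labels
label_hits = ('turkey' in s, 'turkey' in d)   # (True, True)
value_as_key = ('cde' in s, 'cde' in d)       # (False, False)

# To test values, ask for them explicitly
value_hits = ('cde' in s.values, 'cde' in d.values())  # (True, True)

print(label_hits, value_as_key, value_hits)
```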
