2

I have a DataFrame with a column whose values are lists. For example:

import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
                   'age': [21, 23, 24, 28],
                   'occupation': ['data scientist',  'doctor',  'data analyst', 'engineer'],
                   'knowledge':[['python','c++'], ['python', 'c#'], ['css','js','html'], ['c#']],
                  })

Now I want to select only the rows where 'python' is one of the values in the 'knowledge' list. How do I do that?

I tried pd.loc[(pd['knowledge'].isin['python'])], but it didn't work.

(edited to fix the code)

2 Answers

4

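Series.isin tests whether each cell's whole value is one of the given values, not whether a list stored in the cell contains them, so you need to loop over the column, for example with a list comprehension: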

df[['python' in l for l in df['knowledge']]]

output:

    name  age      occupation      knowledge
0  name1   21  data scientist  [python, c++]
1  name2   23          doctor   [python, c#]

Alternatives

Finding any element of a set

Keep rows with at least one match:

search = {'python', 'js'}
df[[bool(search.intersection(l)) for l in df['knowledge']]]

output:

    name  age      occupation        knowledge
0  name1   21  data scientist    [python, c++]
1  name2   23          doctor     [python, c#]
2  name3   24    data analyst  [css, js, html]

Matching all elements of a set

Keep only rows whose list contains every search term:

search = {'python', 'c++'}
df[[search <= set(l) for l in df['knowledge']]]

output:

    name  age      occupation      knowledge
0  name1   21  data scientist  [python, c++]
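
The any/all variants above can be folded into one small helper. This is only a sketch: the function name rows_with and its require_all parameter are made up for illustration, and the isinstance check is an added assumption so that cells that are not lists (for example None) are treated as non-matching instead of raising an error.

```python
import pandas as pd

def rows_with(df, column, wanted, require_all=False):
    """Return rows whose list in `column` contains any (or all) of `wanted`.

    Hypothetical helper: cells that are not list-like count as non-matching.
    """
    wanted = set(wanted)
    mask = []
    for cell in df[column]:
        if not isinstance(cell, (list, tuple, set)):
            mask.append(False)                         # e.g. None or NaN
        elif require_all:
            mask.append(wanted <= set(cell))           # every term present
        else:
            mask.append(not wanted.isdisjoint(cell))   # at least one term present
    return df[mask]

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4', 'name5'],
                   'knowledge': [['python', 'c++'], ['python', 'c#'],
                                 ['css', 'js', 'html'], ['c#'], None]})

any_match = rows_with(df, 'knowledge', ['python', 'js'])
all_match = rows_with(df, 'knowledge', ['python', 'c++'], require_all=True)
```

Here any_match keeps name1, name2 and name3, while all_match keeps only name1.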


0

You can join each list into a space-separated string, then check whether it contains the wanted word, using word boundaries:

m = df['knowledge'].str.join(' ').str.contains(r'\bpython\b')
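
One caveat with the regex approach, sketched here on the question's data: \b only marks transitions between word and non-word characters, so terms next to or made of symbols can behave unexpectedly. Searching for 'c' matches 'c++' and 'c#', and searching for 'c++' with a trailing \b finds nothing. The variable names below are illustrative only.

```python
import re
import pandas as pd

knowledge = pd.Series([['python', 'c++'], ['python', 'c#'],
                       ['css', 'js', 'html'], ['c#']])
joined = knowledge.str.join(' ')

# 'c' is followed by '+' or '#', which are non-word characters,
# so \bc\b matches inside 'c++' and 'c#'.
regex_c = joined.str.contains(r'\bc\b')
exact_c = knowledge.apply(lambda l: 'c' in l)

# For 'c++' the trailing \b sits between '+' and a space or the end of
# the string, which is not a word boundary, so the pattern never matches.
pattern = r'\b' + re.escape('c++') + r'\b'
regex_cpp = joined.str.contains(pattern)
exact_cpp = knowledge.apply(lambda l: 'c++' in l)

print(pd.DataFrame({'regex_c': regex_c, 'exact_c': exact_c,
                    'regex_cpp': regex_cpp, 'exact_cpp': exact_cpp}))
```

For plain alphanumeric terms such as 'python' the two approaches agree on this data; the exact membership test is the safer choice when terms may contain symbols.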

Or you can use Series.apply:

m = df['knowledge'].apply(lambda l: 'python' in l)
print(m)

0     True
1     True
2    False
3    False
Name: knowledge, dtype: bool

Then use boolean indexing to select the True rows:

print(df[m])

    name  age      occupation      knowledge
0  name1   21  data scientist  [python, c++]
1  name2   23          doctor   [python, c#]
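
Another way to build the same mask, not taken from the answers above, is to flatten the lists with Series.explode and then collapse the result back to one value per original row. A minimal sketch, assuming a pandas version that provides Series.explode (0.25 or later) and a unique index:

```python
import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name2', 'name3', 'name4'],
                   'age': [21, 23, 24, 28],
                   'occupation': ['data scientist', 'doctor', 'data analyst', 'engineer'],
                   'knowledge': [['python', 'c++'], ['python', 'c#'],
                                 ['css', 'js', 'html'], ['c#']]})

# One row per list element; the original index label is repeated.
flat = df['knowledge'].explode()

# Compare each element, then ask per original row whether any element matched.
mask = flat.eq('python').groupby(level=0).any()

result = df[mask]
print(result)
```

This avoids an explicit Python-level lambda, at the cost of materializing the flattened Series.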
