Background:

I have the following code to make a dataframe from a list:

import pandas as pd

l = ['the cat meows',
     'the dog barks',
     'the bird chirps']
df = pd.DataFrame(l, columns=['Text'])

Output:

              Text
0    the cat meows
1    the dog barks
2  the bird chirps

Desired Output:

              Text Animal
0    the cat meows    cat
1    the dog barks    dog
2  the bird chirps   bird

Approach:

I attempt to get the Desired Output using the following code:

#create list of animal names
animal_list = ['cat', 'dog', 'bird']

#extract names from 'Text' column using the names in 'animal_list' 
#and create a new column containing extracted 'Text' names
df['Animal'] = df['Text'].str.extract(r"(%s)" % animal_list)

Problem:

However, I get the following when I do so:

              Text Animal
0    the cat meows      t
1    the dog barks      t
2  the bird chirps      t

Question:

How do I achieve my desired output?

  • What is the logic here? Do we need to use your animal_list, or is it the middle word every time? Commented May 27, 2019 at 22:16
  • Sorry if it was unclear. My goals are the following: 1) extract names from 'Text' column 2) using the names in 'animal_list' 3) create a new column containing extracted 'Text' names Commented May 27, 2019 at 22:40
  • And yes, the words in animal_list are needed Commented May 27, 2019 at 22:43

1 Answer

Using the animal_list with str.extract

We can use Series.str.extract here and pass it the names in your animal_list joined by |, which is the alternation ("or") operator in regex, so the capture group matches any one of the names:

df['Animal'] = df['Text'].str.extract(f"({'|'.join(animal_list)})")

If you have Python < 3.6, f-strings are not available.

In that case we can use @Mike's answer from the comments, which builds the same pattern with str.format:

df['Animal'] = df['Text'].str.extract(r"({})".format("|".join(animal_list)))

Output

              Text Animal
0    the cat meows    cat
1    the dog barks    dog
2  the bird chirps   bird
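
To see why the attempt in the question returned t, it can help to print the pattern each approach builds. This is a minimal sketch assuming the same animal_list and Text column. The re.escape call and the \b word boundaries are my additions, not part of the answer above: they are meant to guard against names that contain regex metacharacters and against matches inside longer words.

```python
import re

import pandas as pd

animal_list = ['cat', 'dog', 'bird']
df = pd.DataFrame({'Text': ['the cat meows', 'the dog barks', 'the bird chirps']})

# Formatting the list itself inserts its repr, which regex reads as a
# character class: it matches ONE character out of c, a, t, d, o, g, ...
# The first such character in every row is the 't' of "the".
broken_pattern = r"(%s)" % animal_list
print(broken_pattern)  # (['cat', 'dog', 'bird'])

# Joining with | builds an alternation of whole names instead.
# re.escape and \b are defensive additions (assumptions, not in the answer).
pattern = r"\b(" + "|".join(re.escape(a) for a in animal_list) + r")\b"
print(pattern)

# expand=False returns a Series rather than a one-column DataFrame.
df['Animal'] = df['Text'].str.extract(pattern, expand=False)
print(df)
```

With the three rows from the question, the escaped pattern should give the same Animal column as the plain '|'.join version.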

Getting the middle word with str.split

df['Animal'] = df['Text'].str.split().str[1]

Output

              Text Animal
0    the cat meows    cat
1    the dog barks    dog
2  the bird chirps   bird
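
Note that the two approaches only agree when the animal is always the second word. A small sketch of the difference, using hypothetical rows that are not from the question (the column names ByPosition and ByList are also made up for illustration):

```python
import pandas as pd

animal_list = ['cat', 'dog', 'bird']
# Hypothetical rows, not from the question: the animal is not always word two.
df = pd.DataFrame({'Text': ['the cat meows', 'the big dog barks', 'it rains']})

# Positional: always takes the second whitespace-separated word.
df['ByPosition'] = df['Text'].str.split().str[1]

# List-based: takes the first name from animal_list found in the text,
# and leaves NaN where none of the names occurs.
df['ByList'] = df['Text'].str.extract(f"({'|'.join(animal_list)})", expand=False)

print(df)
```

Since the asker confirmed in the comments that the words in animal_list are needed, the str.extract version is probably the safer fit for data like this.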

2 Comments

Sniped me! df['Sound'] = df['Animal'].str.extract(r"({})".format("|".join(animal_list)))
Thanks for the addition, added your solution in the answer as well @Mike