Replacing strings within a pandas DataFrame column

Question

I have a pandas DataFrame with a column named "content" that contains text. I want to remove certain words from each text in this column. I tried replacing each word with an empty string, but when I print the result of my function, the words have not been removed. My code is below:

def replace_words(t):
  words = ['Livre', 'Chapitre', 'Titre', 'Chapter', 'Article' ]
  for i in t:
    if i in words:
      t.replace (i, '')
    else:
      continue
  print(t)


st = 'this is Livre and Chapitre and Titre and Chapter and Article'

replace_words(st)

An example of desired result is: 'this is and and and and '

With the code below I want to apply the function above to each text in the column "content":

df['content'].apply(lambda x: replace_words(x))

Can someone help me to create a function that removes all the words I need and then apply this function to all the texts within my df column?

Rabinzel · Accepted Answer · 2022-10-24 12:42:54Z

2

You can use str.replace.
Input:

import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': np.arange(4),
    'words': ['this is Livre and Chapitre and Titre and Chapter and Article',
              'this is car and Chapitre and bus and Chapter and Article',
              'this is Livre and Chapitre',
              'nothing to replace']
})

words = ['Livre', 'Chapitre', 'Titre', 'Chapter', 'Article']
# Escape each word and join with | so the pattern matches any of them
pat = '|'.join(map(re.escape, words))
print(pat)
# Livre|Chapitre|Titre|Chapter|Article

df['words'] = df['words'].str.replace(pat, '', regex=True)
print(df)

   ID                               words
0   0        this is  and  and  and  and 
1   1  this is car and  and bus and  and 
2   2                       this is  and 
3   3                  nothing to replace
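The output above keeps the spaces that surrounded each removed word, so doubled spaces remain, whereas the desired result in the question has single spaces. A minimal sketch of one way to tidy that up, assuming it is acceptable to collapse every run of whitespace into one space (the second `str.replace` call and the `cleaned` name are additions for illustration, not part of the original answer):

```python
import re

import pandas as pd

words = ['Livre', 'Chapitre', 'Titre', 'Chapter', 'Article']
pat = '|'.join(map(re.escape, words))

df = pd.DataFrame({
    'words': ['this is Livre and Chapitre and Titre and Chapter and Article',
              'nothing to replace']
})

cleaned = (
    df['words']
    .str.replace(pat, '', regex=True)       # remove the listed words
    .str.replace(r'\s+', ' ', regex=True)   # collapse runs of whitespace into one space
)
print(cleaned.tolist())
```

Chaining `.str.strip()` at the end would also drop the trailing space, if that is wanted.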

edited Oct 24, 2022 at 12:42

answered Oct 24, 2022 at 12:10


2 Comments

mozway Over a year ago

much better: df['words'] = df['words'].str.replace(pat, '', regex=True)

Fabian Pino Over a year ago

If your stopwords include single letters (a, e, i, o, u) or syllables, str.replace also removes those letters or syllables inside other words. I had that problem and solved it with: df['words'] = df['words'].apply(lambda x: ' '.join([word for word in x.split() if word not in words]))
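The substring problem described in this comment can also be avoided while staying with str.replace, by wrapping the pattern in regex word boundaries (`\b`). This is a sketch under assumptions rather than anything proposed in the thread: the `stopwords` list and the sample sentence are hypothetical, and `\b` treats letters, digits and underscores as word characters, which may not suit every text.

```python
import re

import pandas as pd

stopwords = ['a', 'i', 'and']  # hypothetical short stopwords
pat_plain = '|'.join(map(re.escape, stopwords))
pat_words = r'\b(?:' + pat_plain + r')\b'  # match whole words only

s = pd.Series(['i saw a band and a piano'])  # hypothetical sample text

plain = s.str.replace(pat_plain, '', regex=True)  # also strips letters inside words
whole = s.str.replace(pat_words, '', regex=True)  # leaves 'saw', 'band', 'piano' intact

print(plain.iloc[0].split())
print(whole.iloc[0].split())
```

As with the accepted answer, the spaces around removed words are left behind.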

Firefighting Physicist · Accepted Answer · 2022-10-24 12:36:41Z

1

Two problems:

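1. Iterating with for i in t: yields one character at a time, not one word.
2. t.replace does not modify the string in place. Python strings are immutable, so it returns a new string that must be assigned back to t.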
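Both problems can be seen on a short string. The snippet below is only an illustration; the sample value of `t` and the names `chars`, `tokens` and `unchanged` are made up for the demonstration:

```python
t = 'this is Livre'

# Problem 1: iterating over a string yields single characters, not words.
chars = [c for c in t][:4]   # ['t', 'h', 'i', 's']
tokens = t.split(' ')        # ['this', 'is', 'Livre']

# Problem 2: str.replace returns a new string and leaves t untouched.
t.replace('Livre', '')       # result is discarded
unchanged = t                # still 'this is Livre'

t = t.replace('Livre', '')   # rebinding t keeps the result: 'this is '
print(chars, tokens, repr(unchanged), repr(t))
```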

Use this:

def replace_words(t):
    words = ['Livre', 'Chapitre', 'Titre', 'Chapter', 'Article']
    for i in t.split(' '):
        # print(i)  # uncomment to compare with problem 1
        if i in words:
            # str.replace returns a new string, so rebind t
            t = t.replace(i, '')
    # return the result instead of printing it, so that apply can use it
    return t

Edit: You can pass the function directly, without a lambda: df['col'].apply(replace_words).
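Note that apply returns a new Series and does not change the DataFrame by itself, so the column only shows the replaced text once the result is assigned back. A minimal sketch, assuming a column named `content` as in the question; the sample rows are invented, and the function is a simplified variant that loops over the word list instead of the split tokens:

```python
import pandas as pd

WORDS = ['Livre', 'Chapitre', 'Titre', 'Chapter', 'Article']

def replace_words(t):
    # Simplified variant: remove each listed word and return the new string.
    for w in WORDS:
        t = t.replace(w, '')
    return t

df = pd.DataFrame({'content': ['this is Livre and Chapitre',
                               'nothing to replace']})  # hypothetical rows

# Assign the returned Series back to the column to keep the result.
df['content'] = df['content'].apply(replace_words)
print(df['content'].tolist())
```

If the function only printed `t` and returned nothing, every row of the assigned column would become `None`.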

edited Oct 24, 2022 at 12:36

answered Oct 24, 2022 at 12:00


3 Comments

ForeverLearner Over a year ago

ok, the function works perfectly but after applying the function to the column using df['col'].apply(replace_words) I don't see the words replaced in texts of the columns

Carmoreno Over a year ago

Did you try returning the t variable at the end of the function?

Firefighting Physicist Over a year ago

Exactly, you have to return t and not just print it
