Goal: Remove items from my list, strings_2_remove, from a series. I have a list of strings like so:

strings_2_remove = [
"dogs are so cool",
"cats have cute toe beans"
]

I also have a series of strings that looks like this:

df.Sentences.head()

0    dogs are so cool because they are nice and funny 
1    many people love cats because cats have cute toe beans
2    hamsters are very small and furry creatures
3    i got a dog because i know dogs are so cool because they are nice and funny
4    birds are funny when they dance to music, they bop up and down
Name: Summary, dtype: object

The outcome after removing the strings in the list from the series should look like this:

    0    because they are nice and funny 
    1    many people love cats because 
    2    hamsters are very small and furry creatures
    3    i got a dog because i know because they are nice and funny
    4    birds are funny when they dance to music, they bop up and down
    Name: Summary, dtype: object

I tried the following to achieve the output I want:

mask_1 = (df.Sentences == strings_2_remove)
df.loc[mask_1, 'df.Sentences'] = " "

However, it is not achieving my goal.

Any suggestions?

4 Answers

Try:

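result = df.Sentences
for stringToRemove in strings_2_remove:
    # Series.replace with regex=False only replaces whole-cell matches;
    # Series.str.replace removes the substring inside each sentence
    result = result.str.replace(stringToRemove, '', regex=False)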

There are better, more performant solutions using RegEx. More information here.
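For reference, here is a self-contained sketch of the loop approach. It assumes the column is named Sentences, as in the question, and uses a shortened sample of the data. The removal is done with Series.str.replace, which works on substrings; regex=False makes each string match as literal text.

```python
import pandas as pd

strings_2_remove = [
    "dogs are so cool",
    "cats have cute toe beans",
]

# Shortened sample of the data from the question
df = pd.DataFrame({"Sentences": [
    "dogs are so cool because they are nice and funny",
    "many people love cats because cats have cute toe beans",
    "hamsters are very small and furry creatures",
]})

result = df["Sentences"]
for string_to_remove in strings_2_remove:
    # Remove every literal occurrence of the string from each sentence
    result = result.str.replace(string_to_remove, "", regex=False)

print(result.tolist())
```

The surrounding spaces are left in place, so a removed phrase at the start or end of a sentence leaves a leading or trailing space behind.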

Use Series.replace:

df.Sentences.replace('|'.join(strings_2_remove), '', regex=True)

0                      because they are nice and funny
1                       many people love cats because 
2          hamsters are very small and furry creatures
3    i got a dog because i know  because they are n...
4    birds are funny when they dance to music, they...
Name: Sentences, dtype: object
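One caveat with joining the raw strings: any regex metacharacters in them (parentheses, `?`, `.`, and so on) would be interpreted as pattern syntax. A minimal sketch of a guard, assuming the strings should always match literally, is to pass each one through re.escape first. The second string and the sample sentences below are hypothetical, chosen only to show the effect.

```python
import re
import pandas as pd

# The second string is a made-up example containing regex metacharacters
strings_2_remove = ["dogs are so cool", "cats (and kittens) are cute"]

sentences = pd.Series([
    "dogs are so cool because they are nice",
    "i think cats (and kittens) are cute, honestly",
    "hamsters are very small",
])

# Escape each string so it matches literally, then join into one alternation
pattern = "|".join(re.escape(s) for s in strings_2_remove)
cleaned = sentences.replace(pattern, "", regex=True)

print(cleaned.tolist())
```

Without re.escape, the parentheses would form a group and the literal text "cats (and kittens) are cute" would not be matched.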

5 Comments

Hey, do you know what causes the misaligned strings? I got the same output.
It's just pandas display settings - default is right-aligned. There is no whitespace padding or anything
About the whitespace, I knew that. How can I change this for a better display?
back at you buddy
Another option, using apply with re.sub:

import re

df.Sentences.apply(lambda x: re.sub('|'.join(strings_2_remove), '', x))

I created the test DataFrame as:

import pandas as pd

df = pd.DataFrame({ 'Summary':[
    'dogs are so cool because they are nice and funny',
    'many people love cats because cats have cute toe beans',
    'hamsters are very small and furry creatures',
    'i got a dog because i know dogs are so cool because they are nice and funny',
    'birds are funny when they dance to music, they bop up and down']})

The first step is to convert your strings_2_remove to a list of patterns (you have to import re):

pats = [re.compile(re.escape(s) + ' *') for s in strings_2_remove]

Note that each pattern is supplemented with ' *', which matches any trailing spaces. Otherwise the result string could contain two adjacent spaces. As far as I can see, the other solutions missed this detail.

Then define a function to be applied:

def fn(txt):
    # Apply every pattern in turn; sub() returns txt unchanged when nothing matches
    for pat in pats:
        txt = pat.sub('', txt)
    return txt

The function applies each pattern to the source string in turn and replaces every match with an empty string. A pattern that does not match leaves the string unchanged, so no separate search is needed, and a sentence containing several of the strings has all of them removed.

The only thing left to do is apply this function:

df.Summary.apply(fn)
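As an alternative to a per-row function, the same cleanup can be sketched with vectorized string methods. This is only one possible variant, not the method above: it assumes the strings should match literally (hence re.escape) and that collapsing all runs of whitespace to a single space is acceptable.

```python
import re
import pandas as pd

strings_2_remove = ["dogs are so cool", "cats have cute toe beans"]

# Shortened sample of the test data
summary = pd.Series([
    "dogs are so cool because they are nice and funny",
    "many people love cats because cats have cute toe beans",
    "i got a dog because i know dogs are so cool because they are nice and funny",
])

# One alternation pattern covering all strings, escaped to match literally
pattern = "|".join(re.escape(s) for s in strings_2_remove)

cleaned = (
    summary.str.replace(pattern, "", regex=True)  # remove the phrases
    .str.replace(r"\s+", " ", regex=True)         # collapse doubled spaces
    .str.strip()                                  # drop leading/trailing spaces
)

print(cleaned.tolist())
```

Normalizing whitespace after the removal handles phrases at the start, middle, or end of a sentence in the same way.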
