Python: replace string in the dataframe/column if only 1 word in the row

Question

I have pretty messy data. I am trying to replace rows that contain only one word (a single string token) with '' (an empty string).

Here is the original data:

import pandas as pd

df = pd.DataFrame({'some_text': [
        'I enjoy read Mark Twain\'s Books',
        'Library is very useful',
        '/',
        '\\',
        '/ /',
        '',
        'I enjoy read Mark Twain\'s Books',
        'an',
        'the',
        'Books are interesting'
]})

I tried this, but it drops the rows. I don't want to drop the rows, just replace their contents.

count = df['some_text'].str.split().str.len()
df[~(count==1)]

Final output needed:

I enjoy read Mark Twain's Books
Library is very useful


/ /

I enjoy read Mark Twain's Books


Books are interesting

Paolo · Accepted Answer · 2018-12-03 18:15:35Z

You can use a simple regex here:

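# Pass regex=True explicitly: recent pandas versions treat the pattern as a literal string by default.
df['new_text'] = df['some_text'].str.replace(r'^\S+$', '', regex=True)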
>>> df
                         some_text                         new_text
0  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
1           Library is very useful           Library is very useful
2                                /                                 
3                                \                                 
4                              / /                              / /
5                                                                  
6  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
7                               an                                 
8                              the                                 
9            Books are interesting            Books are interesting


1 Comment

lxop Over a year ago

Note that this regex will not replace strings that have only one word but which also have leading or trailing whitespace, though it could be modified to do so if desired.
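The modification mentioned in this comment could look something like the sketch below. It assumes that optional whitespace on either side of the single word should also be discarded; the sample rows are made up for illustration and are not from the question.

```python
import pandas as pd

# Illustrative rows only: two of them carry leading/trailing whitespace.
df = pd.DataFrame({'some_text': ['  an ', 'the', '/ /', '', 'Books are interesting']})

# \s* on both sides lets the pattern match a lone token padded with whitespace.
# regex=True is passed explicitly so the pattern is not treated as a literal string.
df['new_text'] = df['some_text'].str.replace(r'^\s*\S+\s*$', '', regex=True)

print(df['new_text'].tolist())
```

Rows with two or more tokens, such as '/ /', do not match the pattern and are left unchanged.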

Daniel Fonnegra García · Accepted Answer · 2018-12-03 18:20:45Z

With your implementation, instead of dropping the rows, assign a new value like this:

count = df['some_text'].str.split().str.len()
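# Restrict the assignment to the text column so other columns in the matching rows are left alone.
df.loc[count == 1, 'some_text'] = ""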
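If the original column should be preserved, one possible variation is to write the result to a new column with `Series.mask`, which replaces values where the condition is true. This is only a sketch: the `new_text` column name and the sample rows are assumptions, not part of the question.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame({'some_text': ['Library is very useful', '/', '/ /', '', 'an']})

# Number of whitespace-delimited tokens per row (0 for the empty string).
count = df['some_text'].str.split().str.len()

# mask() swaps in '' wherever the condition holds and keeps the other values.
df['new_text'] = df['some_text'].mask(count == 1, '')

print(df['new_text'].tolist())
```

Because `str.split()` with no arguments ignores leading and trailing whitespace, this approach also blanks single words that are padded with spaces.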


BernardL · Accepted Answer · 2018-12-03 18:13:53Z

You can apply the transformation to the column without a mask:

df['replaced_text'] = df['some_text'].apply(lambda x: '' if len(x.strip().split()) == 1 else x)
print(df.to_string())

                         some_text                    replaced_text
0  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
1           Library is very useful           Library is very useful
2                                /                                 
3                                \                                 
4                              / /                              / /
5                                                                  
6  I enjoy read Mark Twain's Books  I enjoy read Mark Twain's Books
7                               an                                 
8                              the                                 
9            Books are interesting            Books are interesting

Very similar to what you tried: the lambda function strips surrounding whitespace from each string, checks whether it contains exactly one word, and if so replaces it with ''.
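Since the data is described as messy, the same check can be written as a named function that skips non-string values, on which `.strip()` would raise an error. This is a sketch under that assumption; the `blank_single_word` helper and the sample rows (including the missing value) are hypothetical and not part of the question.

```python
import pandas as pd

def blank_single_word(value):
    """Return '' for a string holding exactly one whitespace-delimited token."""
    # Hypothetical guard: leave non-string values (e.g. None or NaN) untouched.
    if not isinstance(value, str):
        return value
    # split() with no arguments already ignores leading and trailing whitespace.
    return '' if len(value.split()) == 1 else value

# Illustrative rows only, including one missing value.
df = pd.DataFrame({'some_text': ['an', ' the ', '/ /', None, 'Books are interesting']})
df['replaced_text'] = df['some_text'].apply(blank_single_word)

print(df.to_string())
```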
