Remove single letters from strings in Pandas dataframe

Question

I have a DataFrame with a column of strings. I want to remove every single-letter word from that column. So far, I have tried:

df['STRI'] = df['STRI'].map(lambda x: " ".join(x.split() if len(x) >1)

I wish to input ABCD X WYZ and get ABCD WYZ.

Your check is about the whole string. Do it for each word: df['STRI'].map(lambda x: ' '.join(word for word in x.split() if len(word) > 1)). There are probably better ways of doing this, though.
– user2285236, Commented Jan 19, 2017 at 7:27
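The comment's point can be sketched outside pandas first: split the string into words, then test the length of each word rather than the length of the whole string. The helper name `drop_single_char_words` is made up for illustration and does not come from the thread.

```python
def drop_single_char_words(text):
    """Keep only the whitespace-separated words longer than one character."""
    return ' '.join(word for word in text.split() if len(word) > 1)


example = drop_single_char_words('ABCD X WYZ')
print(example)  # ABCD WYZ

# Applied to the column from the question, this would be:
# df['STRI'] = df['STRI'].map(drop_single_char_words)
```

Note that this filters by character count, so a lone digit such as `5` is dropped as well, while a token like `a,` (two characters) is kept.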

Mohammad Yusuf · Accepted Answer · 2017-01-19 07:53:04Z

5

Try this:

df['STRI'] = df['STRI'].str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)

Eg:

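import pandas as pd

df = pd.DataFrame(data=['X ABCD X X WEB X'], columns=['c1'])
print(df, '\n')
# Drop single-character words, then collapse the leftover runs of whitespace
df.c1 = df.c1.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)
print(df)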

Output:

                 c1
0  X ABCD X X WEB X 

           c1
0   ABCD WEB
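For readers on a recent pandas release, here is a hedged variant of the same idea. As far as I know, `Series.str.replace` has treated the pattern as a literal string by default since pandas 2.0, so the sketch passes `regex=True` explicitly. It also adds a `.str.strip()` step, which is not part of the answer, because removing a single letter at the start or end of the string leaves a stray space there.

```python
import pandas as pd

df = pd.DataFrame({'c1': ['X ABCD X X WEB X']})

cleaned = (
    df['c1']
    .str.replace(r'\b\w\b', '', regex=True)   # drop single-character words
    .str.replace(r'\s+', ' ', regex=True)     # collapse runs of whitespace
    .str.strip()                              # trim the leftover edge spaces
)

print(cleaned.tolist())  # ['ABCD WEB']
```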

edited Jan 19, 2017 at 7:53

answered Jan 19, 2017 at 7:26

Mohammad Yusuf


6 Comments

infinite-rotations Over a year ago

This does not generalize, as the original question asks for removing any single characters.

Mohammad Yusuf Over a year ago

Try again. Thanks @piRSQuared.

infinite-rotations Over a year ago

Tried again after your edit, but still doesn't work.

Mohammad Yusuf Over a year ago

Can you include npi.head() and df.head() ?

Mohammad Yusuf Over a year ago

@piRSquared This will not take care of edge cases.


nipy · Accepted Answer · 2017-01-19 07:57:07Z

4

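You can use str.replace with a regex. The pattern \b\w\b matches any single word character (a letter, digit, or underscore) that has a word boundary on both sides, and replacing it with an empty string removes it. See the working example below: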

Example using series:

import pandas as pd

s = pd.Series(['Katherine', 'Katherine and Bob', 'Katherine I', 'Katherine', 'Robert', 'Anne', 'Fred', 'Susan', 'other'])

s.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)

0            Katherine
1    Katherine and Bob
2            Katherine
3            Katherine
4               Robert
5                 Anne
6                 Fred
7                Susan
8                other
dtype: object

Another example with your test data:

s = pd.Series(['ABCD', 'X', 'WYZ'])

0    ABCD
1       X
2     WYZ
dtype: object

s.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)

0    ABCD
1        
2     WYZ
dtype: object

With your data it is:

df['STRI'].str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)
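To see what \b\w\b actually matches, here is a small sketch using Python's `re` module directly. It shows that the pattern also removes lone digits and can clip the letter after an apostrophe, and it contrasts this with a letters-only pattern. The names `SINGLE_WORD_CHAR`, `SINGLE_LETTER`, and `squeeze` are illustrative, and the ASCII-only class `[A-Za-z]` is an assumption about what "letter" means here.

```python
import re

SINGLE_WORD_CHAR = re.compile(r'\b\w\b')      # letter, digit, or underscore
SINGLE_LETTER = re.compile(r'\b[A-Za-z]\b')   # ASCII letters only (assumption)


def squeeze(text):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r'\s+', ' ', text).strip()


any_char = squeeze(SINGLE_WORD_CHAR.sub('', 'room 5 A wing'))
letters_only = squeeze(SINGLE_LETTER.sub('', 'room 5 A wing'))
apostrophe = squeeze(SINGLE_WORD_CHAR.sub('', "I don't know"))

print(any_char)      # room wing
print(letters_only)  # room 5 wing
print(apostrophe)    # don' know
```

The last case happens because the apostrophe is not a word character, so the `t` after it sits between two word boundaries.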

edited Jan 19, 2017 at 7:57

answered Jan 19, 2017 at 7:35

nipy


1 Comment

Mohammad Yusuf Over a year ago

.strip() removes only leading and trailing spaces. The spaces in between will be left in place.

piRSquared · Accepted Answer · 2017-01-19 07:46:46Z

3

list comprehension

[
    ' '.join([i for i in s.split() if len(i) > 1])
    for s in npi.STRI.values.tolist()
]

str.split

s = npi.STRI.str.split(expand=True).stack()
s[s.str.len() > 1].groupby(level=0).apply(' '.join)
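A runnable sketch of both approaches on made-up sample data (the frame contents are an assumption; only the `npi` and `STRI` names come from the answer). One thing to watch with the `str.split` version: a row made up entirely of single-letter words has no surviving tokens, so it drops out of the grouped result. The `reindex` call below is an addition that puts such rows back as empty strings.

```python
import pandas as pd

# Sample data, invented for illustration
npi = pd.DataFrame({'STRI': ['ABCD X WYZ', 'X ABCD X X WEB X', 'X']})

# List comprehension: filter words per string
from_list = [
    ' '.join([i for i in s.split() if len(i) > 1])
    for s in npi.STRI.values.tolist()
]

# str.split: one token per row, filter by length, then rejoin per original row
tokens = npi.STRI.str.split(expand=True).stack()
from_stack = tokens[tokens.str.len() > 1].groupby(level=0).apply(' '.join)

# Row 2 ('X') has no tokens left, so restore it as an empty string
from_stack = from_stack.reindex(npi.index, fill_value='')

print(from_list)            # ['ABCD WYZ', 'ABCD WEB', '']
print(from_stack.tolist())  # ['ABCD WYZ', 'ABCD WEB', '']
```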

answered Jan 19, 2017 at 7:46

piRSquared


2 Comments

Mohammad Yusuf Over a year ago

Will chaining .str.replace().str.replace() be efficient?

piRSquared Over a year ago

@MYGz use an apply and embed both replaces in the same apply
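One way to read this suggestion, as a sketch rather than the commenter's exact code: compile both patterns once and run both substitutions inside a single function passed to `apply`. The names `clean`, `SINGLE`, and `SPACES` are hypothetical, and the final `.strip()` is an addition. Whether this is faster than the chained `.str.replace` calls would need to be measured on real data.

```python
import re

import pandas as pd

SINGLE = re.compile(r'\b\w\b')   # single-character words
SPACES = re.compile(r'\s+')      # runs of whitespace


def clean(text):
    """Remove single-character words and tidy the spacing in one pass per row."""
    return SPACES.sub(' ', SINGLE.sub('', text)).strip()


# Sample data, invented for illustration
df = pd.DataFrame({'STRI': ['ABCD X WYZ', 'X ABCD X X WEB X']})
df['STRI'] = df['STRI'].apply(clean)

print(df['STRI'].tolist())  # ['ABCD WYZ', 'ABCD WEB']
```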
