5

I have a DataFrame where a column is filled with strings. I want to remove any appearance of single letters from the column. So far, I have tried:

df['STRI'] = df['STRI'].map(lambda x: " ".join(x.split() if len(x) >1)

I wish to input ABCD X WYZ and get ABCD WYZ.

1
  • 2
    Your check is applied to the whole string. Apply it to each word instead: df['STRI'].map(lambda x: ' '.join(word for word in x.split() if len(word) > 1)). There are probably better ways of doing this, though. Commented Jan 19, 2017 at 7:27

3 Answers

5

Try this:

df['STRI'] = npi['STRI'].str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)

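Example:

import pandas as pd

df = pd.DataFrame(data=['X ABCD X X WEB X'], columns=['c1'])
print(df, '\n')
# Drop single-character words, then collapse runs of whitespace into one space.
df.c1 = df.c1.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)
print(df)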

Output:

                 c1
0  X ABCD X X WEB X 

           c1
0   ABCD WEB 

6 Comments

This does not generalize, as the original question asks for removing any single characters.
Try again. Thanks @piRSQuared.
Tried again after your edit, but still doesn't work.
Can you include npi.head() and df.head() ?
@piRSquared This will not take care of edge cases.
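The comments do not say which edge cases are meant. Two that \b\w\b certainly has: \w also matches digits and underscores, and an apostrophe counts as a word boundary, so the t in don't is removed. The sketch below uses plain re to show this next to a stricter pattern; STANDALONE_LETTER is a hypothetical alternative, not something proposed in this thread, and it assumes "single letter" means an ASCII letter delimited by whitespace.

```python
import re

# The pattern used in the answer above: any single word character
# (letter, digit or underscore) with a word boundary on each side.
ANY_WORD_CHAR = re.compile(r'\b\w\b')

# Hypothetical stricter alternative: one ASCII letter with whitespace
# (or the start/end of the string) on both sides.
STANDALONE_LETTER = re.compile(r'(?<!\S)[A-Za-z](?!\S)')

text = "don't buy 5 apples X"

loose = ANY_WORD_CHAR.sub('', text)       # also removes the "t" and the "5"
strict = STANDALONE_LETTER.sub('', text)  # removes only the standalone "X"

print(repr(loose))
print(repr(strict))
```

Either pattern leaves extra whitespace behind, so the whitespace-collapsing replace is still needed afterwards.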
4

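You can use str.replace with a regex. The pattern \b\w\b matches any single word character that has a word boundary on both sides; replacing each match with an empty string removes it, and a second replace collapses the leftover runs of whitespace. See the working example below: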

Example using series:

import pandas as pd

s = pd.Series(['Katherine','Katherine and Bob','Katherine I','Katherine', 'Robert', 'Anne', 'Fred', 'Susan', 'other'])

s.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)

0            Katherine
1    Katherine and Bob
2            Katherine
3            Katherine
4               Robert
5                 Anne
6                 Fred
7                Susan
8                other
dtype: object

Another example with your test data:

s = pd.Series(['ABCD','X','WYZ'])

0    ABCD
1       X
2     WYZ
dtype: object

s.str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)

0    ABCD
1        
2     WYZ
dtype: object

With your data it is:

df['STRI'].str.replace(r'\b\w\b', '', regex=True).str.replace(r'\s+', ' ', regex=True)

1 Comment

.strip() removes only leading and trailing spaces. Spaces in between are left untouched.
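To illustrate the comment above, here is a minimal sketch that keeps each intermediate result so the effect of every step is visible. The sample string is taken from the first answer; the variable names are only illustrative. It assumes a pandas version where str.replace accepts the regex keyword.

```python
import pandas as pd

s = pd.Series(['X ABCD X X WEB X'])

# Step 1: drop single-character words. This leaves gaps behind.
no_singles = s.str.replace(r'\b\w\b', '', regex=True)

# strip() alone trims the ends but keeps the run of spaces in the middle.
stripped_only = no_singles.str.strip()

# Step 2: collapse every run of whitespace into one space, then trim the ends.
cleaned = no_singles.str.replace(r'\s+', ' ', regex=True).str.strip()

print(repr(no_singles[0]))
print(repr(stripped_only[0]))
print(repr(cleaned[0]))
```

Using both the whitespace-collapsing replace and strip() gives a result with no leading, trailing, or doubled spaces.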
3

list comprehension

[
    ' '.join([i for i in s.split() if len(i) > 1])
    for s in npi.STRI.values.tolist()
]

str.split

s = npi.STRI.str.split(expand=True).stack()
s[s.str.len() > 1].groupby(level=0).apply(' '.join)
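Both snippets above can be tried on sample data with the sketch below. The npi frame here is made up for illustration; only the STRI column name comes from the question. One difference worth knowing: a row with no surviving words produces an empty string in the list comprehension but disappears from the groupby result, so the sketch reindexes to restore it.

```python
import pandas as pd

# Hypothetical sample data; only the STRI column name comes from the question.
npi = pd.DataFrame({'STRI': ['ABCD X WYZ', 'X ABCD X X WEB X', 'X']})

# List comprehension: split each string and keep words longer than one character.
via_comprehension = [
    ' '.join(word for word in text.split() if len(word) > 1)
    for text in npi['STRI'].tolist()
]

# str.split: one word per row, filter by length, then join back per original row.
words = npi['STRI'].str.split(expand=True).stack()
via_stack = words[words.str.len() > 1].groupby(level=0).apply(' '.join)

# Row 2 had no words left, so it is missing from the groupby result;
# reindex against the original index to bring it back as an empty string.
via_stack = via_stack.reindex(npi.index, fill_value='')

print(via_comprehension)
print(via_stack.tolist())
```

Because both approaches split on whitespace and rejoin with a single space, neither needs a separate whitespace clean-up step.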

2 Comments

Will chaining .str.replace().str.replace() be efficient?
@MYGz use an apply and embed both replaces in the same apply
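One way to read the suggestion above is sketched here: both substitutions live in a single function that is passed to apply once. The helper name drop_single_chars and the sample frame are hypothetical, and the final strip() is an addition that the comment does not mention. Whether this is actually faster than the chained version is not measured here.

```python
import re

import pandas as pd

# Compile both patterns once so they are not rebuilt for every row.
SINGLE_CHAR = re.compile(r'\b\w\b')
WHITESPACE = re.compile(r'\s+')


def drop_single_chars(text):
    """Remove single-character words, then tidy the whitespace (hypothetical helper)."""
    without_singles = SINGLE_CHAR.sub('', text)
    # strip() is an extra step, added to trim leading and trailing spaces.
    return WHITESPACE.sub(' ', without_singles).strip()


# Made-up sample data using the column name from the question.
df = pd.DataFrame({'STRI': ['ABCD X WYZ', 'X ABCD X X WEB X']})
df['STRI'] = df['STRI'].apply(drop_single_chars)
print(df)
```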
