Replacing newlines with spaces for str columns through pandas dataframe

Question

Given an example dataframe where columns 2 and 3 contain free text, e.g.

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n

The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to get:

>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

How to replace newlines with spaces for specific columns through pandas dataframe?

I have tried this:

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]

>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()

>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]

>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

But there must be a better/simpler way.

jezrael · Accepted Answer · 2017-10-02 09:39:19Z

18

Use replace: first strip leading and trailing whitespace, then replace \n:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n',  ' ', regex=True)
print (df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
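Since the question asks about specific columns, the same chained replace can be limited to a subset of the frame. This is only a sketch under assumptions: `text_cols` is a hypothetical name, and it presumes the free-text columns carry the labels 2 and 3 as in the question's example.

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

# Hypothetical name: labels of the free-text columns in the example frame.
text_cols = [2, 3]

# Strip leading/trailing whitespace first, then turn the remaining
# newlines into spaces, touching only the selected columns.
df[text_cols] = (
    df[text_cols]
    .replace({r'^\s+': '', r'\s+$': ''}, regex=True)
    .replace(r'\n', ' ', regex=True)
)
print(df)
```

The numeric columns 0 and 1 are never passed to replace, so they keep their values and dtype.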

edited Oct 2, 2017 at 9:39

answered Oct 2, 2017 at 9:29

jezrael


Wiktor Stribiżew · Answer · 2017-10-02 09:26:13Z

2

You may use the following approach with two regex replacements:

>>> df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' '}, regex=True, inplace=True)
>>> df
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
>>>

Details

'\A\s+|\s+\Z' -> '' will act like strip() removing all leading and trailing whitespace:
- \A\s+ - matches 1 or more whitespace symbols at the start of the string
- | - or
- \s+\Z - matches 1 or more whitespace symbols at the end of the string
'\n' -> ' ' will replace any newline with a space.
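To see the two patterns in isolation, here is a small sketch using Python's re module outside pandas. The helper name `strip_then_flatten` and the extra padded sample are illustrative assumptions, not part of the answer.

```python
import re

def strip_then_flatten(text):
    # Step 1: remove leading and trailing whitespace, like str.strip().
    stripped = re.sub(r'\A\s+|\s+\Z', '', text)
    # Step 2: turn any remaining (interior) newline into a space.
    return stripped.replace('\n', ' ')

samples = ['abc', 'foo\nbar', 'def\nhaha', 'love it\n', '  \n padded \n']
for s in samples:
    # For these samples the regex agrees with the built-in strip().
    assert re.sub(r'\A\s+|\s+\Z', '', s) == s.strip()
    print(repr(s), '->', repr(strip_then_flatten(s)))
```

The anchors \A and \Z match only at the very start and very end of the string, so interior newlines are left for the second step.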

answered Oct 2, 2017 at 9:26

Wiktor Stribiżew


zipa · Answer · 2017-10-02 09:27:24Z

1

You can use select_dtypes to select the columns of type object and apply applymap to those columns.

Neither function has an inplace argument, so as a workaround, assign the result back to the dataframe:

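df = pd.DataFrame(lol)  # lol is the list of lists from the question
# Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map
strs = df.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
df[strs.columns] = strs
df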
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it

answered Oct 2, 2017 at 9:27

zipa


Mohamed Ali JAMAOUI · Answer · 2017-10-02 09:29:18Z

1

Adding to the other nice answers, this is a vectorized version of your initial idea:

columns = [2, 3]
# Assign one column at a time: assigning the whole list of Series at once
# would treat each Series as a row and transpose the values.
for col in columns:
    df.iloc[:, col] = df.iloc[:, col].str.strip().str.replace('\n', ' ')

Details:

In [49]: for col in columns:
    ...:     df.iloc[:, col] = df.iloc[:, col].str.strip().str.replace('\n', ' ')

In [50]: df
Out[50]: 
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

answered Oct 2, 2017 at 9:29

Mohamed Ali JAMAOUI
