26

I have a similar problem to the one posted here:

Pandas DataFrame: remove unwanted parts from strings in a column

I need to remove newline characters from within a string in a DataFrame. I've accessed an API and parsed the response with Python's json module, and that all works. Creating the DataFrame works fine, too. However, when I finally write the result to a CSV file I get stuck, because the newlines inside the strings create false 'new rows' in the file.

So basically I'm trying to turn this:

'...this is a paragraph.

And this is another paragraph...'

into this:

'...this is a paragraph. And this is another paragraph...'

I don't care about preserving any kind of '\n' or any special symbols for the paragraph break. So it can be stripped right out.

I've tried a few variations:

misc['product_desc'] = misc['product_desc'].strip('\n')

AttributeError: 'Series' object has no attribute 'strip'

Here's another:

misc['product_desc'] = misc['product_desc'].str.strip('\n')

TypeError: wrapper() takes exactly 1 argument (2 given)

misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n'))
misc['product_desc'] = misc['product_desc'].map(lambda x: x.strip('\n\t'))

There is no error message, but the newline characters don't go away, either. Same thing with this:

misc = misc.replace('\n', '')

The line that writes to CSV is this:

misc_id.to_csv(r'C:\Users\jlalonde\Desktop\misc_w_id.csv', sep=' ', na_rep='', index=False, encoding='utf-8')

Version of Pandas is 0.9.1

Thanks! :)

2 Answers

48

strip only removes the specified characters at the beginning and end of the string. If you want to remove all \n, you need to use replace.

misc['product_desc'] = misc['product_desc'].str.replace('\n', '')
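To illustrate the difference between strip and replace, here is a minimal sketch. The sample DataFrame is made up to stand in for the asker's misc, and it assumes a reasonably recent pandas where str.replace accepts the regex keyword (the 0.9.1 release from the question predates it). The last line is an optional variation, not part of the answer above: it collapses each paragraph break into a single space, which matches the desired output in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `misc` DataFrame.
misc = pd.DataFrame({
    "product_desc": ["...this is a paragraph.\n\nAnd this is another paragraph..."]
})

# strip() only trims the ends of each string, so interior newlines survive.
stripped = misc["product_desc"].str.strip("\n")

# replace() removes every occurrence; regex=False treats '\n' as a literal.
removed = misc["product_desc"].str.replace("\n", "", regex=False)

# Variation: collapse each run of newlines (plus any surrounding whitespace)
# into one space so the two paragraphs do not run together.
spaced = misc["product_desc"].str.replace(r"\s*\n+\s*", " ", regex=True)
```

Assigning removed or spaced back to misc['product_desc'] overwrites the column, as in the answer.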

3 Comments

is this inplace?
@user1767754: It modifies the original DataFrame, if that's what you mean. It's not strictly "in place" though; it creates a new column with the modified values and then assigns it back, overwriting the original column.
.str is important, missed it at first glance.
7

You could use the regex parameter of the replace method to achieve that:

misc['product_desc'] = misc['product_desc'].replace(to_replace='\n', value='', regex=True)
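The regex flag matters because, without it, replace compares each cell's entire value against to_replace rather than searching inside the string. That is likely why misc.replace('\n', '') in the question ran without error but changed nothing. A small sketch with made-up data, assuming a recent pandas:

```python
import pandas as pd

# Hypothetical data: one cell contains a newline, the other IS a newline.
misc = pd.DataFrame({"product_desc": ["first line\nsecond line", "\n"]})

# Without regex=True only cells exactly equal to '\n' are replaced.
whole_value = misc.replace("\n", "")

# With regex=True the pattern is matched as a substring inside each string.
substring = misc.replace(to_replace="\n", value="", regex=True)
```

In whole_value the first cell still contains its newline, while in substring it has been removed.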

4 Comments

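If product_desc may contain mixed values (e.g. float, str...) then convert it to str first so the replacement applies to every value: misc['product_desc'] = misc['product_desc'].astype(str).replace(to_replace='\n', value='', regex=True). Otherwise only str values will be replaced. (Older code used np.str here; it was just an alias for the built-in str and has been removed from newer NumPy releases.)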
to_replace can use a list, too: .replace(to_replace=['\n', '\t'], value='', regex=True)
How do you replace items that are within a word? Example: 'This is a sente\tnce'. (Remove \t)
@ArthurD.Howland The code from the answer should work for those cases.
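The points raised in these comments can be combined into one short sketch: a list passed as to_replace, a tab in the middle of a word, and a column with mixed types. The sample Series is invented for illustration, and a recent pandas is assumed.

```python
import pandas as pd

# Hypothetical column mixing strings and a float.
s = pd.Series(["This is a sente\tnce", "two\nlines", 3.5])

# Cast everything to str, then remove newlines and tabs wherever they occur.
cleaned = s.astype(str).replace(to_replace=["\n", "\t"], value="", regex=True)
```

Note that astype(str) turns the float into the string '3.5', so the column ends up holding only strings.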
