I've been struggling with this one for a while. I'm trying to write strings to HTML but have issues with the format once I've cleaned them. Here's an example:

paragraphs = ['Grocery giant and household name Woolworths is battered and bruised. ', 
'But behind the problems are still the makings of a formidable company']

# Concatenate the paragraphs into a single string
x = ""
for item in paragraphs:
    x = x + item
x

Output:

"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised. 
But behind the problems are still the makings of a formidable\xc2\xa0company"

Desired output:

"Grocery giant and household name Woolworths is battered and bruised. 
But behind the problems are still the makings of a formidable company"

I'm hoping you can explain why this happens and how I can fix it. Thanks in advance!

    Have you checked for unusual Unicode whitespace in your source string? Commented Sep 6, 2015 at 2:52

1 Answer

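\xc2\xa0 is the byte sequence 0xC2 0xA0, which is the UTF-8 encoding of the non-breaking space (U+00A0).

It is an invisible whitespace character rather than an ordinary space, which is why it shows up as escaped bytes when the string is displayed. For more information, see Wikipedia: https://en.wikipedia.org/wiki/Non-breaking_space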
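To make the link between the two bytes and the character concrete, here is a minimal sketch. It assumes Python 3 (the question's output looks like Python 2 byte strings), and the variable names are only illustrative.

```python
import unicodedata

# The non-breaking space as a Unicode code point
nbsp = "\u00a0"

# Encoding it as UTF-8 yields the two bytes seen in the question's output
encoded = nbsp.encode("utf-8")
print(encoded)                 # b'\xc2\xa0'

# The official Unicode name confirms what the character is
print(unicodedata.name(nbsp))  # NO-BREAK SPACE

# It counts as whitespace, but it is not equal to an ordinary space
print(nbsp.isspace(), nbsp == " ")
```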

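I copied what you pasted in the question and got the expected output, so the non-breaking spaces are probably in your original source strings and were lost when you pasted them here.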

1 Comment

Thank you. That fixes it. I built in: x.replace("\xc2\xa0", " ")
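For anyone on Python 3, the same replacement could look roughly like the sketch below. The helper name clean_paragraphs is hypothetical, and the code assumes that any byte strings in the list are UTF-8 encoded. Note that replace returns a new string, so the result has to be assigned or returned.

```python
NBSP = "\u00a0"  # non-breaking space, b"\xc2\xa0" when encoded as UTF-8


def clean_paragraphs(paragraphs):
    """Join paragraphs, replacing non-breaking spaces with ordinary spaces.

    Hypothetical helper: accepts str or UTF-8 encoded bytes items.
    """
    cleaned = []
    for item in paragraphs:
        if isinstance(item, bytes):
            # Assumption: byte strings are UTF-8 encoded
            item = item.decode("utf-8")
        cleaned.append(item.replace(NBSP, " "))
    return "".join(cleaned)


paragraphs = [
    b"Grocery giant and household name\xc2\xa0Woolworths is battered and\xc2\xa0bruised. ",
    "But behind the problems are still the makings of a formidable\u00a0company",
]
x = clean_paragraphs(paragraphs)
print(x)
```

Decoding first and replacing the character, rather than replacing the raw bytes, keeps the rest of the text as a normal string that can be written straight into the HTML.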
