I scraped a webpage with BeautifulSoup. I got great output, except that parts of the list look like this after getting the text:
list = [u'that\\u2019s', u'it\\u2019ll', u'It\\u2019s', u'don\\u2019t', u'That\\u2019s', u'we\\u2019re', u'\\u2013']
My question now is how to get rid of these double backslashes, or replace them with the special characters they stand for.
If I print the first element of the example list, the output looks like this:
print list[0]
that\u2019s
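From the output it looks like each element contains a literal backslash followed by u2019, rather than the actual character U+2019 (a right single quote). A minimal sketch of decoding such escape sequences with the unicode_escape codec (Python 3 syntax; in Python 2 the equivalent would be w.decode('unicode_escape')):

```python
# Each element holds a literal backslash, 'u', and four hex digits
# (e.g. "that\u2019s" as seven raw characters), not the real character.
words = ['that\\u2019s', 'it\\u2019ll', 'don\\u2019t', '\\u2013']

# The 'unicode_escape' codec interprets the escape sequences and
# turns them back into the characters they represent.
fixed = [w.encode('ascii').decode('unicode_escape') for w in words]

print(fixed[0])  # that’s  (with a real U+2019 apostrophe)
```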
I have already read a lot of other questions/threads about this topic, but I ended up even more confused, as I am a beginner when it comes to unicode/encoding/decoding.
I hope that someone could help me with this issue.
Thanks! MG
Edit: I was advised to use json.loads and access the pieces of it I want from there. I did data = json.load(name_of_file) and then only got the stuff I want with raw = data['html']. I assume that the next step, where I tried to get rid of comments (I still had some left after using BeautifulSoup in some cases) with raw = re.sub('(?s)<!--.*?-->', '', str(raw)), is what got my output messy.
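If raw is a list (or other container) of unicode strings, calling str(raw) produces its repr, in which non-ASCII characters show up as literal \uXXXX escapes. A minimal sketch of stripping the comments from each string directly instead (raw here is a hypothetical stand-in for data['html']):

```python
import re

# Hypothetical stand-in for data['html']; strings returned by
# json.load already contain the real characters (e.g. U+2019).
raw = [u'<p>that\u2019s it</p><!-- note -->', u'<p>we\u2019re done</p>']

comment = re.compile(r'(?s)<!--.*?-->')

# str(raw) on the whole list would give its repr, where U+2019
# becomes the escape sequence \u2019; substituting on each string
# individually keeps the real characters intact.
cleaned = [comment.sub('', s) for s in raw]

print(cleaned[0])  # <p>that’s it</p>
```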