I need to parse various text sources and then print or store the results somewhere.
Whenever a non-ASCII character is encountered, it does not print correctly: the text is shown as escaped bytes, and I have no idea how to display the actual characters.
(I'm quite new to Python; I come from PHP, where I never had any UTF-8 issues.)
The following is a code example:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import codecs
import feedparser
url = "http://feeds.bbci.co.uk/japanese/rss.xml"
feeds = feedparser.parse(url)
title = feeds['feed'].get('title').encode('utf-8')
print(title)
file = codecs.open("test.txt", "w", "utf-8")
file.write(str(title))
file.close()
I'd like to print the RSS title (BBC Japanese - ホーム) and write it to a file, but instead the result is this:
b'BBC Japanese - \xe3\x83\x9b\xe3\x83\xbc\xe3\x83\xa0'
This appears both on screen and in the file. Is there a proper way to do this?
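For reference, the behaviour can be reproduced without the network call. The following is a minimal sketch, assuming Python 3 and assuming that feedparser returns the title as a str; the hard-coded title and the temporary file name test_title.txt are stand-ins for illustration, not part of the original script. It suggests that the b'...' text comes from calling str() on the encoded bytes, and shows one possible way to write the text itself by leaving the string unencoded and letting the file object handle UTF-8.

```python
import os
import tempfile

# Stand-in for feeds['feed'].get('title'); assumed to be a str in Python 3.
title = "BBC Japanese - ホーム"

# .encode() turns the text into a bytes object.
encoded = title.encode("utf-8")

# str() on bytes does not decode them; it returns their repr, escapes included.
as_text = str(encoded)
print(as_text)  # ASCII-only output, same shape as the result in the question

# Writing the unencoded str and letting open() do the UTF-8 encoding.
path = os.path.join(tempfile.gettempdir(), "test_title.txt")  # hypothetical path
with open(path, "w", encoding="utf-8") as f:
    f.write(title)

# Reading it back with the same encoding returns the original text.
with open(path, encoding="utf-8") as f:
    read_back = f.read()
```

Under these assumptions, print(title) without the .encode() call would show the characters as well, provided the terminal's encoding can represent them.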