I am trying to extract some data from a JSON file that contains tweets and write it to a CSV file. The file contains all kinds of characters, and I'm guessing this is why I get this error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'
I guess I have to convert the output to UTF-8 before writing the CSV file, but I have not been able to do that. I have found similar questions here on Stack Overflow, but I have not been able to adapt the solutions to my problem. (I should add that I am not really familiar with Python; I'm a social scientist, not a programmer.)
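A small sketch of where the error comes from may help. It is written for Python 3 (an assumption; the `u'\u2026'` in the error suggests the question uses Python 2). `\u2026` is the "…" (horizontal ellipsis) character. ASCII only covers code points 0 to 127, so it cannot represent "…", while UTF-8 can encode every Unicode character.

```python
# The character from the error message: U+2026 HORIZONTAL ELLIPSIS ("…")
text = "Wait for it\u2026"

ascii_ok = True
try:
    text.encode("ascii")  # ASCII only covers code points 0-127
except UnicodeEncodeError as err:
    ascii_ok = False
    print("ascii failed:", err.reason)  # "ordinal not in range(128)"

# UTF-8 can represent any Unicode character; "…" becomes three bytes
encoded = text.encode("utf-8")
print(encoded)
```

Decoding `encoded` with `"utf-8"` gives back the original string, which is why reading and writing with an explicit UTF-8 encoding avoids the error.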
import csv
import json

fieldnames = ['id', 'text']

with open('MY_SOURCE_FILE', 'r') as f, open('MY_OUTPUT', 'a') as out:
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=',', quoting=csv.QUOTE_ALL)
    for line in f:
        tweet = json.loads(line)
        user = tweet['user']
        output = {
            'text': tweet['text'],
            'id': tweet['id'],
        }
        writer.writerow(output)
import codecs

with codecs.open('MY_SOURCE_FILE', 'r', encoding='utf-8') as f, codecs.open('MY_OUTPUT', 'a', encoding='utf-8') as out:

The codecs module will handle the decoding and encoding for you.
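In Python 3, `open()` accepts an `encoding` argument directly and the `csv` module works with Unicode text, so `codecs` is not needed. Below is a minimal sketch of the same script under that assumption. Like the original loop, it assumes one JSON object per line in the source file. The function name `tweets_to_csv` is hypothetical, and the unused `user` variable is dropped.

```python
import csv
import json

FIELDNAMES = ["id", "text"]


def tweets_to_csv(source_path, output_path):
    """Append the 'id' and 'text' of each tweet (one JSON object per line) to a CSV file."""
    # Decode the input and encode the output explicitly as UTF-8.
    # newline="" is what the csv docs recommend for files passed to csv.writer.
    with open(source_path, "r", encoding="utf-8") as f, \
         open(output_path, "a", encoding="utf-8", newline="") as out:
        writer = csv.DictWriter(
            out, fieldnames=FIELDNAMES, delimiter=",", quoting=csv.QUOTE_ALL)
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline at the end of the file
            tweet = json.loads(line)
            writer.writerow({"id": tweet["id"], "text": tweet["text"]})
```

Usage: `tweets_to_csv('MY_SOURCE_FILE', 'MY_OUTPUT')`. Because the output file is opened in append mode (`'a'`) as in the question, running it twice writes the rows twice.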