I am saving my data into a dictionary. After saving it, I printed the contents to see what they look like, and I see the unicode markers:

(u'520775', [[u'Kategori:2. divisjon fotball for herrer 2008']])
(u'754686', [[u'Kategori:Debutalbum', u'Kategori:Musikkalbum fra 1990', u'Kategori:Tre Sm\xe5 Kinesere-album']])
(u'381191', [[u'Kategori:Serierundene i Adeccoligaen 2007']])
(u'972597', [[u'Kategori:Tippeligaen 2011']])
(u'263001', [[u'Kategori:Musikkalbum fra 2003']])
(u'23037', [[u'Kategori:Luftforsvaret']])
(u'640060', [[u'Kategori:Deltagermedaljen', u'Kategori:F\xf8dsler i 1923', u'Kategori:Norske folkemusikere', u'Kategori:Norske trekkspillere', u'Kategori:Paul Harris Fellow', u'Kategori:Personer fra Vefsn kommune']])

I have the following code. I used the format option, but it didn't really work. What also confuses me is that when I print the id before saving it in the dictionary, it shows up as a plain number, without the u'...' wrapper.

Here is the relevant segment of the code:

for pageId, pageData in data['query']['pages'].iteritems():
    categoryTitles = []
    idTitleDictionary[pageId] = []
    print pageId
    try:
        for category in pageData['categories']:
            categoryTitles.append(category['title'])
        idTitleDictionary[format(pageId)].append(categoryTitles)
    except KeyError:
        # Assumed handler: the original snippet is cut off before its except clause.
        pass

I am trying to figure out how to encode it before saving it into the dictionary.

1 Answer

When you print a dict, list, or tuple, Python calls repr on the items inside the container, not str as it does when you print an item directly. That is why you see the u prefix and the unicode escape codes.
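One way to convince yourself that the escape is only a display form is to inspect the stored string directly. The sketch below uses one title from the question; the variable names are made up for illustration, and it is written to run on both Python 2 and Python 3.

```python
import unicodedata

# Sample title taken from the question's output.
title = u'Kategori:Tre Sm\xe5 Kinesere-album'

# Wrapping the string in a list changes how print displays it, not the string.
container = [title]

# The \xe5 escape in the source stands for exactly one character.
pos = title.index(u'\xe5')
char_name = unicodedata.name(title[pos])
short_len = len(u'Sm\xe5')

# The list holds the very same object, so nothing was re-encoded on the way in.
same_object = container[0] is title
```

On Python 2, `print container` shows the `\xe5` escape, while `print title` shows the character itself, provided your terminal can display it.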

If you were to print each string individually, like this:

mydict = dict(((u'520775', [[u'Kategori:2. divisjon fotball for herrer 2008']]),
(u'754686', [[u'Kategori:Debutalbum', u'Kategori:Musikkalbum fra 1990', 
              u'Kategori:Tre Sm\xe5 Kinesere-album']]),
(u'381191', [[u'Kategori:Serierundene i Adeccoligaen 2007']]),
(u'972597', [[u'Kategori:Tippeligaen 2011']]),
(u'263001', [[u'Kategori:Musikkalbum fra 2003']]),
(u'23037', [[u'Kategori:Luftforsvaret']]),
(u'640060', [[u'Kategori:Deltagermedaljen', u'Kategori:F\xf8dsler i 1923', 
              u'Kategori:Norske folkemusikere', 
              u'Kategori:Norske trekkspillere', u'Kategori:Paul Harris Fellow', 
              u'Kategori:Personer fra Vefsn kommune']])))

for key, value in mydict.iteritems():
    print key,
    for elem in value[0]:
        print elem + ',',
    print

You'd see the strings encoded properly for your terminal. You don't need to do anything to those strings to interpret the escape codes. Everything is stored properly; the escapes are only how the container displays its contents.
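If you want readable output in more than one place, the same idea can be wrapped in a small helper that builds a text line per entry instead of relying on the container's repr. This is only a sketch: `format_entry` and `sample` are hypothetical names, and the data is a trimmed copy of the question's output. It runs on both Python 2 and Python 3.

```python
def format_entry(page_id, category_lists):
    """Build one display line: the page id followed by its category titles."""
    # Each value in the question's dictionary is a list of lists, so flatten it.
    titles = [title for group in category_lists for title in group]
    return u'{0}: {1}'.format(page_id, u', '.join(titles))


sample = {
    u'754686': [[u'Kategori:Debutalbum', u'Kategori:Tre Sm\xe5 Kinesere-album']],
    u'23037': [[u'Kategori:Luftforsvaret']],
}

# Sort by key so the output order is predictable.
lines = [format_entry(key, sample[key]) for key in sorted(sample)]
```

Printing each element of `lines` on its own then goes through str rather than repr, so the Norwegian characters appear as characters.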

2 Comments

So this means I can go on with my business? I plan to use the data in the dictionary to make some SQL queries. The data were collected from an API, which is why I was concerned.
Thanks mate, I tried a simple example in the terminal and it works great. Thanks for the excellent, fast answer. Here is my example:
>>> mydict = {}
>>> pageId = 12345
>>> mydict[pageId] = []
>>> mydict
{12345: []}
>>> mydict[pageId].append(['Category1', 'Category2']
... )
