
I am using the following to convert the JSON response into objects:

response = requests.get(url=request_url)
myobjs = json.loads(response.text, object_hook=lambda d: Myobj(**d))
return myobjs

and

class Myobj(object):
    def __init__(self, id, display):
        self.id = str(id)
        self.name = str(display)

Sample JSON:

[
    {
        "id": "92cbb711-7e4d-417a-9530-f1850d9bc687",
        "display": "010lf.com"
    },
    {
        "id": "1060864a-a3a5-40c2-aa94-651fe2d10ae9",
        "display": "010lm.com",
    }
]

It worked well until one day the display field in the returned JSON contained a non-ASCII (Unicode) value, for example:

"display": "관악저널.kr"

It then fails with the following error:

File "mycode.py", line 5, in __init__
    self.name = str(display)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

I would have thought that str() would handle a Unicode string properly.

What am I missing?

I tried changing the line from

self.name = str(display) 

to

self.name = display

That seems to do the trick, but I want to check whether this is the correct and efficient way to handle it.

Comments:

  • Are you using Python 2.7? Commented Sep 7, 2017 at 23:45
  • str() will work for ASCII characters that happened to have been stored in unicode strings. This means that self.name is now a Unicode object, not a str object. It will work, but only if your other code handles Unicode objects. Commented Sep 7, 2017 at 23:45
  • @StevenYong If you switch to 3.x, all these problems will go away. Commented Sep 7, 2017 at 23:46
  • @StevenYong in Python 3, all strings can be unicode so you never have to worry about this stuff. Commented Sep 7, 2017 at 23:53
  • You can use str(display.encode('utf-8', errors='ignore')) But you may lose data that way, if you are getting Korean characters. Commented Sep 7, 2017 at 23:55
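
To make the data-loss concern in the last comment concrete, here is a minimal sketch comparing errors='ignore' under the UTF-8 and ASCII codecs. The variable names are illustrative and not from the question. It is written with u'' literals so that it should run on both Python 2.7 and Python 3.

```python
# coding: utf8
display = u"관악저널.kr"  # Unicode text, as json.loads would return it

# UTF-8 can represent these characters, so 'ignore' has nothing to drop here.
utf8_bytes = display.encode("utf-8", errors="ignore")

# ASCII cannot represent the Korean characters, so 'ignore' silently drops them.
ascii_bytes = display.encode("ascii", errors="ignore")

print(utf8_bytes.decode("utf-8") == display)  # True
print(ascii_bytes == b".kr")                  # True
```

For well-formed text like this, the loss comes from choosing the ASCII codec together with errors='ignore', not from encoding to UTF-8.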

1 Answer

The json module returns strings as Unicode objects. So either store them as Unicode (the correct solution) or encode them explicitly as UTF-8. Note that in Python 2, str() converts a Unicode string to a byte string using the ascii codec, so it fails on any Unicode string that contains non-ASCII characters.

#!python2
#coding:utf8
import json

text = '''\
[
    {
        "id": "92cbb711-7e4d-417a-9530-f1850d9bc687",
        "display": "관악저널.kr"
    },
    {
        "id": "1060864a-a3a5-40c2-aa94-651fe2d10ae9",
        "display": "010lm.com"
    }
]'''

class Myobj(object):
    def __init__(self, id, display):
        self.id = id # or id.encode('utf8')
        self.name = display # or display.encode('utf8')
    def __repr__(self):
        return 'MyObj({self.id!r},{self.name!r})'.format(self=self)

myobjs = json.loads(text, object_hook=lambda d: Myobj(**d))
print(myobjs)

Output:

[MyObj(u'92cbb711-7e4d-417a-9530-f1850d9bc687',u'\uad00\uc545\uc800\ub110.kr'),
 MyObj(u'1060864a-a3a5-40c2-aa94-651fe2d10ae9',u'010lm.com')]
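
As a rough sketch of the "store as Unicode, encode only when needed" approach: the explicit encode("ascii") call below stands in for what Python 2's str() does implicitly, and the names ascii_ok and name_bytes are illustrative, not part of the original code. It is written so that it should run on both Python 2.7 and Python 3.

```python
# coding: utf8
import json

text = u'[{"id": "92cbb711-7e4d-417a-9530-f1850d9bc687", "display": "관악저널.kr"}]'

class Myobj(object):
    def __init__(self, id, display):
        self.id = id          # kept as Unicode text
        self.name = display   # kept as Unicode text

myobjs = json.loads(text, object_hook=lambda d: Myobj(**d))
name = myobjs[0].name

# Stand-in for Python 2's str(name): encode with the ASCII codec.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encode explicitly, and only where bytes are required (file, socket, ...).
name_bytes = name.encode("utf-8")

print(ascii_ok)         # False
print(len(name))        # 7 characters
print(len(name_bytes))  # 15 bytes
```

Keeping the attribute as Unicode means the rest of the program works with characters, and the choice of byte encoding is made once, at the point where the data leaves the program.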