I have a unicode object that should represent JSON, but the string value itself contains the unicode u prefixes, e.g. u'{u\'name\':u\'my_name\'}'

My goal is to load this into a JSON object. Just using json.loads fails. I know this happens because of the u prefixes inside the string, which are not valid JSON.

I then tried sanitizing the string using replace("u\'", "'"), encode('ascii', 'ignore') and other methods, without success.

What finally worked was ast.literal_eval, but I'm worried about using it. I found a few sources online claiming it's safe, but I also found other sources claiming it's bad practice and should be avoided.

Are there other methods I'm missing?

  • ast.literal_eval is safe. Commented Jan 6, 2019 at 16:53
  • @dawg thanks for the quick response. Is it preferable to try and replace the u values etc.? Does ast.literal_eval have any negative downsides to it? Commented Jan 6, 2019 at 16:59
  • "Does ast.literal_eval have any negative downsides to it?" No, not really. If it works for your data (other than the example here), use it. Commented Jan 6, 2019 at 17:04
  • Thanks for the assist @dawg Commented Jan 6, 2019 at 17:06
  • @GuyGrin, I have updated my answer with one more method. Please check it, as it allows you to use the json module. Note that JSON (JavaScript Object Notation) requires " (double quotes) around keys and strings, which is the reason for json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1). I have also suggested an unsafe way; just ignore it if you wish. You will still need to handle the case where your data string contains u as part of the original data. Commented Jan 6, 2019 at 18:40

1 Answer

The unicode string is the result of unicode being called on a dictionary.

>>> d = {u'name': u'myname'}
>>> u = unicode(d) 
>>> u  
u"{u'name': u'myname'}" 

If you control the code that's doing this, the best fix is to change it to call json.dumps instead.

>>> import json
>>> json.dumps(d)
'{"name": "myname"}'

If you don't control the creation of this object, you'll need to use ast.literal_eval to create the dictionary, because the unicode string is a Python dict repr (single quotes, u prefixes) and not valid JSON.

>>> json.loads(u)
Traceback (most recent call last):
...
ValueError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)


>>> import ast
>>> ast.literal_eval(u)
{u'name': u'myname'}
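
If you then need real JSON text, the parsed dictionary can be passed straight to json.dumps. Below is a minimal sketch of that round trip; the helper name repr_to_json and the sample data (including the 'menu' value) are made up for illustration. It also shows why the replace("u'", "'") approach from the question is fragile: it strips any u that happens to sit right before a quote in the data itself.

```python
import ast
import json


def repr_to_json(text):
    """Parse a Python dict/list literal (e.g. the repr of a dict) into JSON text.

    Hypothetical helper, not part of any library.
    """
    # literal_eval only accepts Python literals; u'' prefixes are valid literals.
    value = ast.literal_eval(text)
    if not isinstance(value, (dict, list)):
        raise ValueError("expected a dict or list literal")
    return json.dumps(value)


# Sample input in the same shape as the question's string (values are invented).
raw = "{u'name': u'myname', u'dish': u'menu'}"

as_json = repr_to_json(raw)   # valid JSON with double-quoted keys and strings
data = json.loads(as_json)    # now json.loads works

# Naive replacement also removes the trailing "u" of 'menu', corrupting the data.
naive = raw.replace("u'", "'")
```

Here data keeps the value 'menu' intact, while naive ends up containing 'men'.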

The docs confirm that ast.literal_eval is safe:

can be used for safely evaluating strings containing Python values from untrusted sources

You could use eval instead, but since you don't control the creation of the object, you cannot be certain that it has not been crafted by a malicious user to cause damage to your system.
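
To make the difference concrete, here is a small sketch showing that ast.literal_eval refuses anything that is not a plain literal, such as a function call, instead of executing it. The helper name is_plain_literal and the example strings are illustrative assumptions.

```python
import ast


def is_plain_literal(text):
    """Return True if ast.literal_eval accepts text, False if it rejects it.

    Hypothetical helper for illustration only.
    """
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Calls, attribute access, names, etc. are rejected, not executed.
        return False
    return True


accepted = is_plain_literal("{u'name': u'myname'}")
# eval() would run this expression; literal_eval raises ValueError instead.
rejected = is_plain_literal("__import__('os').getcwd()")
```

accepted is True and rejected is False. Note that "safe" here means no code execution: the Python 3 documentation warns that very large or deeply nested input can still exhaust memory or the stack, so capping the input length before parsing untrusted strings is a reasonable precaution.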
