
I want to parse a bytes string in JSON format and convert it into Python objects. This is the source I have:

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

And this is the desired outcome I want to have:

[{
    "Date": "2016-05-21T21:35:40Z",
    "CreationDate": "2012-05-05",
    "LogoType": "png",
    "Ref": 164611595,
    "Classe": [
        "Email addresses",
        "Passwords"
    ],
    "Link": "http://some_link.com"
}]

First, I converted the bytes to string:

my_new_string_value = my_bytes_value.decode("utf-8")

but when I call json.loads to parse it as JSON:

my_json = json.loads(my_new_string_value)

I get this error:

json.decoder.JSONDecodeError: Expecting value: line 1 column 174 (char 173)
  • First things first. Bytes to string, then string to JSON. Commented Oct 15, 2016 at 13:39
  • I've converted the bytes to a string by using .decode("utf-8"), but when I try to convert the string to JSON I get this error: json.decoder.JSONDecodeError: Expecting value: line 1 column 174 (char 173) Commented Oct 15, 2016 at 13:49
  • Can you update your question with the relevant code and print out the decoded string? Commented Oct 15, 2016 at 13:50
  • And where do you get this JSON from? Commented Oct 15, 2016 at 13:52
  • @MerouaneBenthameur The reason it fails is that the string you have is not JSON. The most obvious thing is that JSON uses ", not '. Commented Oct 15, 2016 at 13:58
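To make the last comment concrete, here is a minimal sketch (standard library only, using a shortened copy of the sample bytes): json.loads stops at the first single-quoted key and reports its position. The reported position depends on the input, so it need not match the one quoted in the question.

```python
import json

# Shortened copy of the question's bytes (same single-quote style)
my_bytes_value = b"[{'Date': '2016-05-21T21:35:40Z', 'Ref': 164611595}]"
text = my_bytes_value.decode("utf-8")

error = None
try:
    json.loads(text)
except json.JSONDecodeError as err:
    error = err  # keep a reference; 'err' is unbound after the except block

# JSON keys and strings must use double quotes, so parsing fails at the first '
print(error.msg, "at char", error.pos)
```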

9 Answers


Your bytes object is almost JSON, but it uses single quotes instead of double quotes, and (on Python versions before 3.6) json.loads needs a str rather than bytes. So one way to fix it is to decode the bytes to str and replace the quotes. Another option is to use ast.literal_eval; see below for details. If you want to print the result or save it to a file as valid JSON, you can load the JSON into a Python list and then dump it back out. E.g.,

import json

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

# Decode UTF-8 bytes to Unicode, and convert single quotes 
# to double quotes to make it valid JSON
my_json = my_bytes_value.decode('utf8').replace("'", '"')
print(my_json)
print('- ' * 20)

# Load the JSON to a Python list & dump it back out as formatted JSON
data = json.loads(my_json)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)

output

[{"Date": "2016-05-21T21:35:40Z", "CreationDate": "2012-05-05", "LogoType": "png", "Ref": 164611595, "Classe": ["Email addresses", "Passwords"],"Link":"http://some_link.com"}]
- - - - - - - - - - - - - - - - - - - - 
[
    {
        "Classe": [
            "Email addresses",
            "Passwords"
        ],
        "CreationDate": "2012-05-05",
        "Date": "2016-05-21T21:35:40Z",
        "Link": "http://some_link.com",
        "LogoType": "png",
        "Ref": 164611595
    }
]

As Antti Haapala mentions in the comments, we can use ast.literal_eval to convert my_bytes_value to a Python list, once we've decoded it to a string.

from ast import literal_eval
import json

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

data = literal_eval(my_bytes_value.decode('utf8'))
print(data)
print('- ' * 20)

s = json.dumps(data, indent=4, sort_keys=True)
print(s)
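One caveat with the quote-replacement approach, sketched below with a made-up record whose value contains an apostrophe ("O'Brien" is purely illustrative): replace("'", '"') also rewrites apostrophes inside values and produces invalid JSON, whereas literal_eval parses the Python literal as it is.

```python
import json
from ast import literal_eval

# Hypothetical record; its repr uses double quotes around the value with the apostrophe
raw = repr([{'Name': "O'Brien", 'Ref': 1}]).encode('utf8')
text = raw.decode('utf8')

try:
    json.loads(text.replace("'", '"'))
    replace_ok = True
except json.JSONDecodeError:
    replace_ok = False  # the apostrophe in O'Brien became a stray double quote

data = literal_eval(text)  # parses the Python literal directly
print(replace_ok, data)
```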

Generally, this problem arises because someone has saved data by printing its Python repr instead of using the json module to create proper JSON data. If it's possible, it's better to fix that problem so that proper JSON data is created in the first place.
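On the producing side the difference is easy to see. A minimal sketch, assuming the data starts out as a Python list like the one in the question: str() (the repr) gives the single-quoted form that caused the problem, while json.dumps gives real JSON that json.loads can read back.

```python
import json

data = [{'Ref': 164611595, 'Classe': ['Email addresses', 'Passwords']}]

# What "printing the repr" produces: Python syntax, not JSON
repr_bytes = str(data).encode('utf-8')

# What the json module produces: valid, double-quoted JSON
json_bytes = json.dumps(data).encode('utf-8')

print(repr_bytes)
print(json_bytes)
print(json.loads(json_bytes.decode('utf-8')) == data)
```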


4 Comments

I don't believe it is a JSON string, rather a Python repr, so use literal_eval instead
BTW, if you want to analyze or traverse a complicated JSON structure please see stackoverflow.com/a/52414034/4014959 & stackoverflow.com/a/41778581/4014959
Re: "Generally this problem arises ... proper JSON data is created": bytes data is apparently not JSON serializable: json.dumps(b'\x41\x45') -> TypeError: Object of type bytes is not JSON serializable
@Vercingatorix JSON is for serializing data that's ultimately composed of strings, numbers, and booleans (or null); it's not designed to cope with arbitrary binary data. But it's easy enough to transform such data into a JSON-friendly form, e.g. you can create a simple hex string with docs.python.org/3/library/stdtypes.html#bytes.hex or you could optimize that slightly with docs.python.org/3/library/base64.html
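The last comment's suggestion can be sketched with bytes.hex / bytes.fromhex and the standard base64 module; the "payload" key and the sample bytes are made up for illustration.

```python
import base64
import json

raw = b'\x00\x41\x45\xff'  # arbitrary binary data, not valid UTF-8

# Option 1: hex string (2 characters per byte)
as_hex = json.dumps({"payload": raw.hex()})

# Option 2: base64 text (about 4 characters per 3 bytes)
as_b64 = json.dumps({"payload": base64.b64encode(raw).decode('ascii')})

# Decoding reverses each transformation
from_hex = bytes.fromhex(json.loads(as_hex)["payload"])
from_b64 = base64.b64decode(json.loads(as_b64)["payload"])
print(as_hex, as_b64, from_hex == raw, from_b64 == raw)
```

Base64 is the more compact choice; hex is easier to read when debugging.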

You can simply use:

import json

my_json_str = my_bytes_value.decode().replace("'", '"')
data = json.loads(my_json_str)

4 Comments

This should have the green tick.
Not really. This answers the question in the title but omits that the example provided by OP has single quotes. json.loads(my_bytes_value) would throw a json.decoder.JSONDecodeError in this case.
Yes, you are right @pierre-monico. I have updated the answer accordingly. Thanks for flagging.
For me, I had to remove .replace("'", '"')

Python 3.6+: use the io module

import json
import io

my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'

fix_bytes_value = my_bytes_value.replace(b"'", b'"')

my_json = json.load(io.BytesIO(fix_bytes_value))  
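In CPython, json.load simply reads the whole stream and passes the result to json.loads, so the io.BytesIO wrapper is optional: since Python 3.6, json.loads accepts bytes directly (UTF-8, UTF-16 or UTF-32). A minimal sketch with a shortened sample:

```python
import json

# Shortened copy of the question's bytes
my_bytes_value = b"[{'Ref': 164611595, 'Classe': ['Email addresses', 'Passwords']}]"

# Same quote fix as above, applied to bytes
fix_bytes_value = my_bytes_value.replace(b"'", b'"')

# Python 3.6+: no decode() or BytesIO needed
my_json = json.loads(fix_bytes_value)
print(my_json)
```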

1 Comment

Thank you! This is the least icky and most direct path, particularly if the JSON within the bytearray is already properly formatted.
d = json.dumps(byte_str.decode('utf-8'))
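Note that json.dumps goes in the opposite direction from what the question asks: it serializes the decoded text as a JSON string literal rather than parsing it. A short sketch contrasting it with json.loads (byte_str here is a small made-up value that is already valid JSON):

```python
import json

byte_str = b'[{"Ref": 164611595}]'
text = byte_str.decode('utf-8')

dumped = json.dumps(text)   # a str that wraps the text in quotes, inner quotes escaped
parsed = json.loads(text)   # a Python list of dicts

print(type(dumped).__name__, dumped)
print(type(parsed).__name__, parsed)
```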

Comments


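To convert this bytes object directly to JSON text, first decode it to a string with decode() (UTF-8 is the default), then change the quotation marks. The last step strips the " characters that dumps adds around the whole string, so you get the JSON text of the list rather than a JSON string literal. The result is still a str (pass it to json.loads to get a Python list), and the trick only works if the data contains no double quotes or backslashes, because dumps escapes those.

from json import dumps

dumps(s.decode()).replace("'", '"')[1:-1]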

Comments


A better solution is:

import json
byte_array_example = rb'{"text": "\u0627\u06CC\u0646 \u06CC\u06A9 \u0645\u062A\u0646 \u062A\u0633\u062A\u06CC \u0641\u0627\u0631\u0633\u06CC \u0627\u0633\u062A."}'
res = json.loads(byte_array_example.decode('unicode_escape'))
print(res)

result:

{'text': 'این یک متن تستی فارسی است.'}

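Note: the \uXXXX sequences in this bytes literal are plain ASCII backslash escapes, and json.loads decodes JSON \u escapes itself, so byte_array_example.decode('utf-8') gives the same result here. Be careful with 'unicode_escape' on other input: it reads non-escape bytes as Latin-1, so genuine UTF-8-encoded non-ASCII text comes out garbled, and it also turns JSON escapes such as \" into bare quotes, which can break parsing.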
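A quick check of how the two decodings behave, as a sketch using a shortened version of the example (a raw bytes literal, so the \uXXXX sequences stay as backslash escapes) plus a made-up payload that contains real UTF-8 bytes:

```python
import json

# ASCII-only bytes containing JSON \u escapes
escaped = rb'{"text": "\u0627\u06cc\u0646"}'
# Normal str literal (real characters), then encoded to genuine UTF-8 bytes
raw_utf8 = '{"text": "\u0627\u06cc\u0646"}'.encode('utf-8')
expected = '\u0627\u06cc\u0646'

# Escaped input: both decodings agree, because json.loads handles \uXXXX itself
a = json.loads(escaped.decode('utf-8'))
b = json.loads(escaped.decode('unicode_escape'))

# Real UTF-8 input: only utf-8 is correct; unicode_escape reads each byte as Latin-1
c = json.loads(raw_utf8.decode('utf-8'))
d = json.loads(raw_utf8.decode('unicode_escape'))

print(a == b == c, d)
```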

Comments


If you have a bytes object and want to store it in a JSON file, you should first decode it, because JSON has only a few data types and raw bytes aren't one of them: it has objects, arrays, strings, numbers, booleans, and null.

To decode a bytes object you first have to know its encoding. If you don't, you can guess it with the third-party chardet package (the result is a heuristic guess and can be None, so fall back to a known encoding):

import chardet

encoding = chardet.detect(your_byte_object)['encoding'] or 'utf-8'

Then you can save the decoded text to your JSON file like this:

import json

data = {"data": your_byte_object.decode(encoding)}
with open('request.txt', 'w') as file:
    json.dump(data, file)
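If you'd rather avoid a third-party dependency, here is a standard-library-only sketch. The try-UTF-8-then-Latin-1 fallback in the hypothetical decode_bytes helper is an assumption of this example, not what chardet does: Latin-1 maps every byte to some character, so it never fails, but it may not be the right encoding. The sketch writes the JSON to a temporary file and reads it back.

```python
import json
import os
import tempfile


def decode_bytes(raw):
    """Hypothetical helper: UTF-8 if possible, otherwise Latin-1 (an assumption)."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('latin-1')


your_byte_object = b'caf\xc3\xa9'  # "café" encoded as UTF-8
data = {"data": decode_bytes(your_byte_object)}

path = os.path.join(tempfile.mkdtemp(), 'request.json')
with open(path, 'w', encoding='utf-8') as file:
    json.dump(data, file)

with open(path, encoding='utf-8') as file:
    loaded = json.load(file)
print(loaded)
```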

Comments


The simplest solution is to use the json() method that comes with the HTTP response object (for example, Response.json() in the requests library).

For example:

[Image: an example of using the json function]
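Because the original example survives only as an image, here is a text sketch of the same idea. It swaps in the standard library's urllib instead of whatever HTTP client the original answer used, and it starts a throwaway local server (JSONHandler and PAYLOAD are made up for this example) so it runs offline. On Python 3.6+, json.load can parse the bytes response directly.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = {"Ref": 164611595, "Classe": ["Email addresses", "Passwords"]}


class JSONHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves PAYLOAD as UTF-8 JSON bytes."""

    def do_GET(self):
        body = json.dumps(PAYLOAD).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


server = HTTPServer(("127.0.0.1", 0), JSONHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

# Empty ProxyHandler so proxy environment variables don't intercept localhost
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as response:
    data = json.load(response)  # response.read() returns bytes

server.shutdown()
server.server_close()
print(data)
```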

1 Comment

I know this answer is 2 years old, but why would you ever share your code as an image instead of actual text? This isn't copyable, takes longer to load, and ignores the styling preferences of everyone who reads this.

json.loads(body.decode('utf-8'))

This will decode and convert it to a Python dictionary.

The body in the above code comes from FastAPI's Request object: body = await request.body() returns the raw request body as bytes.
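A runnable sketch of that pattern without FastAPI installed: FakeRequest below is a hypothetical stand-in that only mimics the async body() method. In a real endpoint you would use the Request object FastAPI passes in (Starlette's Request, which FastAPI uses, also offers await request.json()).

```python
import asyncio
import json


class FakeRequest:
    """Hypothetical stand-in for FastAPI's Request; only body() is mimicked."""

    def __init__(self, raw):
        self._raw = raw

    async def body(self):
        return self._raw


async def handle(request):
    body = await request.body()              # raw request body as bytes
    return json.loads(body.decode('utf-8'))  # parsed into a Python dict


result = asyncio.run(handle(FakeRequest(b'{"Ref": 164611595}')))
print(result)
```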

Comments
