
I like to define my replacement patterns as byte strings, like this:

    ReplaceChars = (
        ('UTF8 HYPHEN ', b'\xe2\x80\x90', b'\x2d'),
        ('UTF8 EN DASH', b'\xe2\x80\x93', b'\x2d'),
        ('UTF8 EM DASH', b'\xe2\x80\x94', b'\x2d'),
    )

Currently this is hardcoded in my Python source, but I would like to move it to a separate file for more flexibility.

I normally use json.load(jsonfile) for this, but json does not seem to work with byte strings.

I tried to dump ReplaceChars into a JSON file:

    import json

    with open('result.json', 'w') as f:
        json.dump(ReplaceChars, f)

but it raises the following error:

    Traceback (most recent call last):
[...]
    TypeError: Object of type 'bytes' is not JSON serializable

Is there a workaround in json?

3 Answers


You can use e.g. Base64 encoding to represent your binary data as strings. See here: https://docs.python.org/3/library/base64.html
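A minimal sketch of that idea, assuming the same three-field tuples as in the question: each bytes field is Base64-encoded into an ASCII string before json.dumps, and decoded back after json.loads. The helper names to_json_text and from_json_text are hypothetical, chosen for illustration.

```python
import base64
import json

# The byte-string table from the question.
ReplaceChars = (
    ('UTF8 HYPHEN ', b'\xe2\x80\x90', b'\x2d'),
    ('UTF8 EN DASH', b'\xe2\x80\x93', b'\x2d'),
    ('UTF8 EM DASH', b'\xe2\x80\x94', b'\x2d'),
)

def to_json_text(table):
    """Hypothetical helper: Base64-encode each bytes field so json can serialize it."""
    rows = [
        [name,
         base64.b64encode(src).decode('ascii'),
         base64.b64encode(dst).decode('ascii')]
        for name, src, dst in table
    ]
    return json.dumps(rows, indent=2)

def from_json_text(text):
    """Hypothetical helper: decode the Base64 strings back into a tuple of bytes tuples."""
    return tuple(
        (name, base64.b64decode(src), base64.b64decode(dst))
        for name, src, dst in json.loads(text)
    )

text = to_json_text(ReplaceChars)
print(text)
print(from_json_text(text) == ReplaceChars)
```

Note that the Base64 values in the resulting file are not readable by eye, so this fits best when the file only needs to be machine-readable.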



It might be better to store the characters themselves and encode them to bytes in your application as needed, rather than hard-coding the UTF-8 representation by hand.

For example, consider the UTF8 HYPHEN:

    >>> import json
    >>> '\u2010'
    '‐'
    >>> '\u2010'.encode()
    b'\xe2\x80\x90'
    >>> '-'.encode()
    b'-'
    >>> {'UTF8 HYPHEN': ['\u2010', '-']}
    {'UTF8 HYPHEN': ['‐', '-']}
    >>> print(json.dumps({'UTF8 HYPHEN': ['\u2010', '-']}))
    {"UTF8 HYPHEN": ["\u2010", "-"]}
    >>> json.loads('{"UTF8 HYPHEN": ["\\u2010", "-"]}')['UTF8 HYPHEN'][0].encode()
    b'\xe2\x80\x90'
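Applied to the whole table, this might look like the sketch below. It assumes a hypothetical JSON layout that maps each name to [search character, replacement character]; a string stands in for the file here, and with a real file you would call json.load(f) instead of json.loads(text). The function name load_replace_chars is also hypothetical.

```python
import json

# Hypothetical file content: name -> [search char, replacement char].
# The \\u escapes become \uXXXX in the text, which JSON decodes to the characters.
JSON_TEXT = '''
{
  "UTF8 HYPHEN": ["\\u2010", "-"],
  "UTF8 EN DASH": ["\\u2013", "-"],
  "UTF8 EM DASH": ["\\u2014", "-"]
}
'''

def load_replace_chars(text, encoding='utf-8'):
    """Parse the JSON and encode each character to bytes only at load time."""
    table = json.loads(text)
    return tuple(
        (name, src.encode(encoding), dst.encode(encoding))
        for name, (src, dst) in table.items()
    )

ReplaceChars = load_replace_chars(JSON_TEXT)
print(ReplaceChars)
```

The file stays human-readable and editable, and the byte representation is derived from the characters instead of being maintained by hand.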

2 Comments

I also have a CP1252 EN DASH as b'\x96'. How do I handle those?
Use .encode('cp1252'): it takes a Unicode string and returns the corresponding CP1252 bytes, e.g. '\u2013'.encode('cp1252') gives b'\x96'.
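One way to mix codecs in a single file, sketched under an assumption: a hypothetical JSON layout where each entry carries its own codec name as [name, encoding, search char, replacement char]. The function name load_rules is likewise hypothetical.

```python
import json

# Hypothetical layout: [name, encoding, search char, replacement char].
JSON_TEXT = '''
[
  ["UTF8 EN DASH",   "utf-8",  "\\u2013", "-"],
  ["CP1252 EN DASH", "cp1252", "\\u2013", "-"]
]
'''

def load_rules(text):
    """Encode each entry's characters with the codec named in that entry."""
    return tuple(
        (name, src.encode(codec), dst.encode(codec))
        for name, codec, src, dst in json.loads(text)
    )

rules = load_rules(JSON_TEXT)
for name, src, dst in rules:
    print(name, src, dst)

# Not every character exists in every code page: U+2010 HYPHEN has no cp1252 byte.
try:
    '\u2010'.encode('cp1252')
except UnicodeEncodeError as exc:
    print('cannot encode:', exc.reason)
```

Because the same character is stored once and encoded per entry, the en dash yields b'\xe2\x80\x93' for UTF-8 and b'\x96' for CP1252.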

Instead of a JSON file, you can use a pickle file to store the byte-string data.

    import pickle

    ReplaceChars = (
        ('UTF8 HYPHEN ', b'\xe2\x80\x90', b'\x2d'),
        ('UTF8 EN DASH', b'\xe2\x80\x93', b'\x2d'),
        ('UTF8 EM DASH', b'\xe2\x80\x94', b'\x2d'),
    )

    # Serialize the tuple (bytes included) to a binary pickle file.
    with open('result.pkl', 'wb') as file:
        pickle.dump(ReplaceChars, file)

    # Load it back; the result equals the original tuple.
    with open('result.pkl', 'rb') as file:
        Result_ReplaceChars = pickle.load(file)

    print(Result_ReplaceChars)
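Since a pickle file is binary, a human-readable alternative (a swapped-in technique, not the pickle approach itself) is to write the tuple as a Python literal to a text file and read it back with ast.literal_eval, which accepts bytes literals such as b'\x96' but refuses arbitrary code. The file name replace_chars.txt is hypothetical.

```python
import ast
import os
import pprint
import tempfile

ReplaceChars = (
    ('UTF8 HYPHEN ', b'\xe2\x80\x90', b'\x2d'),
    ('UTF8 EN DASH', b'\xe2\x80\x93', b'\x2d'),
    ('UTF8 EM DASH', b'\xe2\x80\x94', b'\x2d'),
)

# Hypothetical file name in the temp directory.
path = os.path.join(tempfile.gettempdir(), 'replace_chars.txt')

# pformat writes a valid Python literal, one entry per line, with b'...' byte strings.
with open(path, 'w', encoding='utf-8') as f:
    f.write(pprint.pformat(ReplaceChars))

# literal_eval only evaluates literals (tuples, str, bytes, numbers, ...).
with open(path, encoding='utf-8') as f:
    loaded = ast.literal_eval(f.read())

print(loaded == ReplaceChars)
```

The text file can then be edited by hand using the same b'...' notation as the original source.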

1 Comment

This produces a binary file. Since I need a human-readable file that is easier to maintain, this doesn't work for me.
