
I would like to deserialize JSON for which I have predefined a schema. Here is a typical JSON file I deal with:

{"op": "mcm",
 "id": 1,
 "clk": "AKjT4QEAl5q/AQCW7rIB",
 "pt": 1563999965598,
 "mc": [{"id": "1.160679253",
   "rc": [{"atl": [[1.18, 88.5],
      [1.17, 152.86],
      [1.16, 175.96],
      [1.14, 93.3],
      [1.08, 28.08],
      [1.07, 8.84],
      [1.02, 129.74]],
     "id": 1}]}]}

for which I would like a schema like this:

{'op': String,
 'id': Integer,
 'clk': String,
 'pt': Integer,
 'mc': [{'id': String,
   'rc': [{'atl': Array(Decimal),
     'id': Integer}]}]}

I know it is possible to do that with PySpark, but I am looking for a lighter solution (something on top of the json package, for example).

Here is what I already tried so far:

  • Deserializing the JSON file and using a custom function to set the type of each element of the dictionary: I am afraid that by converting from string to float and then from float to Decimal I would get rounding errors.
  • Using a custom JSONDecoder (https://docs.python.org/3/library/json.html#json.JSONDecoder) with custom parse_float, parse_int and parse_constant functions: those functions only take the string to be parsed as an argument, so I would have to treat '1.160679253' (the id inside mc) and '1.18' (inside atl) the same way, while I want '1.160679253' to remain a string and '1.18' to be cast to Decimal.
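
The limitation in the second point can be demonstrated directly; here is a minimal sketch (the callback name is illustrative) showing that parse_float receives only the literal text of the number, with no information about which key it belongs to:

```python
import decimal
import json

def log_and_convert(literal):
    # parse_float is called with the raw text of each float literal,
    # never with the key it is attached to.
    print('parse_float saw:', literal)
    return decimal.Decimal(literal)

result = json.loads('{"atl": [[1.18, 88.5]]}', parse_float=log_and_convert)
# Both 1.18 and 88.5 go through the same callback with no key context.
```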

Thanks in advance for your help

  • Could you give an example of such a custom function from your first point? I'm wondering why you feel the need to convert from String -> Float -> Decimal instead of just using the Decimal() function right away. Commented Jul 29, 2019 at 9:17
  • I thought that when we call json.load, it automatically converts elements that look like floats to float, and I would then have to convert them to Decimal, which gives this sequence of conversions: String -> Float -> Decimal. Commented Jul 29, 2019 at 9:30
  • Yes, I figured that out after a moment of thinking; that's why I suggested forcing the JSON parser to convert float-looking fields to Decimals straight away to avoid information loss. I thought you may not be aware that it's an option. Commented Jul 29, 2019 at 9:32

1 Answer


Your first approach is the most lightweight one, as it requires nothing but the standard library: just a custom function on top of the json package, tailored to what you need. As for the float->Decimal conversion and precision loss, json.loads() has a parse_float parameter that forces float literals to be parsed as Decimals straight away:

>>> import json
>>> import decimal
>>> json.loads('1.1', parse_float=decimal.Decimal)
Decimal('1.1')

As for the ID field, which will be parsed to Decimal as well because of its float-like format: as a special case, you can simply convert it back to a string via str() with no information loss.
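
Putting the two pieces together, here is a minimal sketch of this approach on the question's sample data (the walk over mc to restore the id is the illustrative part, not a fixed recipe):

```python
import decimal
import json

raw = '''{"op": "mcm", "id": 1, "clk": "AKjT4QEAl5q/AQCW7rIB",
          "pt": 1563999965598,
          "mc": [{"id": "1.160679253",
                  "rc": [{"atl": [[1.18, 88.5], [1.17, 152.86]],
                          "id": 1}]}]}'''

# Parse every float literal as Decimal straight away - no intermediate
# float, so no rounding error can creep in.
data = json.loads(raw, parse_float=decimal.Decimal)

# Special case: force each market id back to a plain string.
for market in data['mc']:
    market['id'] = str(market['id'])
```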


Comments

Thanks, but what you suggest is the second thing I tried: it would also convert the '1.160679253' in 'id': '1.160679253' to Decimal, while I want to keep it as a String.
Well, then just convert this single field back to string in the custom function - no information is lost in this conversion, and this field is clearly a special case, since strings meant to be strings usually don't look like decimals.
Isn't it possible that I will get rounding errors when I convert it back?
In the end I am going to force json.loads() to keep everything as String and add a custom function that converts all the strings in the dictionary one by one. That way I can cast each element depending on its key.
What rounding errors? Decimal -> String as performed by str(decimal) doesn't do any rounding at all:

>>> d = decimal.Decimal('0.1428571428571428571428571429')
>>> str(d)
'0.1428571428571428571428571429'
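
The approach from the comment above - keeping every number as a raw string and casting per key - can be sketched like this (the per-key casts shown here are illustrative, not from the thread):

```python
import decimal
import json

raw = '{"pt": 1563999965598, "atl": [[1.18, 88.5]]}'

# Keep every number as its raw text so nothing is coerced early:
# parse_float and parse_int receive the literal string, and str()
# leaves it untouched.
data = json.loads(raw, parse_float=str, parse_int=str)
# data == {'pt': '1563999965598', 'atl': [['1.18', '88.5']]}

# Then cast each element depending on its key.
data['pt'] = int(data['pt'])
data['atl'] = [[decimal.Decimal(x) for x in pair] for pair in data['atl']]
```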
