
I have a tough task: download a JSON file in one format and re-encode it in another format to upload to MongoDB. My JSON file is from Alpha Vantage (https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=MSFT&interval=1min&apikey=demo) and has the following format.

"Time Series (1min)": {
    "2018-07-13 16:00:00": {
        "1. open": "105.4550",
        "2. high": "105.5600",
        "3. low": "105.3900",
        "4. close": "105.4300",
        "5. volume": "2484606"
    },
    "2018-07-13 15:59:00": {
        "1. open": "105.5300",
        "2. high": "105.5300",
        "3. low": "105.4500",
        "4. close": "105.4600",
        "5. volume": "216617"
    }

I need to re-encode the file according to the following schema, using day, hour and minute as keys.

{
'2018-07-13': {
    '16': {
        '00': {'open': 105.4550,
              'high': 105.5600,
              'low': 105.3900,
              'close': 105.4300,
              'volume': 2484606,}
        }
    }
'2018-07-13': {
    '15': {
        '59': {'open': 105.53000,
              'high': 105.5300,
              'low': 105.4500,
              'close': 105.4600,
              'volume': 6484606,}
        }
    }
}

I've done a lot of research, but I couldn't figure out how to construct a dictionary with multiple nested keys using a loop. Ideally, I'd like to re-encode the data into the dict at the same time as I read the JSON file.

1 Answer

I agree it can be a little confusing if you aren't used to working with nested data structures, but it's not that hard if you're careful. The trick is to create the inner dictionaries if they don't already exist. We can do that with the dict.setdefault method.
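As a minimal sketch of that idea (the `nested` name and the placeholder values are made up purely for illustration), chaining dict.setdefault builds each level the first time a key is seen and reuses it afterwards:

```python
# Illustrative only: `nested` and the string values are placeholders.
nested = {}

# setdefault returns the existing value for a key, or inserts the given
# default and returns it, so each level is created only on first use.
nested.setdefault("2018-07-13", {}).setdefault("16", {})["00"] = "first"
nested.setdefault("2018-07-13", {}).setdefault("16", {})["01"] = "second"
nested.setdefault("2018-07-13", {}).setdefault("15", {})["59"] = "third"

print(nested)
```

The second and third lines find the "2018-07-13" dict already in place, so nothing is overwritten.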

We also need to convert the inner data from strings to numbers. But we want the numbers to be integers if they don't contain a decimal point, otherwise we want floats. The usual way to do that is shown in my str_to_num function: first we try to convert to an integer, and if that fails we convert to a float. If that also fails due to bad data, the program will raise a ValueError exception and terminate. You may want to handle that differently, e.g. by ignoring bad data.
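If you would rather skip bad values than terminate, one possible variant is sketched here. The name `str_to_num_or_none` and the choice of returning None are assumptions for illustration, not part of the approach above; the caller would then decide whether to drop or keep the None entries.

```python
def str_to_num_or_none(s):
    """Return an int if possible, else a float, else None for unparseable input."""
    for convert in (int, float):
        try:
            return convert(s)
        except (ValueError, TypeError):
            # Try the next converter; fall through to None if both fail.
            continue
    return None

print(str_to_num_or_none("2484606"))
print(str_to_num_or_none("105.4550"))
print(str_to_num_or_none("n/a"))
```

Note that float accepts strings such as "nan" and "inf", so those would not be treated as bad data by this sketch.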

I'll assume that you know how to extract the desired data from the outermost level using the "Time Series (1min)" key. The code below uses the standard json module simply to convert the data in the new format back to JSON so we can print it nicely.

import json

alpha_data = {
    "2018-07-13 16:00:00": {
        "1. open": "105.4550",
        "2. high": "105.5600",
        "3. low": "105.3900",
        "4. close": "105.4300",
        "5. volume": "2484606"
    },
    "2018-07-13 15:59:00": {
        "1. open": "105.5300",
        "2. high": "105.5300",
        "3. low": "105.4500",
        "4. close": "105.4600",
        "5. volume": "216617"
    }
}

def str_to_num(s):
    try:
        n = int(s)
    except ValueError:
        n = float(s)
    return n

# Where we'll store the output
out_data = {}

for timestamp, data in alpha_data.items():
    datestr, timestr = timestamp.split()
    hr, mn, _ = timestr.split(':')
    # Fetch inner dicts, creating them if they don't exist yet
    d = out_data.setdefault(datestr, {})
    d = d.setdefault(hr, {})
    d[mn] = {k.split()[1]: str_to_num(v) for k, v in data.items()}

print(json.dumps(out_data, indent=4))  

output

{
    "2018-07-13": {
        "16": {
            "00": {
                "open": 105.455,
                "high": 105.56,
                "low": 105.39,
                "close": 105.43,
                "volume": 2484606
            }
        },
        "15": {
            "59": {
                "open": 105.53,
                "high": 105.53,
                "low": 105.45,
                "close": 105.46,
                "volume": 216617
            }
        }
    }
}

You will notice that my output isn't exactly the same as your desired output. That's because keys in Python dictionaries are unique: you can't have two items in the same dict with a key of "2018-07-13". So my code creates a dict in out_data with the key of "2018-07-13" and inside that dict it creates a dict for each hour, as necessary.
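If you want to handle each day separately, a rough sketch follows. It assumes a dict shaped like out_data above; the sample values for "2018-07-12", the helper name `split_by_day`, and the "date" and "hours" field names are all hypothetical. It only builds one plain dict per day and makes no database calls.

```python
# Sample shaped like out_data; the 2018-07-12 entry is invented for the demo.
out_data = {
    "2018-07-13": {"16": {"00": {"close": 105.43}}, "15": {"59": {"close": 105.46}}},
    "2018-07-12": {"16": {"00": {"close": 104.19}}},
}

def split_by_day(nested):
    """Return one dict per date; the field names here are hypothetical."""
    return [{"date": day, "hours": hours} for day, hours in nested.items()]

daily_docs = split_by_day(out_data)
for doc in daily_docs:
    # Show each day alongside the hour keys it contains.
    print(doc["date"], sorted(doc["hours"]))
```

Each element of daily_docs could then be handed to whatever storage client you use, one day at a time.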


3 Comments

Amazing! I've had a headache with this problem for a week and you solved it in a few lines. Thanks for the explanation too. As you said, I think I was missing some knowledge of nested data structures, and my desired output was incorrect because it had two equal keys in the dict. Your output is exactly what I need. Just one last question: is it possible to create one variable per day? I'd like to add one document per day to MongoDB.
@FernandoSilva Sorry, I don't know MongoDB, so I can't give you any specific advice about it. My code will process any amount of data that matches the format in your question. It will create a separate dict for each day that it sees in the input data.
I was analyzing your code, and the best solution will be filtering each day and uploading one at a time. Thank you once again, you really helped me a lot.
