Convert Dot notation string into nested Python object with Dictionaries and arrays

Question

Background

For some background, I'm trying to create a tool that converts worksheets into API calls using Python 3.5.

To convert the table cells to the schema needed for the API call, I've started using JavaScript-like syntax for the headers in the spreadsheet, e.g.:

Worksheet Header (string)

dict.list[0].id

Python Dictionary

{
  "dict": {
    "list": [
      {"id": "my cell value"}
    ]
  }
}

It's also possible that the header schema could have nested arrays/dicts:

one.two[0].three[0].four.five[0].six

And I also need to append to the object after it has been created as I go through each header.
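
One way to make the header syntax concrete is to split each header into a flat list of steps before building anything. The sketch below is only an illustration: the name parse_header and the regular expression are assumptions, not part of the original code. String tokens stand for dictionary keys and integer tokens for list indices; malformed pieces such as an empty [] are silently skipped rather than validated.

```python
import re

# Hypothetical tokenizer: a token is either a bare key (no dots or brackets)
# or a bracketed non-negative integer index such as [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_header(header):
    """Split 'a.b[0].c' into ['a', 'b', 0, 'c'] (str keys, int indices)."""
    tokens = []
    for name, index in _TOKEN.findall(header):
        # Exactly one of the two groups is non-empty for each match.
        tokens.append(name if name else int(index))
    return tokens


print(parse_header("dict.list[0].id"))
# ['dict', 'list', 0, 'id']
print(parse_header("one.two[0].three[0].four.five[0].six"))
# ['one', 'two', 0, 'three', 0, 'four', 'five', 0, 'six']
```

Keeping the indices as integers makes it easy to tell later whether the next container should be a list or a dict.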

What I've tried

add_branch

Based on https://stackoverflow.com/a/47276490/2903486 I am able to get nested dictionaries set up using values like one.two.three.four, and I can append to the existing dictionary as I go through the rows, but I've been unable to add support for arrays:

def add_branch(tree, vector, value):
    key = vector[0]
    tree[key] = value \
        if len(vector) == 1 \
        else add_branch(tree[key] if key in tree else {},
                        vector[1:],
                        value)
    return tree

file = Worksheet(filePath, sheet).readRow()
rowList = []
for row in file:
    rowObj = {}
    for colName, rowValue in row.items():
        rowObj.update(add_branch(rowObj, colName.split("."), rowValue))
    rowList.append(rowObj)
return rowList

My own version of add_branch

import re, json
def branch(tree, vector, value):
    """
    Used to convert JS style notation (e.g dict.another.array[0].id) to a python object
    Originally based on https://stackoverflow.com/a/47276490/2903486
    """

    # Convert Boolean
    if isinstance(value, str):
        value = value.strip()

        if value.lower() in ['true', 'false']:
            value = True if value.lower() == "true" else False

    # Convert JSON
    try:
        value = json.loads(value)
    except (TypeError, ValueError):  # not valid JSON; keep the value as-is
        pass

    key = vector[0]
    arr = re.search(r'\[([0-9]+)\]', key)
    if arr:
        arr = arr.group(0)
        key = key.replace(arr, '')
        arr = arr.replace('[', '').replace(']', '')

        newArray = False
        if key not in tree:
            tree[key] = []
            tree[key].append(value \
                                 if len(vector) == 1 \
                                 else branch({} if key in tree else {},
                                             vector[1:],
                                             value))
        else:
            isInArray = False
            for x in tree[key]:
                if x.get(vector[1:][0], False):
                    isInArray = x[vector[1:][0]]

            if isInArray:
                tree[key].append(value \
                                     if len(vector) == 1 \
                                     else branch({} if key in tree else {},
                                                 vector[1:],
                                                 value))
            else:

                tree[key].append(value \
                                     if len(vector) == 1 \
                                     else branch({} if key in tree else {},
                                                 vector[1:],
                                                 value))

        if len(vector) == 1 and len(tree[key]) == 1:
            tree[key] = value.split(",")
    else:
        tree[key] = value \
            if len(vector) == 1 \
            else branch(tree[key] if key in tree else {},
                        vector[1:],
                        value)
    return tree

What still needs help

My branch solution now works pretty well after some additions, but I'm wondering whether I'm doing something wrong or messy here, or whether there's a better way to handle editing nested arrays (my attempt starts in the if isInArray section of the code).

I'd expect these two headers to edit the last array, but instead I end up creating a duplicate dictionary on the first array:

file = [{
    "one.array[0].dict.arrOne[0]": "1,2,3",
    "one.array[0].dict.arrTwo[0]": "4,5,6"
}]
rowList = []
for row in file:
    rowObj = {}
    for colName, rowValue in row.items():
        rowObj.update(branch(rowObj, colName.split("."), rowValue))
    rowList.append(rowObj)
return rowList

Outputs:

[
    {
        "one": {
            "array": [
                {
                    "dict": {
                        "arrOne": [
                            "1",
                            "2",
                            "3"
                        ]
                    }
                },
                {
                    "dict": {
                        "arrTwo": [
                            "4",
                            "5",
                            "6"
                        ]
                    }
                }
            ]
        }
    }
]

Instead of:

[
    {
        "one": {
            "array": [
                {
                    "dict": {
                        "arrOne": [
                            "1",
                            "2",
                            "3"
                        ],
                        "arrTwo": [
                            "4",
                            "5",
                            "6"
                        ]
                    }
                }
            ]
        }
    }
]
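
As far as I can tell, the duplicate appears because every path through the array handling in branch calls tree[key].append(...) with a fresh dict, while the index parsed from the header is never used. The sketch below isolates that difference on a plain list. Both helper names are hypothetical and exist only for this illustration.

```python
def append_always(items, index, key, value):
    # Mirrors the behaviour described above: a fresh dict is appended on
    # every call, so the requested index is ignored.
    items.append({key: value})
    return items


def reuse_index(items, index, key, value):
    # Pad the list until the requested index exists, then update that
    # element in place instead of appending a new one.
    while len(items) <= index:
        items.append({})
    items[index][key] = value
    return items


duplicated = []
append_always(duplicated, 0, "arrOne", ["1", "2", "3"])
append_always(duplicated, 0, "arrTwo", ["4", "5", "6"])
print(len(duplicated))  # 2

merged = []
reuse_index(merged, 0, "arrOne", ["1", "2", "3"])
reuse_index(merged, 0, "arrTwo", ["4", "5", "6"])
print(len(merged))  # 1
```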

Maybe I missed a detail here, but how does your string indicate the list index position of any selector that follows? For example, in "dict.list[].id": if list looks like [{'id': 1}, {'id': 2}], how do you know which id you are referring to?
– benvc, Commented Nov 12, 2018 at 21:23
@benvc Yeah, that's one thing I was trying to figure out myself (maybe by putting it in the header, e.g. list[0].id, or by checking whether the indicator is an int, as in list.1.id). I was hoping it could be something like the latter, but I haven't figured that out entirely.
– Andrew Bowman, Commented Nov 12, 2018 at 21:32
Edit: added a better version of the add_branch method that (almost) handles arrays properly.
– Andrew Bowman, Commented Nov 14, 2018 at 15:19

Andrew Pye · Accepted Answer · 2020-11-09 21:58:48Z

I'm not sure whether there are any caveats in this solution, but it appears to work for some of the use cases I'm throwing at it:

import json, re
def build_job():

    def branch(tree, vector, value):
        
        # Originally based on https://stackoverflow.com/a/47276490/2903486

        # Convert Boolean
        if isinstance(value, str):
            value = value.strip()

            if value.lower() in ['true', 'false']:
                value = True if value.lower() == "true" else False

        # Convert JSON
        try:
            value = json.loads(value)
        except (TypeError, ValueError):  # not valid JSON; keep the value as-is
            pass
        
        key = vector[0]
        arr = re.search(r'\[([0-9]+)\]', key)
            
        if arr:
            
            # Get the index of the array, and remove it from the key name
            arr = arr.group(0)
            key = key.replace(arr,'')
            arr = int(arr.replace('[','').replace(']',''))
            
            if key not in tree:
                
                # If we don't have an array already, create one under this key
                # and append the first element to it
                tree[key] = []
                tree[key].append(value \
                    if len(vector) == 1 \
                    else branch({} if key in tree else {},
                                vector[1:],
                                value))
            else:
                
                # Check to see if we are inside of an existing array here
                isInArray = False
                for i in range(len(tree[key])):
                    if tree[key][i].get(vector[1:][0], False):
                        isInArray = tree[key][i][vector[1:][0]]
                        
                if isInArray and arr < len(tree[key]) \
                   and isinstance(tree[key][arr], list):
                    # Respond accordingly by appending or updating the value
                    tree[key][arr].append(value \
                        if len(vector) == 1 \
                        else branch(tree[key] if key in tree else {},
                                    vector[1:],
                                    value))
                else:
                    # Make sure we have an index to attach the requested array to
                    while arr >= len(tree[key]):
                        tree[key].append({})

                    # update the existing array with a dict
                    tree[key][arr].update(value \
                        if len(vector) == 1 \
                        else branch(tree[key][arr] if key in tree else {},
                                    vector[1:],
                                    value))
            
            # Turn comma-delimited values into lists
            if len(vector) == 1 and len(tree[key]) == 1:
                tree[key] = value.split(",")
        else:
            # Add dictionaries together
            tree.update({key: value \
                if len(vector) == 1 \
                else branch(tree[key] if key in tree else {},
                            vector[1:],
                            value)})
        return tree

    file = [{
        "one.array[0].dict.dont-worry-about-me": "some value",
        "one.array[0].dict.arrOne[0]": "1,2,3",
        "one.array[0].dict.arrTwo[1]": "4,5,6",
        "one.array[1].x.y[0].z[0].id": "789"
    }]
    rowList = []
    for row in file:
        rowObj = {}
        for colName, rowValue in row.items():
            rowObj.update(branch(rowObj, colName.split("."), rowValue))
        rowList.append(rowObj)
    return rowList
print(json.dumps(build_job(), indent=4))

Result:

[
    {
        "one": {
            "array": [
                {
                    "dict": {
                        "dont-worry-about-me": "some value",
                        "arrOne": [
                            "1",
                            "2",
                            "3"
                        ],
                        "arrTwo": [
                            "4",
                            "5",
                            "6"
                        ]
                    }
                },
                {
                    "x": {
                        "y": [
                            {
                                "z": [
                                    {
                                        "id": 789
                                    }
                                ]
                            }
                        ]
                    }
                }
            ]
        }
    }
]
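
For comparison, here is a hedged sketch of an iterative alternative that parses the header once and then walks the tokens, creating dicts and lists on demand. The names parse_header and set_path are hypothetical and not from the answer above. Unlike branch, this sketch performs no value conversion (booleans, JSON, comma splitting) and does not handle conflicts such as a header that treats an existing string as a container.

```python
import re

# Hypothetical tokenizer: bare keys become str, [n] becomes int.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_header(header):
    return [name if name else int(index)
            for name, index in _TOKEN.findall(header)]


def _pad(items, index):
    # Grow a list with None placeholders until items[index] exists.
    while len(items) <= index:
        items.append(None)


def set_path(root, header, value):
    """Store value inside root (a dict) at the location named by header."""
    tokens = parse_header(header)
    if not tokens:
        raise ValueError("empty header")
    node = root
    for i, token in enumerate(tokens[:-1]):
        # The next token decides which container this step must hold.
        child_type = list if isinstance(tokens[i + 1], int) else dict
        if isinstance(token, int):
            _pad(node, token)
            if node[token] is None:
                node[token] = child_type()
        elif token not in node:
            node[token] = child_type()
        node = node[token]
    last = tokens[-1]
    if isinstance(last, int):
        _pad(node, last)
    node[last] = value
    return root


row = {
    "one.array[0].dict.a": "x",
    "one.array[0].dict.b": "y",
    "one.array[1].x.y[0].z[0].id": 789,
}
row_obj = {}
for col_name, cell in row.items():
    set_path(row_obj, col_name, cell)
print(row_obj)
```

Because existing containers are reused rather than appended to, two headers that share the prefix one.array[0].dict end up in the same dictionary.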
