4

I am trying to build a nested object from metadata I have. The metadata is an array of attribute paths, which I iterate over to create a dict. For example, given the array below:

[
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

This array should give an output as below:

{
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
             {
                 "attributeName": "",
                 "attributeValue": "",
                 "attributeUOM": ""
             },
             {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
             }
        ]
    }
}

I have written the logic below, but I am unable to get the desired output. It should handle both objects and arrays, as described by the metadata.

source_json = [
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

for row in source_json:
    propertyNames = row.split('.')
    temp = ''
    parent = {}
    parentArr = []
    parentObj = {}
    # if len(propertyNames) > 1:
    arrLength = len(propertyNames)
    for i, (current) in enumerate(zip(propertyNames)):
        if i == 0:
            if '[' in current:
                parent[current]=parentArr
            else:
                parent[current] = parentObj
            temp = current
        if i > 0 and i < arrLength - 1:
            if '[' in current:
                parent[current] = parentArr
            else:
                parent[current] = parentObj
            temp = current
        if i == arrLength - 1:
            if '[' in current:
                parent[current] = parentArr
            else:
                parent[current] = parentObj
            temp = current
            # temp[prev][current] = ""
    # finalMapping[target] = target
print(parent)
0

5 Answers

2

There's a similar question at Convert Dot notation string into nested Python object with Dictionaries and arrays where the accepted answer works for this question, but has unused code paths (e.g. isInArray) and caters to unconventional conversions expected by that question:

  • "arrOne[0]": "1,2,3" → "arrOne": ["1", "2", "3"] instead of
  • "arrOne[0]": "1,2,3" → "arrOne": ["1,2,3"] or
  • "arrOne[0]": "1", "arrOne[1]": "2", "arrOne[2]": "3" → "arrOne": ["1", "2", "3"]

Here's a refined implementation of the branch function:

import re

def branch(tree, path, value):
    key = path[0]
    array_index_match = re.search(r'\[([0-9]+)\]', key)

    if array_index_match:
        # Get the array index, and remove the match from the key
        array_index = int(array_index_match[0].replace('[', '').replace(']', ''))
        key = key.replace(array_index_match[0], '')

        # Prepare the array at the key
        if key not in tree:
            tree[key] = []

        # Prepare the object at the array index
        if array_index == len(tree[key]):
            tree[key].append({})

        # Replace the object at the array index
        tree[key][array_index] = value if len(path) == 1 else branch(tree[key][array_index], path[1:], value)

    else:
        # Prepare the object at the key
        if key not in tree:
            tree[key] = {}

        # Replace the object at the key
        tree[key] = value if len(path) == 1 else branch(tree[key], path[1:], value)

    return tree

Usage:

VALUE = ''

def create_dict(attributes):
    d = {}
    for path_str in attributes:
        branch(d, path_str.split('.'), VALUE)
    return d

source_json = [
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

assert create_dict(source_json) == {
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
             {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            },
           {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            }
        ]
    }
}
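
One caveat worth noting: branch only appends a new element when the index equals the current list length, so it appears to assume that array indices arrive in order starting from 0. If the metadata could skip or reorder indices, a padded variant along the following lines might help. This is only a sketch; the name branch_padded and the choice to pad gaps with empty dicts are assumptions, not part of the answer above.

```python
import re

# Matches a trailing "[<digits>]" on a path segment such as "rows[2]".
INDEX_RE = re.compile(r'\[([0-9]+)\]$')

def branch_padded(tree, path, value):
    """Hypothetical variant of branch() that pads skipped list positions with {}."""
    key = path[0]
    match = INDEX_RE.search(key)
    if match:
        index = int(match.group(1))
        key = key[:match.start()]
        items = tree.setdefault(key, [])
        # Pad so items[index] exists even if earlier indices were never seen.
        while len(items) <= index:
            items.append({})
        if len(path) == 1:
            items[index] = value
        else:
            branch_padded(items[index], path[1:], value)
    else:
        if len(path) == 1:
            tree[key] = value
        else:
            branch_padded(tree.setdefault(key, {}), path[1:], value)
    return tree

sparse = {}
for p in ["rows[2].name", "rows[0].name"]:
    branch_padded(sparse, p.split('.'), '')
print(sparse)
```

Position 1 is never mentioned in the input, so it stays an empty dict; whether that is the right placeholder depends on what the consumer of the data expects.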

2

First, iterate over the whole list and collect the third-level attributes; then convert that intermediate structure into the desired output:

from typing import Dict, List


source_json = [
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
    "attributes.entityAttributes[2].attributeName"
]


def accumulate(source: List) -> Dict:
    accumulator = {}
    for v in source:
        vs = v.split(".")
        root_attribute = vs[0]
        if root_attribute not in accumulator:
            accumulator[root_attribute] = {}

        i = vs[1].rfind('[')
        k = (vs[1][:i], vs[1][i+1:-1])

        if k not in accumulator[root_attribute]:
            accumulator[root_attribute][k] = {}
        accumulator[root_attribute][k][vs[2]] = ""
    return accumulator


def get_result(accumulated: Dict) -> Dict:
    result = {}
    for k, v in accumulated.items():
        result[k] = {}
        for (entity, idx), v1 in v.items():
            if entity not in result[k]:
                result[k][entity] = []
            if len(v1) == 3:
                result[k][entity].append(v1)
    return result


print(get_result(accumulate(source_json)))

The output will be:


{
    'attributes':
    {
        'entityAttributes':
        [
            {
                'attributeName': '',
                'attributeValue': '',
                'attributeUOM': ''
            },
            {'attributeName': '',
             'attributeValue': '',
             'attributeUOM': ''
            }
        ]
    }
}

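In the accumulate function, the third-level attributes are stored in a dict keyed by tuples ('entityAttributes', '0') ... ('entityAttributes', '2'). The get_result function then converts that tuple-keyed dict into a dict that maps each entity name to a list. Because get_result only keeps entries that have all three attributes, the incomplete entry at index 2 does not appear in the output.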

1 Comment

It works for lists, but how can we make it work for non-list properties?
2

How about something like this:

import re
import json

source_json = [
"attributes.entityAttributes[0].attributeName",
"attributes.entityAttributes[0].attributeValue",
"attributes.entityAttributes[0].attributeUOM",
"attributes.entityAttributes[1].attributeName",
"attributes.entityAttributes[1].attributeValue",
"attributes.entityAttributes[1].attributeUOM",
"attributes.entityAttributes[2].attributeName"
]


def to_object(source_json):

    def add_attribute(target, attribute_list):
        head, tail = attribute_list[0], attribute_list[1:]
        if tail:
            add_attribute(target.setdefault(head,{}), tail)
        else:
            target[head] = ''
    
    target = {}
    for row in source_json:
        add_attribute(target, re.split(r'[\.\[\]]+',row))
    return target
    
  
print(json.dumps(to_object(source_json), indent=4))

Note that this does not do exactly what you requested: it stores the array as an object with the keys '0' ... '2'. This makes it easier to implement and also more robust. What would you expect if the input list were missing the entries for entityAttributes[0]? Should the list then include an empty element, or something else? In any case, you save space by not including such an element, which only works if you store the array as an object.
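
If real lists are needed afterwards, one option is a post-processing pass over the nested dict. The sketch below is an assumption about how that could look: the helper name listify is made up, it treats any non-empty dict whose keys are all digit strings as a list, and it does not pad gaps between indices. The sample input is hand-written in the shape this answer's approach produces.

```python
def listify(node):
    """Recursively turn dicts whose keys are all digit strings into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: listify(v) for k, v in node.items()}
    if converted and all(k.isdigit() for k in converted):
        # Order by numeric index; missing indices are NOT padded in this sketch.
        return [converted[k] for k in sorted(converted, key=int)]
    return converted

# Hand-written sample with arrays stored as objects keyed by '0', '1', ...
nested = {
    "attributes": {
        "entityAttributes": {
            "0": {"attributeName": "", "attributeUOM": ""},
            "1": {"attributeName": ""},
        }
    }
}
listified = listify(nested)
print(listified)
```

A side effect of this rule is that a genuine object whose keys all happen to be numeric strings would also become a list, so it only fits data where that cannot occur.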


1

None of the answers provided so far strike me as very intuitive. Here's one way to tackle the problem with three easy-to-understand functions.

Normalize inputs. First we need a function to normalize the input strings. Instead of rule-bearing strings like 'foo[0].bar' (where one must understand that integers in square brackets imply a list), we want a simple tuple of keys like ('foo', 0, 'bar').

def attribute_to_keys(a):
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )

Build a uniform data structure. Second, we need a function to assemble a data structure consisting of dicts of dicts of dicts ... all the way down.

def assemble_data(attributes):
    data = {}
    for a in attributes:
        d = data
        for k in attribute_to_keys(a):
            d = d.setdefault(k, {})
    return convert(data)

def convert(d):
    # Just a placeholder for now.
    return d

Convert the uniform data. Third, we need to implement a real version of the placeholder. Specifically, we need it to recursively convert the uniform data structure into our ultimate goal having (a) empty strings at leaf nodes, and (b) lists rather than dicts whenever the dict keys are all integers. Note that this even fills in empty list positions with an empty string (a contingency not covered in your problem description; adjust as needed if you want a different behavior).

def convert(d):
    if not d:
        return ''
    elif all(isinstance(k, int) for k in d):
        return [convert(d.get(i)) for i in range(max(d) + 1)]
    else:
        return {k : convert(v) for k, v in d.items()}
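
For reference, here is how the three functions might be put together. They are repeated so that the snippet runs on its own; the sample paths, including the sparse 'tags[1]' entry, are made up to illustrate how skipped list positions get filled with an empty string.

```python
def attribute_to_keys(a):
    # 'foo[0].bar' -> ('foo', 0, 'bar')
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )

def convert(d):
    if not d:
        # Empty dict (a leaf) or None (a skipped list position).
        return ''
    elif all(isinstance(k, int) for k in d):
        return [convert(d.get(i)) for i in range(max(d) + 1)]
    else:
        return {k: convert(v) for k, v in d.items()}

def assemble_data(attributes):
    data = {}
    for a in attributes:
        d = data
        for k in attribute_to_keys(a):
            d = d.setdefault(k, {})
    return convert(data)

# Hypothetical sample input; 'tags[1]' deliberately skips index 0.
sample = [
    'itemUniqueId',
    'manufacturerInfo[0].manufacturer.value',
    'tags[1]',
]
assembled = assemble_data(sample)
print(assembled)
```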


0

You can use a custom builder class which implements __getattr__ and __getitem__ to gradually build the underlying object. This building can then be triggered by using eval on each of the attribute strings (note: eval is not safe for input from untrusted sources).

The following is an example implementation:

class Builder:
    def __init__(self):
        self.obj = None

    def __getattr__(self, key):
        if self.obj is None:
            self.obj = {}
        return self.obj.setdefault(key, Builder())

    def __getitem__(self, index):
        if self.obj is None:
            self.obj = []
        self.obj.extend(Builder() for _ in range(index+1-len(self.obj)))
        return self.obj[index]

    def convert(self):
        if self.obj is None:
            return ''
        elif isinstance(self.obj, list):
            return [v.convert() for v in self.obj]
        elif isinstance(self.obj, dict):
            return {k: v.convert() for k,v in self.obj.items()}
        else:
            assert False


attributes = [
    'itemUniqueId',
    'itemDescription',
    'manufacturerInfo[0].manufacturer.value',
    'manufacturerInfo[0].manufacturerPartNumber',
    'attributes.noun.value',
    'attributes.modifier.value',
    'attributes.entityAttributes[0].attributeName',
    'attributes.entityAttributes[0].attributeValue',
    'attributes.entityAttributes[0].attributeUOM',
    'attributes.entityAttributes[1].attributeName',
    'attributes.entityAttributes[1].attributeValue',
    'attributes.entityAttributes[1].attributeUOM',
]

builder = Builder()
for attr in attributes:
    eval(f'builder.{attr}')
result = builder.convert()

import json
print(json.dumps(result, indent=4))

which gives the following output:

{
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
            {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            },
            {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            }
        ]
    }
}
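
If eval is a concern, the same Builder can probably be driven without it by tokenizing each attribute string and calling getattr and indexing directly. The following is a sketch under that assumption: the helper apply_path and the pattern TOKEN_RE are hypothetical names, the class is repeated so the snippet runs alone, and the sample paths are a shortened, made-up list.

```python
import re

class Builder:
    # Same class as in the answer above, repeated so the snippet is self-contained.
    def __init__(self):
        self.obj = None

    def __getattr__(self, key):
        if self.obj is None:
            self.obj = {}
        return self.obj.setdefault(key, Builder())

    def __getitem__(self, index):
        if self.obj is None:
            self.obj = []
        self.obj.extend(Builder() for _ in range(index + 1 - len(self.obj)))
        return self.obj[index]

    def convert(self):
        if self.obj is None:
            return ''
        elif isinstance(self.obj, list):
            return [v.convert() for v in self.obj]
        return {k: v.convert() for k, v in self.obj.items()}

# A token is either a plain name or a bracketed integer index.
TOKEN_RE = re.compile(r'([^.\[\]]+)|\[(\d+)\]')

def apply_path(builder, path):
    """Walk one attribute string without eval (hypothetical helper)."""
    node = builder
    for name, index in TOKEN_RE.findall(path):
        node = getattr(node, name) if name else node[int(index)]
    return node

builder = Builder()
for attr in [
    'itemUniqueId',
    'manufacturerInfo[0].manufacturer.value',
    'attributes.entityAttributes[1].attributeName',
]:
    apply_path(builder, attr)
result = builder.convert()
print(result)
```

As with the eval version, attribute names that collide with the class's own members (obj, convert) would not be routed through __getattr__, so this only suits metadata that avoids those names.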
