0

I have a string in the following format:

"2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."

This needs to be converted into a dictionary by splitting at the \r\n.
However, the difficult part is that for the pairs between 3A and 4A, the key needs to be prefixed with 3A to make it apparent that they are a subset of 3A.
So the final expected output is as follows:

{'2A':'xxx','3A':'yyyy','3A-51':'yzzzz','3A-52':'yzyeys','4A':'.....'}

Is there an easier way than extracting all the data into a dictionary and then iterating through the dict with a for loop? Can this be done in a single pass while parsing?

4 Answers

1

str.splitlines() does most of the work for you:

>>> "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:.....".splitlines()
['2A:xxx', '3A:yyyy', '51:yzzzz', '52:yzyeys', '4A:.....']
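
As a quick check, assuming Python 3, the sketch below compares splitlines() with an explicit split('\r\n'). The trailing line break in the sample is added here for illustration and is not part of the question's input:

```python
# Sample input based on the question, with a trailing line break added.
text = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n"

by_splitlines = text.splitlines()   # handles \r\n, \n and \r alike
by_split = text.split("\r\n")       # leaves an empty string at the end

print(by_splitlines)  # ['2A:xxx', '3A:yyyy', '51:yzzzz']
print(by_split)       # ['2A:xxx', '3A:yyyy', '51:yzzzz', '']
```

The empty trailing entry from split('\r\n') would break a later `line.split(':', 1)` unpacking, which is one reason to prefer splitlines() here.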

The tricky bit here is tracking the 3A key; presumably it's the A in the key that defines the hierarchy.

It's best to split that out to a generator:

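def hierarchy_key_values(lines):
    """Yield (key, value) pairs, prefixing child keys with their parent key."""
    parent = ''
    for line in lines:
        # Split on the first colon only, so values may contain colons.
        key, value = line.split(':', 1)
        if key.endswith('A'):
            # A key ending in 'A' starts a new group.
            parent = key + '-'
        else:
            key = parent + key

        yield key, value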
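
The second argument to split() matters if a value can itself contain a colon. A small sketch, using a made-up line that is not from the question:

```python
line = "3A:12:30:00"  # hypothetical value that itself contains colons

key, value = line.split(":", 1)   # split at the first colon only
print(key, value)                 # 3A 12:30:00

parts = line.split(":")           # splits at every colon
print(parts)                      # ['3A', '12', '30', '00']
```

Unpacking `parts` into two names would raise a ValueError, so limiting the split to one is the safer choice when the values are not known in advance.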

The rest is easy:

your_dict = dict(hierarchy_key_values(input_text.splitlines()))

Demo with your example input:

>>> dict(hierarchy_key_values(input_text.splitlines()))
{'3A-52': 'yzyeys', '3A': 'yyyy', '3A-51': 'yzzzz', '2A': 'xxx', '4A': '.....'}
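
For reference, here is an end-to-end sketch that runs on its own. It assumes Python 3.7 or later, where dicts keep insertion order, so the result reads in input order rather than in the arbitrary order shown in the demo above. The generator logic is repeated only to keep the snippet self-contained:

```python
def hierarchy_key_values(lines):
    # Same logic as the generator above, repeated so this snippet runs alone.
    parent = ''
    for line in lines:
        key, value = line.split(':', 1)
        if key.endswith('A'):
            parent = key + '-'
        else:
            key = parent + key
        yield key, value


input_text = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."
result = dict(hierarchy_key_values(input_text.splitlines()))
print(result)
# {'2A': 'xxx', '3A': 'yyyy', '3A-51': 'yzzzz', '3A-52': 'yzyeys', '4A': '.....'}
```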

1

Off the top of my head:

dct = {}
last = ''
for line in s.splitlines():
    key, val = line.split(':')
    if key.isdigit():
        key = last + '-' + key
    else:
        last = key
    dct[key] = val

This works, but having "compound" keys is generally not the best way to work with hierarchical structures. I'd suggest something like this instead:

dct = {}
last = ''
for line in s.splitlines():
    key, val = line.split(':')
    if key.isdigit():
        dct[last].setdefault('items', {})[key] = {'value': val }
    else:
        dct[key] = {'value': val }
        last = key

This makes a dict like:

{'2A': {'value': 'xxx'},
 '3A': {'items': {'51': {'value': 'yzzzz'}, '52': {'value': 'yzyeys'}},
        'value': 'yyyy'},
 '4A': {'value': '.....'}}

It looks more complicated, but it is much easier to work with.
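
To illustrate what "easier to work with" can mean in practice, here is a small sketch that queries the nested structure shown above. The variable names `children` and `leaf_parents` are made up for this example:

```python
# Nested structure as shown above.
dct = {
    '2A': {'value': 'xxx'},
    '3A': {'value': 'yyyy',
           'items': {'51': {'value': 'yzzzz'}, '52': {'value': 'yzyeys'}}},
    '4A': {'value': '.....'},
}

# Children of one parent: a direct lookup, no parsing of compound keys.
children = {k: v['value'] for k, v in dct['3A'].get('items', {}).items()}
print(children)  # {'51': 'yzzzz', '52': 'yzyeys'}

# Parents that have no children at all.
leaf_parents = [k for k, v in dct.items() if 'items' not in v]
print(leaf_parents)  # ['2A', '4A']
```

With compound keys such as '3A-51', the same queries would need string matching on every key.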


0
def solve(strs):
    dic = {}
    prev = None
    for x in strs.splitlines():
        key, val = x.split(":")
        if "A" not in key:                # or key.isdigit()
            new_key = "-".join((prev, key))
            dic[new_key] = val
        else:
            dic[key] = val
            prev = key
    return dic

strs = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:"
print(solve(strs))

Output (Python 3.7+, where dicts keep insertion order):

{'2A': 'xxx', '3A': 'yyyy', '3A-51': 'yzzzz', '3A-52': 'yzyeys', '4A': ''}


0

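With reduce you can carry state from one iteration to the next, which makes a one-liner possible:

>>> import re
>>> from functools import reduce  # needed on Python 3
>>> s = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."
>>> reduce(lambda x, y: x + [y if re.match(r'\d+A.*', y) else x[-1][0:2] + '-' + y], s.split('\r\n'), [])
['2A:xxx', '3A:yyyy', '3A-51:yzzzz', '3A-52:yzyeys', '4A:.....']

As Martin says, splitting breaks the string into its parts; reduce then passes the lambda the list built so far (x) and the next element (y), so you can look at the last element added (x[-1]) to recover the parent identifier. Note that the result is a list of "key:value" strings rather than a dict, and that [0:2] assumes two-character parent keys.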
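
To get from that list to the dictionary the question asks for, one more step is needed. A minimal sketch, assuming Python 3 and two-character parent keys (the `[0:2]` slice relies on this); the names `prefixed` and `result` are made up for this example:

```python
import re
from functools import reduce  # reduce is not a builtin in Python 3

s = "2A:xxx\r\n3A:yyyy\r\n51:yzzzz\r\n52:yzyeys\r\n4A:....."

# acc is the list built so far, line is the next "key:value" string.
prefixed = reduce(
    lambda acc, line: acc + [line if re.match(r'\d+A', line)
                             else acc[-1][0:2] + '-' + line],
    s.split('\r\n'),
    [],
)

# Split each string once at the first colon to get the pairs for dict().
result = dict(item.split(':', 1) for item in prefixed)
print(result)
# {'2A': 'xxx', '3A': 'yyyy', '3A-51': 'yzzzz', '3A-52': 'yzyeys', '4A': '.....'}
```

This copies the accumulated list on every step, so for long inputs a plain loop or generator is likely the better fit.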
