I have some data that is inserted into a nested dictionary. The data is generated and could theoretically be nested arbitrarily deep. It could, for example, look like this:

data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}

For some clarification: In this small sample, it means that a certain plant has 'leaves', the 'leaves' are 'dark', 'green' and 'without'. The 'green' is 'light' in this example etc.

I want to unnest this dictionary and store every combination of a key and one of its child keys as a tuple. That could for example look like this:

[('leaves', 'dark'), ('leaves', 'green'), ('green', 'light'), ('without', 'veins'), ('leaves', '5'), ('veins', 'blue')]

Note: order is not important. For those interested, these tuples are further manipulated and will end up in a knowledge graph.

I thought a recursive function would do the trick here, but my function works best without the return statement, and a function without a return statement is essentially just a loop. However, I cannot make it work with a simple loop either.

edit: the doubles variable is a global list.

The function I wrote:

def undict(d):
    global doubles  # required: += rebinds the name, so it must be declared global
    for key in d.keys():
        if isinstance(d[key], dict):
            doubles += [(key, k) for k in d[key].keys()]
        undict(d[key]) # Normally: return undict(d[key])

Can anyone offer some insight into how to make it truly recursive, or how to do this with a simple loop? I am lost at this point.

  • You're building the list doubles, but not actually doing anything with it. Commented Feb 10, 2022 at 14:52
  • Yes apologies, I edited the question a bit. Doubles is a global list. Commented Feb 10, 2022 at 14:53
  • Are you missing ('leaves', 'without') in your sample output, or am I misunderstanding something? Commented Feb 10, 2022 at 15:01
  • Are you sure you want ('leaves', 'green') and ('green', 'light') as two separate elements, rather than ('leaves', 'green', 'light') as one element? Commented Feb 10, 2022 at 15:08
  • Yes, separate elements are preferred. The connections are maintained in the knowledge graph because the object of one pair is the same as the subject of the next. Commented Feb 10, 2022 at 15:22

1 Answer

Your approach is pretty good!

However, note that you're using a global variable, doubles, rather than a local variable and a return statement, which would be cleaner.
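As a rough sketch of what "a local variable and a return statement" could look like: the function name undict_to_list and the local list pairs are made up for this illustration and do not appear in the question. Each call builds its own list and returns it, so no global state is needed.

```python
def undict_to_list(d):
    """Return (parent, child) pairs for every nested key in d."""
    pairs = []  # local accumulator instead of a global list
    for key, value in d.items():
        if isinstance(value, dict):
            # Pair this key with each of its direct child keys.
            pairs.extend((key, child) for child in value)
            # Add the pairs found deeper down, returned by the recursive call.
            pairs.extend(undict_to_list(value))
    return pairs

data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}
print(undict_to_list(data))
```

The return value of the recursive call is actually used here, which is what makes the return statement meaningful.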

To avoid issues with .append, .extend or += on lists, a very Pythonic approach is to use a generator function, which uses the yield keyword instead of return.

data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}

def undict_to_pairs(d):
    for k,v in d.items():
        if isinstance(v, dict):  # always true with your example data
            for subk in v:
                yield (k, subk)
            yield from undict_to_pairs(v)
        else:
            yield (k,v)          # this statement is never reached with your example data

print(list(undict_to_pairs(data)))
# [('leaves', 'dark'), ('leaves', 'green'), ('leaves', 'without'), ('leaves', '5'), ('green', 'light'), ('without', 'veins'), ('veins', 'blue')]

Note that with your example data, isinstance(v,dict) is always true. The else branch is never reached. So this shorter version would work too:

def undict_to_pairs(d):
    for k,v in d.items():
        for subk in v:
            yield (k, subk)
        yield from undict_to_pairs(v)

print(list(undict_to_pairs(data)))
# [('leaves', 'dark'), ('leaves', 'green'), ('leaves', 'without'), ('leaves', '5'), ('green', 'light'), ('without', 'veins'), ('veins', 'blue')]
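Since the question also asks about a simple loop, here is a minimal sketch of a non-recursive variant that uses an explicit stack. The name undict_iterative is invented for this example. It produces the same pairs, but not necessarily in the same order, which the question says is fine.

```python
def undict_iterative(d):
    """Collect (parent, child) pairs with an explicit stack instead of recursion."""
    pairs = []
    stack = [d]  # dictionaries still waiting to be processed
    while stack:
        current = stack.pop()
        for key, value in current.items():
            if isinstance(value, dict):
                # Pair this key with each of its direct child keys.
                pairs.extend((key, child) for child in value)
                # Process the nested dictionary later.
                stack.append(value)
    return pairs

data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}
print(undict_iterative(data))
```

A loop like this does not depend on Python's recursion limit, which may matter if the data really can be nested very deeply.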

Let me also suggest a different version, which is not what you asked for but looks more logical to me given your data: generating long tuples instead of pairs. I removed isinstance(v, dict) from that version, since it appears the values in your data are always dicts.

def undict_to_tuples(d, acc = ()):
    if d == {}:
        yield acc
    else:
        for k,v in d.items():
            yield from undict_to_tuples(v, acc + (k,))

print(list(undict_to_tuples(data)))
# [('leaves', 'dark'), ('leaves', 'green', 'light'), ('leaves', 'without', 'veins', 'blue'), ('leaves', '5')]
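If pairs are still needed later, for instance for the knowledge graph, the long tuples can be split back into consecutive pairs. This is only a sketch: paths_to_pairs is a hypothetical helper, not part of the question, and it assumes that a pair occurring in several paths should be emitted only once.

```python
def undict_to_tuples(d, acc=()):
    # Same generator as above: one tuple per root-to-leaf path.
    if d == {}:
        yield acc
    else:
        for k, v in d.items():
            yield from undict_to_tuples(v, acc + (k,))

def paths_to_pairs(paths):
    """Split each path into consecutive (parent, child) pairs, skipping duplicates."""
    seen = set()
    for path in paths:
        # zip(path, path[1:]) pairs every element with its successor.
        for pair in zip(path, path[1:]):
            if pair not in seen:
                seen.add(pair)
                yield pair

data = {'leaves': {'dark': {}, 'green': {'light': {}}, 'without': {'veins': {'blue': {}}}, '5': {}}}
print(list(paths_to_pairs(undict_to_tuples(data))))
# [('leaves', 'dark'), ('leaves', 'green'), ('green', 'light'), ('leaves', 'without'), ('without', 'veins'), ('veins', 'blue'), ('leaves', '5')]
```

The set is what prevents ('leaves', 'green') from being repeated when 'green' has several children.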

4 Comments

This is a very complete answer! Thank you! I was not familiar with .extend, but looking into the documentation it seems very useful. Yielding from an iterable is indeed much cleaner in this case.
@Robert Ehrm. Sorry. My first comment about += and .extend was wrong. I removed it.
Yes, I saw. Thanks, by the way, for the else branch. While the condition is currently always True, it might come in handy in future work!
I want to thank you for suggesting a different version. I am not quite done yet, but I think in the end I'll use the version where you accumulate the data, as you suggested. Thank you!
