In python, create index from flat representation of nested structure in a list, sorting by alphabetical order

Question

I have lists in which each entry represents a path in a nested structure, with / separating the levels.

['a','a/b/a','a/b','a/b/d',....]

I want to take such a list and return an index list where each level is sorted in alphabetical order.

If we had the following list

['a','a/b','a/b/a','a/c','a/c/a','b']

It represents the nested structure

'a':                   #1
    'b':               #1.1
         'a': ...      #1.1.1
    'c':               #1.2
         'a': ...      #1.2.1
'b': ...               #2

I am trying to get the output

 ['1','1.1','1.1.1', '1.2','1.2.1','2']

But I am having real trouble working out how to tackle the problem. Would it be solved recursively? Or what would be a way to solve this for any generic list where each level is separated by /? The list is not necessarily sorted to begin with, and each level can be any generic word.

Build up a structure of nested dictionaries. For each string, split it on / characters, and then iterate on the result. Initialize a variable to point to the root of the structure. Look up the first result in the root dictionary. If you don't find a key with the first character, then add a new key with {} (an empty dictionary) as the value. Then change the pointer to point to the dictionary for that character. Then consider the next character. Repeat until you run out of characters. When you're done, you'll have a structure that represents the output that you show.
– CryptoFool, Commented Oct 26, 2022 at 4:54
You say "It represents the nested structure" - in what form do you have the actual data? Is it in a dictionary, a .json file, ...?
– Grismar, Commented Oct 26, 2022 at 4:55
@Grismar, the original structure would be a YAML file. But the input into the function will simply be the list I mentioned.
– Kspr, Commented Oct 26, 2022 at 4:56
"nested dictionaries" seems like a pretty clear definition to me.
– CryptoFool, Commented Oct 26, 2022 at 4:56
...ah, ok. Then "string" rather than character. It doesn't change the basic idea. At the end, to get the numeric result that you want, you'd want to sort each of the resulting dictionaries by their keys. Then you could walk the structure and build up the 1, 1.1, etc.
– CryptoFool, Commented Oct 26, 2022 at 4:57

CryptoFool · Accepted Answer · 2022-10-26 05:55:17Z

2

Here's a similar solution to the accepted answer, but I think it might be more correct than that answer. If I understand the problem correctly, there should be exactly one value in the output list for each value in the input list. An input of ['a/b/c/d'] should result in ['1.1.1.1'], not in a list with four values.

Anyway, here's my solution, with a couple of extra test cases:

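def doit(inp):

    def recursive_print(struct, prefix=""):
        # Print an index only for paths that were actually in the input
        if prefix and struct[1]:
            print(prefix)
        # Number the children by their alphabetical position at this level
        for i, key in enumerate(sorted(struct[0])):
            recursive_print(struct[0][key], prefix + ("." if prefix else "") + str(i + 1))

    # Each node is [dict of children, flag: was the path ending here in the input?]
    struct = [{}, False]

    for v in inp:
        p = struct
        for part in v.split('/'):
            if part not in p[0]:
                p[0][part] = [{}, False]
            p = p[0][part]
        p[1] = True

    recursive_print(struct)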

inp = ['a','a/b','a/b/a','a/c','a/c/a','b']
doit(inp)

print()

inp = ['a/b/c/d']
doit(inp)

print()

inp = ['joe','joe/sam','larry/curly/moe','jerry/jill','jerry/jill/tom','jerry/jill/tony','alice/jill/betty/davis/eyes','john']
doit(inp)

Result:

1
1.1
1.1.1
1.2
1.2.1
2

1.1.1.1

1.1.1.1.1
2.1
2.1.1
2.1.2
3
3.1
4
5.1.1


3 Comments

Kspr Over a year ago

inp = ["a","c","b","a/b"] should return [1,3,2,1.1] which seems to not be the case here.

CryptoFool Over a year ago

But you mention the result being sorted. I don't understand how that result is sorted. "c" coming before "b" is not sorted. It would be easy to get the unsorted result. Easier than to get the sorted one actually. All you'd do is walk back over the input list and look up the value in the structure, noting the position of each sub-term in terms of the list of items that appear at that level.

Kspr Over a year ago

I meant, that in the case that we have ["cat","cat/dog", "cat/abs/dog/car"] then, since abs comes before dog when sorting alphabetically, output should be [1, 1.2, 1.1.1.1].
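
For what it's worth, the approach described in the comments above (build the structure, then walk back over the input list and look up each part's position among its siblings) could be sketched roughly like this. The function name indices_in_input_order is made up for illustration, and the sketch assumes that "alphabetical" means Python's default string ordering:

```python
def indices_in_input_order(paths):
    # Build a tree of nested dictionaries: each key is a name and each
    # value is the dictionary holding that name's children.
    root = {}
    for path in paths:
        node = root
        for part in path.split('/'):
            node = node.setdefault(part, {})

    # Walk back over the input and, for every part of every path, record
    # its 1-based position among its alphabetically sorted siblings.
    result = []
    for path in paths:
        node = root
        ranks = []
        for part in path.split('/'):
            ranks.append(sorted(node).index(part) + 1)
            node = node[part]
        result.append('.'.join(map(str, ranks)))
    return result


print(indices_in_input_order(["cat", "cat/dog", "cat/abs/dog/car"]))
print(indices_in_input_order(["a", "c", "b", "a/b"]))
```

Re-sorting the siblings for every lookup keeps the sketch short. For large inputs it would probably be better to compute each level's ranking once and reuse it.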

blhsing · Accepted Answer · 2022-10-28 01:49:45Z

Since the goal is to simply convert the paths to indices according to their respective positions against other paths of the same prefix, there is no need to build a tree at all. Instead, iterate over the paths in alphabetical order while using a dict of sets to keep track of the prefixes at each level of paths, and join the lengths of sets at each level for output:

def indices(paths):
    output = {}
    names = {}
    for index, path in sorted(enumerate(paths), key=lambda t: t[1]):
        counts = []
        prefixes = tuple(path.split('/'))
        for level, name in enumerate(prefixes):
            prefix = prefixes[:level]
            names.setdefault(prefix, set()).add(name)
            counts.append(len(names[prefix]))
        output[index] = '.'.join(map(str, counts))
    return list(map(output.get, range(len(output))))

so that:

print(indices(['a', 'a/b', 'a/b/a', 'a/c', 'a/c/a', 'b']))
print(indices(['a', 'c', 'b', 'a/b']))
print(indices(['a/b/c/d', 'a/b/d', 'a/b/c']))
print(indices(['abc/d', 'bcc/d']))
print(indices(['apple/cat','apple/dog', 'banana/dog']))

outputs:

['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']
['1', '3', '2', '1.1']
['1.1.1.1', '1.1.2', '1.1.1']
['1.1', '2.1']
['1.1', '1.2', '2.1']

Demo: https://replit.com/@blhsing/StainedMassivePi
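
One possible caveat, offered as an assumption about inputs rather than something the question states: if a name can contain a character that sorts before '/' (such as '-' or '.'), sorting the raw path strings no longer visits the names at each level in alphabetical order. A minimal variant that sorts on the split components instead might look like the following; the name indices_by_parts is hypothetical, and the counting logic is otherwise the same as above.

```python
def indices_by_parts(paths):
    # Split once up front so the sort compares path components,
    # not raw strings that still contain the '/' separator.
    split_paths = [tuple(p.split('/')) for p in paths]
    order = sorted(range(len(paths)), key=split_paths.__getitem__)

    names = {}                      # prefix tuple -> set of names seen under it
    output = [None] * len(paths)    # results stored at the original positions
    for index in order:
        parts = split_paths[index]
        counts = []
        for level, name in enumerate(parts):
            siblings = names.setdefault(parts[:level], set())
            siblings.add(name)
            # Names arrive in sorted order, so the set size is this name's rank.
            counts.append(len(siblings))
        output[index] = '.'.join(map(str, counts))
    return output


print(indices_by_parts(['a', 'a-b', 'a/c']))
print(indices_by_parts(['a', 'c', 'b', 'a/b']))
```

For inputs whose names only use characters that sort after '/', such as plain letters and digits, this should give the same results as sorting the strings directly.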

Grismar · Accepted Answer · 2022-10-26 05:00:58Z

0

Since the parts of the string separated by / can presumably have different lengths, you can't just sort the strings directly. However, by splitting the strings on the /, you get lists of parts, and Python compares lists element by element, so you can sort them directly in the way you want:

strings = ['a','a/b/a','a/b','a/b/d', 'b/a', 'b']
keys = sorted(map(lambda s: s.split('/'), strings))
print(keys)

Output:

[['a'], ['a', 'b'], ['a', 'b', 'a'], ['a', 'b', 'd'], ['b'], ['b', 'a']]
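
As a small illustration of where the two orderings can differ, here is a sketch with a made-up input in which one name contains '-', a character that sorts before '/'. The helper sort_by_parts and the sample list are assumptions for the example, not part of the question.

```python
def sort_by_parts(strings):
    # Compare paths component by component instead of as raw strings.
    return sorted(s.split('/') for s in strings)


strings = ['a', 'a-b', 'a/c']
plain = sorted(strings)              # raw string comparison
by_parts = sort_by_parts(strings)    # list comparison after splitting

print(plain)
print(by_parts)
```

Sorted as plain strings, 'a-b' lands between 'a' and 'a/c'. Sorted by parts, 'a/c' stays next to its parent 'a'.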


5 Comments

Kspr Over a year ago

The objective is to get the representation such as ['1','1.1','1.1.1', '1.2','1.2.1','2'] as written in the question

WombatPM Over a year ago

So replace a with 1, b with 2, etc.? Care to provide more details on the conversion? Does case matter? What does an/x/d translate to?

Kspr Over a year ago

"The list is originally not necessarily sorted, and each level can be any generic word." So I can't before know what character or word will be what. Is it still confusing?

Grismar Over a year ago

You didn't provide the input data and when asked about it, you said "would be a YAML file. But, the input into the function will simply be the list I mentioned" - this code results in that list and without a sample of the data you're looking to select from, it's unclear where that would be coming from? Or do you literally want to translate to a numerical equivalent (i.e. the first one is 1, the second 2, etc.)?

Grismar Over a year ago

The issue is with something like ['a', 'a/b', 'b', 'b/a', 'b/c'] - in that case, do you expect ['1', '1.1', '2', '2.2', '2.3'] or ['1', '1.1', '2', '2.1', '2.2'], or ...?
