
I have lists in which each entry represents a path in a nested structure, with / separating the levels.

['a','a/b/a','a/b','a/b/d',....]

I want to take such a list and return an index list where each level is sorted in alphabetical order.

If we had the following list

['a','a/b','a/b/a','a/c','a/c/a','b']

It represents the nested structure

'a':                   #1
    'b':               #1.1
         'a': ...      #1.1.1
    'c':               #1.2
         'a': ...      #1.2.1
'b': ...               #2

I am trying to get the output

 ['1','1.1','1.1.1', '1.2','1.2.1','2']

But I am having real trouble working out how to tackle the problem. Would it be solved recursively? Or what would be a way to solve this for any generic list where each level is separated by /? The list is not necessarily sorted to begin with, and each level can be any generic word.

  • Build up a structure of nested dictionaries. For each string, split it on / characters, and then iterate on the result. Initialize a variable to point to the root of the structure. Look up the first result in the root dictionary. If you don't find a key with the first character, then add a new key with {} (an empty dictionary) as the value. Then change the pointer to point to the dictionary for that character. Then consider the next character. Repeat until you run out of characters. When you're done, you'll have a structure that represents the output that you show. Commented Oct 26, 2022 at 4:54
  • You say "It represents the nested structure" - in what form do you have the actual data? Is it in a dictionary, a .json file, ...? Commented Oct 26, 2022 at 4:55
  • @Grismar, the original structure would be a YAML file. But, the input into the function will simply be the list I mentioned. Commented Oct 26, 2022 at 4:56
  • "nested dictionaries" seems like a pretty clear definition to me. Commented Oct 26, 2022 at 4:56
  • ...ah, ok. Then "string" rather than character. It doesn't change the basic idea. At the end, to get the numeric result that you want, you'd want to sort each of the resulting dictionaries by their keys. Then you could walk the structure and build up the 1, 1.1, etc. Commented Oct 26, 2022 at 4:57
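The approach described in the comments above (nested dictionaries, then a walk over each level in sorted key order) could be sketched roughly as follows. The names `build_tree` and `number_tree` are hypothetical, and the sketch assumes no path contains an empty part such as `a//b`.

```python
def build_tree(paths):
    """Nest one dict per level; each key is one '/'-separated part."""
    root = {}
    for path in paths:
        node = root
        for part in path.split('/'):
            # setdefault creates the child dict the first time a part is seen
            node = node.setdefault(part, {})
    return root


def number_tree(node, path='', index=''):
    """Yield (path, index) pairs, visiting each level in sorted key order."""
    for position, key in enumerate(sorted(node), start=1):
        child_path = f'{path}/{key}' if path else key
        child_index = f'{index}.{position}' if index else str(position)
        yield child_path, child_index
        yield from number_tree(node[key], child_path, child_index)


tree = build_tree(['a', 'a/b/a', 'a/b', 'a/c/a', 'a/c', 'b'])
numbering = dict(number_tree(tree))
print(numbering)
# {'a': '1', 'a/b': '1.1', 'a/b/a': '1.1.1', 'a/c': '1.2', 'a/c/a': '1.2.1', 'b': '2'}
```

Note that this walk numbers every node in the tree, including intermediate levels that never appeared as an entry of their own, so the mapping would still need to be filtered against the original list.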

3 Answers


Here's a similar solution to the accepted answer, but I think it might be more correct than that answer. If I understand the problem correctly, there should be exactly one value in the output list for each value in the input list. An input of ['a/b/c/d'] should result in ['1.1.1.1'], not in a list with four values.

Anyway, here's my solution, with a couple of extra test cases:

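def doit(inp):

    def recursive_print(struct, prefix=""):
        # struct is [children, is_input]: print only nodes that were in the input
        if prefix and struct[1]:
            print(prefix)
        # Number the children 1, 2, ... in alphabetical order of their names
        for i, key in enumerate(sorted(struct[0])):
            recursive_print(struct[0][key], prefix + ("." if prefix else "") + str(i + 1))

    # Each node is [dict of child nodes, flag: a path ending here was in the input]
    struct = [{}, False]

    for v in inp:
        p = struct
        for part in v.split('/'):
            if part not in p[0]:
                p[0][part] = [{}, False]
            p = p[0][part]
        p[1] = True

    recursive_print(struct)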

inp = ['a','a/b','a/b/a','a/c','a/c/a','b']
doit(inp)

print()

inp = ['a/b/c/d']
doit(inp)

print()

inp = ['joe','joe/sam','larry/curly/moe','jerry/jill','jerry/jill/tom','jerry/jill/tony','alice/jill/betty/davis/eyes','john']
doit(inp)

Result:

1
1.1
1.1.1
1.2
1.2.1
2

1.1.1.1

1.1.1.1.1
2.1
2.1.1
2.1.2
3
3.1
4
5.1.1

3 Comments

inp = ["a","c","b","a/b"] should return [1,3,2,1.1] which seems to not be the case here.
But you mention the result being sorted. I don't understand how that result is sorted. "c" coming before "b" is not sorted. It would be easy to get the unsorted result. Easier than to get the sorted one actually. All you'd do is walk back over the input list and look up the value in the structure, noting the position of each sub-term in terms of the list of items that appear at that level.
I meant, that in the case that we have ["cat","cat/dog", "cat/abs/dog/car"] then, since abs comes before dog when sorting alphabetically, output should be [1, 1.2, 1.1.1.1].
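The lookup idea from the comment above (walk back over the input list and note each part's position among the items at its level) could be sketched like this to get one index per input entry, in input order. This is only an illustration: the function name `indices_in_input_order` is hypothetical, and it uses a flat dict keyed by prefix rather than the nested structure from the answer.

```python
def indices_in_input_order(paths):
    # Collect the distinct child names seen under every prefix.
    children = {}
    for path in paths:
        parts = path.split('/')
        for depth, part in enumerate(parts):
            children.setdefault(tuple(parts[:depth]), set()).add(part)

    # Rank each child among its alphabetically sorted siblings (1-based).
    rank = {
        prefix: {name: i for i, name in enumerate(sorted(names), start=1)}
        for prefix, names in children.items()
    }

    # Translate each input path, part by part, in the original order.
    result = []
    for path in paths:
        parts = path.split('/')
        result.append('.'.join(
            str(rank[tuple(parts[:depth])][part])
            for depth, part in enumerate(parts)
        ))
    return result


print(indices_in_input_order(['cat', 'cat/dog', 'cat/abs/dog/car']))
# ['1', '1.2', '1.1.1.1']
print(indices_in_input_order(['a', 'c', 'b', 'a/b']))
# ['1', '3', '2', '1.1']
```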

Since the goal is to simply convert the paths to indices according to their respective positions against other paths of the same prefix, there is no need to build a tree at all. Instead, iterate over the paths in alphabetical order while using a dict of sets to keep track of the prefixes at each level of paths, and join the lengths of sets at each level for output:

def indices(paths):
    output = {}
    names = {}
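    # Sort by the split parts, not the raw string: characters that sort before '/' (space, '-', '.') would otherwise interleave siblings
    for index, path in sorted(enumerate(paths), key=lambda t: t[1].split('/')):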
        counts = []
        prefixes = tuple(path.split('/'))
        for level, name in enumerate(prefixes):
            prefix = prefixes[:level]
            names.setdefault(prefix, set()).add(name)
            counts.append(len(names[prefix]))
        output[index] = '.'.join(map(str, counts))
    return list(map(output.get, range(len(output))))

so that:

print(indices(['a', 'a/b', 'a/b/a', 'a/c', 'a/c/a', 'b']))
print(indices(['a', 'c', 'b', 'a/b']))
print(indices(['a/b/c/d', 'a/b/d', 'a/b/c']))
print(indices(['abc/d', 'bcc/d']))
print(indices(['apple/cat','apple/dog', 'banana/dog']))

outputs:

['1', '1.1', '1.1.1', '1.2', '1.2.1', '2']
['1', '3', '2', '1.1']
['1.1.1.1', '1.1.2', '1.1.1']
['1.1', '2.1']
['1.1', '1.2', '2.1']

Demo: https://replit.com/@blhsing/StainedMassivePi



Since the parts of the string separated by / can presumably have different lengths, you can't just sort the strings directly. However, by splitting the strings on /, you get lists of parts, which Python compares element by element, so you can sort them directly in the way you want:

strings = ['a','a/b/a','a/b','a/b/d', 'b/a', 'b']
keys = sorted(map(lambda s: s.split('/'), strings))
print(keys)

Output:

[['a'], ['a', 'b'], ['a', 'b', 'a'], ['a', 'b', 'd'], ['b'], ['b', 'a']]
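As a small illustration of why sorting the raw strings can differ from sorting the split parts, consider a name containing a character that sorts before `/`, such as a space. The sample list here is made up for the demonstration and is not from the question.

```python
strings = ['a', 'a b', 'a/c']

# Raw string comparison: ' ' sorts before '/', so 'a b' lands
# between 'a' and its own child 'a/c'.
by_string = sorted(strings)
print(by_string)   # ['a', 'a b', 'a/c']

# Comparing the split parts keeps 'a' and 'a/c' together.
by_parts = sorted(strings, key=lambda s: s.split('/'))
print(by_parts)    # ['a', 'a/c', 'a b']
```

For names made only of letters and digits the two orders agree, since `/` sorts before all of those characters.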

5 Comments

The objective is to get the representation such as ['1','1.1','1.1.1', '1.2','1.2.1','2'] as written in the question
So replace a with 1, b with 2, etc.? Care to provide more details on the conversion? Does case matter? What does an/x/d translate to?
"The list is originally not necessarily sorted, and each level can be any generic word." So I can't before know what character or word will be what. Is it still confusing?
You didn't provide the input data and when asked about it, you said "would be a YAML file. But, the input into the function will simply be the list I mentioned" - this code results in that list and without a sample of the data you're looking to select from, it's unclear where that would be coming from? Or do you literally want to translate to a numerical equivalent (i.e. the first one is 1, the second 2, etc.)?
The issue is with something like ['a', 'a/b', 'b', 'b/a', 'b/c'] - in that case, do you expect ['1', '1.1', '2', '2.2', '2.3'] or ['1', '1.1', '2', '2.1', '2.2'], or ...?
