Traverse elements of nested sections list python

Question

I have a list of dictionaries named "sections" in this format:

  [{
      "elements": [
        "/sections/1",
        "/sections/5",
        "/sections/6",
        "/sections/7"
      ]
    },
    {
      "elements": [
        "/sections/2",
        "/sections/3",
        "/sections/4"
      ]
    },
    {
      "elements": [
        "/paragraphs/0"
      ]
    },
    {
      "elements": [
        "/paragraphs/1"
      ]
    },
    {
      "elements": [
        "/paragraphs/2"
      ]
    },
    {
      "elements": [
        "/paragraphs/3",
        "/tables/0",
        "/paragraphs/5",
        "/paragraphs/6",
        "/paragraphs/7",
        "/paragraphs/8",
        "/paragraphs/9",
        "/paragraphs/10",
        "/paragraphs/11",
        "/paragraphs/12",
        "/paragraphs/13",
        "/paragraphs/14",
        "/paragraphs/15",
        "/paragraphs/16",
        "/paragraphs/17",
        "/paragraphs/18"
      ]
    },
    {
      "elements": [
        "/paragraphs/19",
        "/paragraphs/21",
        "/paragraphs/22",
        "/paragraphs/23",
        "/paragraphs/24",
        "/paragraphs/25",
        "/paragraphs/26",
        "/paragraphs/27",
        "/paragraphs/28",
        "/paragraphs/29",
        "/paragraphs/30",
        "/paragraphs/31",
        "/paragraphs/32",
        "/paragraphs/33",
        "/paragraphs/34",
        "/paragraphs/35",
        "/paragraphs/36",
        "/paragraphs/37",
        "/paragraphs/38",
        "/paragraphs/39",
        "/paragraphs/40",
        "/paragraphs/41",
        "/paragraphs/42"
      ]
    }]

This is sample JSON output from Azure Document Intelligence. I want to traverse the sections. "sections" is a list whose entries may reference nested sections or paragraphs.

For example, print(sections[0]) gives me {'elements': ['/sections/1', '/sections/5', '/sections/6', '/sections/7']}

The "/sections/1" can be interpreted as sections[1] and similarly for others.
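To make that interpretation concrete, here is a minimal sketch of turning a reference string into a kind and an index. The helper name `parse_ref` and the two-entry sample list are hypothetical and only for illustration; the sketch assumes every reference has the form "/kind/number".

```python
def parse_ref(ref):
    """Split a reference such as "/sections/1" into ("sections", 1)."""
    kind, _, number = ref.strip("/").partition("/")
    return kind, int(number)


# Tiny illustrative list in the same shape as the sample above.
sections = [
    {"elements": ["/sections/1"]},
    {"elements": ["/paragraphs/0"]},
]

kind, index = parse_ref(sections[0]["elements"][0])
print(kind, index)                  # sections 1
print(sections[index]["elements"])  # ['/paragraphs/0']
```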

The hierarchy of the nesting is Section ---> Paragraph.

I want to traverse the list and flatten the output.

I have another dictionary for paragraphs, keyed by paragraph number with the actual paragraph content as the value, which I want to reference.

Therefore I am expecting to traverse this sections list and get an output of paragraphs such as: ["/paragraphs/0","/paragraphs/1","/paragraphs/3","/tables/0","/paragraphs/5"...]

Once I have the output in this format, I can write another function to extract the exact information from the paragraph dictionary. (I'll do that myself.)

I need help writing a function that does the traversal in an optimized way. I wrote something, but it's not giving the right result.

def CheckSectCondition(sect_elems):
    if len([s for s in sect_elems if "sect" in s]) == 0:
        return True
    else:
        return False
    
all_text = ""
for i in range(0,len(section_data)):
    curr_section = section_data[i]
    curr_section_elements = curr_section['elements']
    if CheckSectCondition(curr_section_elements) == False:
        while CheckSectCondition(curr_section_elements) == False:
            for i in curr_section_elements:
                if i[1:5] == 'sect':
                    sub_sec_name = i.split('/')[-1]
                    sub_sec_elements = section_data[int(sub_sec_name)]['elements']
                    print(sub_sec_elements)
                    #again iterate
                elif i[1:5] == 'para':
                    print(i)
                    #do something
                elif i[1:5] == 'tabl':
                    print(i)
                    #do something
            CheckSectCondition(curr_section_elements) == True

### here section_data is the sections list

Any help would be much appreciated. I don't know recursive programming, and a section inside a section can be nested multiple levels deep.
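Because the nesting can be several levels deep, one option that avoids recursion is to keep an explicit stack of references still to be visited. The following is only a sketch under some assumptions: that sections[0] is the root of the hierarchy, that every reference looks like "/kind/number", and that references pointing outside the list (or back to a section already visited) should simply be skipped. The function name `flatten_sections_iterative` and the small sample list are hypothetical.

```python
def flatten_sections_iterative(sections, start=0):
    """Return non-section references in depth-first order, without recursion."""
    result = []
    visited = set()                      # section indices already expanded
    stack = [f"/sections/{start}"]       # references still to process

    while stack:
        ref = stack.pop()
        kind, _, number = ref.strip("/").partition("/")
        if kind != "sections":
            result.append(ref)           # paragraphs, tables, ... are leaves
            continue
        index = int(number)
        # Assumption: skip cycles and references past the end of the list.
        if index in visited or not 0 <= index < len(sections):
            continue
        visited.add(index)
        # Push children reversed so the first child is popped first.
        stack.extend(reversed(sections[index]["elements"]))

    return result


sample = [
    {"elements": ["/sections/1", "/sections/3"]},
    {"elements": ["/sections/2", "/paragraphs/1"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/tables/0", "/paragraphs/2"]},
]
print(flatten_sections_iterative(sample))
# ['/paragraphs/0', '/paragraphs/1', '/tables/0', '/paragraphs/2']
```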

So basically you want the list you showed flattened out, right? Just all the elements in each dictionary converted into one big flat sequential list?
– Spidy, Commented Jun 7, 2024 at 7:54
@Spidy: Yes, the output should be: ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/paragraphs/3', '/paragraphs/5', '/tables/0', '/paragraphs/6',....] and so on
– Shubham R, Commented Jun 7, 2024 at 8:23
Hi. Could you explain the logic of the output you expect in more detail? I don't understand.
– Stef, Commented Jun 10, 2024 at 9:03
@Stef Every list element is a section, e.g. sections is a list and "/sections/1" refers to sections[1]. Each section may contain subsections, or paragraphs as end nodes.
– Shubham R, Commented Jun 10, 2024 at 13:30
@ShubhamR Sorry, I don't understand. Can you perhaps include a smaller dictionary example (the one currently in your question has a lot of sections and paragraphs; I'm sure you don't need that many to explain the issue), show us the exact output you expect on that smaller example, and explain why you expect that output? Please use the edit button to do that.
– Stef, Commented Jun 10, 2024 at 16:22

Spidy · Accepted Answer · 2024-06-07 09:23:08Z

2

Here is a basic way to do this

t = [{
      "elements": [
        "/sections/1",
        "/sections/5",
        "/sections/6",
        "/sections/7"
      ]
    },
    {
      "elements": [
        "/sections/2",
        "/sections/3",
        "/sections/4"
      ]
    },
    {
      "elements": [
        "/paragraphs/0"
      ]
    },
    {
      "elements": [
        "/paragraphs/1"
      ]
    },
    {
      "elements": [
        "/paragraphs/2"
      ]
    },
    {
      "elements": [
        "/paragraphs/3",
        "/tables/0",
        "/paragraphs/5",
        "/paragraphs/6",
        "/paragraphs/7",
        "/paragraphs/8",
        "/paragraphs/9",
        "/paragraphs/10",
        "/paragraphs/11",
        "/paragraphs/12",
        "/paragraphs/13",
        "/paragraphs/14",
        "/paragraphs/15",
        "/paragraphs/16",
        "/paragraphs/17",
        "/paragraphs/18"
      ]
    },
    {
      "elements": [
        "/paragraphs/19",
        "/paragraphs/21",
        "/paragraphs/22",
        "/paragraphs/23",
        "/paragraphs/24",
        "/paragraphs/25",
        "/paragraphs/26",
        "/paragraphs/27",
        "/paragraphs/28",
        "/paragraphs/29",
        "/paragraphs/30",
        "/paragraphs/31",
        "/paragraphs/32",
        "/paragraphs/33",
        "/paragraphs/34",
        "/paragraphs/35",
        "/paragraphs/36",
        "/paragraphs/37",
        "/paragraphs/38",
        "/paragraphs/39",
        "/paragraphs/40",
        "/paragraphs/41",
        "/paragraphs/42"
      ]
    }]
    
r = []
for k in t:
    r.extend(k['elements'])
print(r)

Tell me if it's optimized enough for your use case, and we can optimize it further if needed.


2 Comments

Shubham R Over a year ago

No that's not the intended output

Spidy Over a year ago

So I am guessing the intended output should contain only "paragraph" elements?
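Going by the question, what seems to be wanted is a depth-first walk that follows each "/sections/N" reference into sections[N] and collects everything else (paragraphs, tables) in the order encountered, rather than a plain concatenation of every "elements" list. A minimal recursive sketch of that idea follows. It assumes sections[0] is the root, and it skips references that point past the end of the list (in the sample as posted, "/sections/7" does, since the list has only seven entries) as well as sections already visited. The name `flatten_sections` and the small sample are hypothetical.

```python
def flatten_sections(sections, start=0):
    """Depth-first walk from sections[start]; returns leaf references in order."""
    result = []
    visited = set()  # guards against a section referencing itself

    def walk(index):
        # Assumption: silently skip cycles and out-of-range references.
        if index in visited or not 0 <= index < len(sections):
            return
        visited.add(index)
        for ref in sections[index]["elements"]:
            kind, _, number = ref.strip("/").partition("/")
            if kind == "sections":
                walk(int(number))    # descend into the nested section
            else:
                result.append(ref)   # paragraph, table, ... is a leaf

    walk(start)
    return result


sample = [
    {"elements": ["/sections/1", "/sections/3"]},
    {"elements": ["/sections/2", "/paragraphs/1"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/tables/0", "/paragraphs/2"]},
]
print(flatten_sections(sample))
# ['/paragraphs/0', '/paragraphs/1', '/tables/0', '/paragraphs/2']
```

Each section is expanded at most once, so the walk is linear in the total number of references. For extremely deep nesting, Python's recursion limit could become a concern, in which case an explicit stack is an alternative.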
