
I have a list of dictionaries named "sections" in this format:

  [{
      "elements": [
        "/sections/1",
        "/sections/5",
        "/sections/6",
        "/sections/7"
      ]
    },
    {
      "elements": [
        "/sections/2",
        "/sections/3",
        "/sections/4"
      ]
    },
    {
      "elements": [
        "/paragraphs/0"
      ]
    },
    {
      "elements": [
        "/paragraphs/1"
      ]
    },
    {
      "elements": [
        "/paragraphs/2"
      ]
    },
    {
      "elements": [
        "/paragraphs/3",
        "/tables/0",
        "/paragraphs/5",
        "/paragraphs/6",
        "/paragraphs/7",
        "/paragraphs/8",
        "/paragraphs/9",
        "/paragraphs/10",
        "/paragraphs/11",
        "/paragraphs/12",
        "/paragraphs/13",
        "/paragraphs/14",
        "/paragraphs/15",
        "/paragraphs/16",
        "/paragraphs/17",
        "/paragraphs/18"
      ]
    },
    {
      "elements": [
        "/paragraphs/19",
        "/paragraphs/21",
        "/paragraphs/22",
        "/paragraphs/23",
        "/paragraphs/24",
        "/paragraphs/25",
        "/paragraphs/26",
        "/paragraphs/27",
        "/paragraphs/28",
        "/paragraphs/29",
        "/paragraphs/30",
        "/paragraphs/31",
        "/paragraphs/32",
        "/paragraphs/33",
        "/paragraphs/34",
        "/paragraphs/35",
        "/paragraphs/36",
        "/paragraphs/37",
        "/paragraphs/38",
        "/paragraphs/39",
        "/paragraphs/40",
        "/paragraphs/41",
        "/paragraphs/42"
      ]
    }]

This is sample JSON output from Azure Document Intelligence. I want to traverse the sections. "sections" is a list whose entries may reference nested sections as well as paragraphs.

For example, print(sections[0]) gives {'elements': ['/sections/1', '/sections/5', '/sections/6', '/sections/7']}

A reference such as "/sections/1" can be interpreted as sections[1], and similarly for the others.
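
To make that interpretation concrete, here is a minimal sketch of turning a reference string into a kind and an integer index. The helper name parse_ref is hypothetical (it is not part of any Azure SDK), and the sketch assumes every reference has the two-part form "/kind/index" shown in the sample.

```python
def parse_ref(ref):
    """Split a reference such as "/sections/1" into ("sections", 1).

    Assumes the two-part "/kind/index" form seen in the sample data.
    """
    kind, index = ref.strip("/").split("/")
    return kind, int(index)


print(parse_ref("/sections/1"))     # ('sections', 1)
print(parse_ref("/paragraphs/12"))  # ('paragraphs', 12)
```

With this, sections[parse_ref("/sections/1")[1]] is the dictionary that the reference points to.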

The hierarchy of the nesting is Section ---> Paragraph.

I want to traverse the list and flatten the output.

I have another dictionary for paragraphs, keyed by paragraph number with the paragraph content as the value, which I want to reference.

So I expect to traverse this sections list and get a flat list of references such as: ["/paragraphs/0","/paragraphs/1","/paragraphs/3","/tables/0","/paragraphs/5"...]

Once I have the output in this format, I can write another function to extract the exact information from the paragraph dictionary (I'll do that myself).

I need help writing a function for the traversal in an optimized way. I wrote something, but it doesn't give the right result:

def CheckSectCondition(sect_elems):
    if len([s for s in sect_elems if "sect" in s]) == 0:
        return True
    else:
        return False
    
all_text = ""
for i in range(0,len(section_data)):
    curr_section = section_data[i]
    curr_section_elements = curr_section['elements']
    if CheckSectCondition(curr_section_elements) == False:
        while CheckSectCondition(curr_section_elements) == False:
            for i in curr_section_elements:
                if i[1:5] == 'sect':
                    sub_sec_name = i.split('/')[-1]
                    sub_sec_elements = section_data[int(sub_sec_name)]['elements']
                    print(sub_sec_elements)
                    #again iterate
                elif i[1:5] == 'para':
                    print(i)
                    #do something
                elif i[1:5] == 'tabl':
                    print(i)
                    #do something
            CheckSectCondition(curr_section_elements) == True

Here, section_data is the sections list.

Any help would be much appreciated. I am not familiar with recursive programming, and a section inside a section could be nested multiple levels deep.
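
Since the nesting depth is not known in advance, one way to handle it is a recursive depth-first walk. The following is a minimal sketch, not a definitive implementation. It assumes that sections[0] is the root of the hierarchy and that anything other than a "/sections/N" reference is a leaf to keep. The function name flatten_sections and the small sample list are made up for illustration. Note that in the sample above "/sections/7" points past the seven listed entries, so the sketch skips references that fall outside the list, and it also skips sections it has already expanded.

```python
def flatten_sections(sections, start=0, _seen=None):
    """Return the leaf references reachable from sections[start], in order."""
    seen = set() if _seen is None else _seen
    if start in seen or not 0 <= start < len(sections):
        return []  # already expanded, or the reference points outside the list
    seen.add(start)

    flat = []
    for ref in sections[start]["elements"]:
        kind, index = ref.strip("/").split("/")
        if kind == "sections":
            # Nested section: expand it in place so document order is kept.
            flat.extend(flatten_sections(sections, int(index), seen))
        else:
            # Paragraphs, tables, and any other kind are treated as leaves.
            flat.append(ref)
    return flat


# Small made-up example with two levels of nesting.
sections = [
    {"elements": ["/sections/1", "/sections/3"]},
    {"elements": ["/sections/2", "/paragraphs/1"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/paragraphs/2", "/tables/0", "/paragraphs/4"]},
]
print(flatten_sections(sections))
# ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/tables/0', '/paragraphs/4']
```

Each section is expanded at most once, so the work is proportional to the total number of references.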

  • So basically, you want the list you showed flattened out, right? Just all the elements in each dictionary converted to one big flat sequential list? Commented Jun 7, 2024 at 7:54
  • @Spidy : Yes, the output should be: ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/paragraphs/3', '/paragraphs/5', '/tables/0', '/paragraphs/6',....] and so on Commented Jun 7, 2024 at 8:23
  • Hi. Could you explain the logic of the output you expect in more detail? I don't understand. Commented Jun 10, 2024 at 9:03
  • @Stef Every list element is a section, e.g. sections is a list and "/sections/1" refers to sections[1]. Each section may have subsections, or paragraphs as end nodes. Commented Jun 10, 2024 at 13:30
  • @ShubhamR Sorry, I don't understand. Can you perhaps include a smaller dictionary example (the one currently in your question has a lot of sections and paragraphs; I'm sure you don't need that many to explain the issue), and show us the exact output you expect on that smaller example, and explain why you expect that output? Using the edit button to do that. Commented Jun 10, 2024 at 16:22

1 Answer

Here is a basic way to do this:

t = [{
      "elements": [
        "/sections/1",
        "/sections/5",
        "/sections/6",
        "/sections/7"
      ]
    },
    {
      "elements": [
        "/sections/2",
        "/sections/3",
        "/sections/4"
      ]
    },
    {
      "elements": [
        "/paragraphs/0"
      ]
    },
    {
      "elements": [
        "/paragraphs/1"
      ]
    },
    {
      "elements": [
        "/paragraphs/2"
      ]
    },
    {
      "elements": [
        "/paragraphs/3",
        "/tables/0",
        "/paragraphs/5",
        "/paragraphs/6",
        "/paragraphs/7",
        "/paragraphs/8",
        "/paragraphs/9",
        "/paragraphs/10",
        "/paragraphs/11",
        "/paragraphs/12",
        "/paragraphs/13",
        "/paragraphs/14",
        "/paragraphs/15",
        "/paragraphs/16",
        "/paragraphs/17",
        "/paragraphs/18"
      ]
    },
    {
      "elements": [
        "/paragraphs/19",
        "/paragraphs/21",
        "/paragraphs/22",
        "/paragraphs/23",
        "/paragraphs/24",
        "/paragraphs/25",
        "/paragraphs/26",
        "/paragraphs/27",
        "/paragraphs/28",
        "/paragraphs/29",
        "/paragraphs/30",
        "/paragraphs/31",
        "/paragraphs/32",
        "/paragraphs/33",
        "/paragraphs/34",
        "/paragraphs/35",
        "/paragraphs/36",
        "/paragraphs/37",
        "/paragraphs/38",
        "/paragraphs/39",
        "/paragraphs/40",
        "/paragraphs/41",
        "/paragraphs/42"
      ]
    }]
    
r = []
for k in t:
    r.extend(k['elements'])
print(r)

Tell me if it's optimized enough for your use case, and we can optimize it further if needed.

2 Comments

No, that's not the intended output.
So I am guessing the intended output should contain only "paragraph" elements?
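
The loop in the answer concatenates every "elements" list, so section references stay in the result and leaves reached through nesting are not resolved in order. A hedged sketch of the alternative is to follow each "/sections/N" reference with an explicit stack (no recursion) and optionally keep only certain kinds of leaf. The function name collect_leaves, the keep parameter, and the sample list are hypothetical. The sketch assumes sections[0] is the root and ignores references that point outside the list.

```python
def collect_leaves(sections, root=0, keep=None):
    """Walk the section tree iteratively and return leaf references in order.

    keep: optional set of kinds to retain, e.g. {"paragraphs"}; None keeps all.
    """
    result, seen = [], set()
    stack = [f"/sections/{root}"]
    while stack:
        ref = stack.pop()
        kind, index = ref.strip("/").split("/")
        if kind != "sections":
            if keep is None or kind in keep:
                result.append(ref)
            continue
        i = int(index)
        if i in seen or not 0 <= i < len(sections):
            continue  # skip repeated or out-of-range section references
        seen.add(i)
        # Push children reversed so they are popped in their original order.
        stack.extend(reversed(sections[i]["elements"]))
    return result


# Small made-up example.
sample = [
    {"elements": ["/sections/1", "/sections/3"]},
    {"elements": ["/sections/2", "/paragraphs/1"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/paragraphs/2", "/tables/0", "/paragraphs/4"]},
]
print(collect_leaves(sample))
# ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/tables/0', '/paragraphs/4']
print(collect_leaves(sample, keep={"paragraphs"}))
# ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/paragraphs/4']
```

The explicit stack avoids Python's recursion limit on very deep nesting, and keep={"paragraphs"} covers the case where tables should be left out.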
