I have a list of dictionaries named "sections" in this format:
[
  {
    "elements": [
      "/sections/1",
      "/sections/5",
      "/sections/6",
      "/sections/7"
    ]
  },
  {
    "elements": [
      "/sections/2",
      "/sections/3",
      "/sections/4"
    ]
  },
  {
    "elements": [
      "/paragraphs/0"
    ]
  },
  {
    "elements": [
      "/paragraphs/1"
    ]
  },
  {
    "elements": [
      "/paragraphs/2"
    ]
  },
  {
    "elements": [
      "/paragraphs/3",
      "/tables/0",
      "/paragraphs/5",
      "/paragraphs/6",
      "/paragraphs/7",
      "/paragraphs/8",
      "/paragraphs/9",
      "/paragraphs/10",
      "/paragraphs/11",
      "/paragraphs/12",
      "/paragraphs/13",
      "/paragraphs/14",
      "/paragraphs/15",
      "/paragraphs/16",
      "/paragraphs/17",
      "/paragraphs/18"
    ]
  },
  {
    "elements": [
      "/paragraphs/19",
      "/paragraphs/21",
      "/paragraphs/22",
      "/paragraphs/23",
      "/paragraphs/24",
      "/paragraphs/25",
      "/paragraphs/26",
      "/paragraphs/27",
      "/paragraphs/28",
      "/paragraphs/29",
      "/paragraphs/30",
      "/paragraphs/31",
      "/paragraphs/32",
      "/paragraphs/33",
      "/paragraphs/34",
      "/paragraphs/35",
      "/paragraphs/36",
      "/paragraphs/37",
      "/paragraphs/38",
      "/paragraphs/39",
      "/paragraphs/40",
      "/paragraphs/41",
      "/paragraphs/42"
    ]
  }
]
It is a sample of the JSON output from Azure Document Intelligence. I want to traverse the sections: "sections" is a list whose entries may contain nested sections as well as paragraphs.
For example, print(sections[0]) gives me {'elements': ['/sections/1', '/sections/5', '/sections/6', '/sections/7']}
The reference "/sections/1" can be read as sections[1], and similarly for the others.
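Since each reference is just "/<kind>/<index>", a small helper can turn it into a kind and an integer index that can be used to look up sections[index]. This is a minimal sketch; `parse_ref` is a hypothetical name (not part of any Azure SDK), and it assumes every reference has exactly that two-part shape.

```python
def parse_ref(ref: str) -> tuple[str, int]:
    """Split a reference such as "/sections/1" into ("sections", 1).

    Hypothetical helper; assumes the reference has exactly two parts,
    "/<kind>/<index>".
    """
    kind, index = ref.strip("/").split("/")
    return kind, int(index)


# "/sections/1" points at sections[1]
print(parse_ref("/sections/1"))  # ('sections', 1)
print(parse_ref("/tables/0"))    # ('tables', 0)
```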
The nesting hierarchy is Section ---> Paragraph: a section can contain other sections, and the leaves are paragraphs (or tables, such as "/tables/0").
I want to traverse the list and flatten the output.
I have another dictionary for paragraphs, with the paragraph number as the key and the actual paragraph content as the value, which I want to look up.
Therefore I expect traversing this sections list to give an output of paragraphs such as:
["/paragraphs/0","/paragraphs/1","/paragraphs/2","/paragraphs/3","/tables/0","/paragraphs/5"...]
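The flattening described above amounts to a depth-first walk that replaces every section reference with that section's elements, keeping their order. A minimal sketch using an explicit stack instead of recursion, assuming sections[0] is the top-level section (as in the sample) and that every non-section reference (paragraph or table) should be kept as-is; `flatten_sections` and the trimmed `sample` list are hypothetical, for illustration only.

```python
def flatten_sections(sections, start=0):
    """Return the non-section references under sections[start], in document order.

    Hypothetical helper; assumes sections[start] is the root section and that
    every reference has the form "/<kind>/<index>".
    """
    result = []
    # Stack of references still to visit, pushed in reverse so that the
    # first element of a section is popped (visited) first.
    stack = list(reversed(sections[start]["elements"]))
    while stack:
        ref = stack.pop()
        kind, index = ref.strip("/").split("/")
        if kind == "sections":
            # Replace the section reference with that section's own elements,
            # again reversed so they come off the stack in their original order.
            stack.extend(reversed(sections[int(index)]["elements"]))
        else:
            # Paragraphs, tables, etc. are leaves: keep the reference as-is.
            result.append(ref)
    return result


# Hypothetical sample: a trimmed copy of the posted JSON (without "/sections/7").
sample = [
    {"elements": ["/sections/1", "/sections/5", "/sections/6"]},
    {"elements": ["/sections/2", "/sections/3", "/sections/4"]},
    {"elements": ["/paragraphs/0"]},
    {"elements": ["/paragraphs/1"]},
    {"elements": ["/paragraphs/2"]},
    {"elements": ["/paragraphs/3", "/tables/0", "/paragraphs/5"]},
    {"elements": ["/paragraphs/19", "/paragraphs/21"]},
]

print(flatten_sections(sample))
# ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/paragraphs/3',
#  '/tables/0', '/paragraphs/5', '/paragraphs/19', '/paragraphs/21']
```

Each reference is pushed and popped once, so the walk is linear in the number of references. With the sample exactly as posted, "/sections/7" has no matching entry, so sections[7] would raise an IndexError; the full JSON presumably contains that section.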
Once I have the output in this format, I can write another function to extract the exact content from the paragraph dictionary (I'll do that part myself).
I need help writing a function that does this traversal efficiently. I have written something, but it does not give the right result:
def CheckSectCondition(sect_elems):
    if len([s for s in sect_elems if "sect" in s]) == 0:
        return True
    else:
        return False

all_text = ""
for i in range(0, len(section_data)):
    curr_section = section_data[i]
    curr_section_elements = curr_section['elements']
    if CheckSectCondition(curr_section_elements) == False:
        while CheckSectCondition(curr_section_elements) == False:
            for i in curr_section_elements:
                if i[1:5] == 'sect':
                    sub_sec_name = i.split('/')[-1]
                    sub_sec_elements = section_data[int(sub_sec_name)]['elements']
                    print(sub_sec_elements)
                    # again iterate
                elif i[1:5] == 'para':
                    print(i)
                    # do something
                elif i[1:5] == 'tabl':
                    print(i)
                    # do something
            CheckSectCondition(curr_section_elements) == True
# here section_data is the sections list
Any help would be much appreciated. A section inside a section can be nested multiple levels deep, so I think this needs recursion, but I don't know recursive programming.
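Since a section can be nested any number of levels deep, the walk can also be written recursively: the function calls itself whenever it meets a section reference and splices the returned leaves in at that position. A minimal sketch under the same assumptions (root section at index 0, paragraph and table references kept as leaves); `flatten_recursive` and `sample` are hypothetical names for illustration.

```python
def flatten_recursive(sections, index=0):
    """Recursively collect the non-section references under sections[index].

    Hypothetical helper; assumes references have the form "/<kind>/<index>".
    """
    result = []
    for ref in sections[index]["elements"]:
        kind, sub_index = ref.strip("/").split("/")
        if kind == "sections":
            # Nested section: recurse and splice its leaves in at this position.
            result.extend(flatten_recursive(sections, int(sub_index)))
        else:
            # Paragraph or table reference: a leaf, keep it as-is.
            result.append(ref)
    return result


# Hypothetical three-level sample: section 0 -> section 1 -> section 2.
sample = [
    {"elements": ["/sections/1", "/paragraphs/3"]},
    {"elements": ["/paragraphs/0", "/sections/2", "/tables/0"]},
    {"elements": ["/paragraphs/1", "/paragraphs/2"]},
]

print(flatten_recursive(sample))
# ['/paragraphs/0', '/paragraphs/1', '/paragraphs/2', '/tables/0', '/paragraphs/3']
```

Each call handles one section, so the recursion depth equals the nesting depth, not the number of paragraphs. Python's default recursion limit (usually 1000, see sys.getrecursionlimit()) is far above typical document nesting, but an explicit stack avoids that limit entirely.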