I am trying to obtain the hierarchical structure of sections, sub-sections, sub-sub-sections and so on in a Wikipedia page.
I have a string like this:
mystr = 'a = b = = c = == d == == e == === f === === g === ==== h ==== === i === == j == == k == = l ='
In this case the page name is 'a' and the structure is as follows:
= b =
= c =
== d ==
== e ==
=== f ===
=== g ===
==== h ====
=== i ===
== j ==
== k ==
= l =
The number of equals signs indicates the heading level: one for a section, two for a sub-section, and so on. I need a Python list containing the full hierarchical path of every heading, like this:
mylist = ['a', 'a/b', 'a/c', 'a/c/d', 'a/c/e', 'a/c/e/f', 'a/c/e/g',
'a/c/e/g/h', 'a/c/e/i', 'a/c/j', 'a/c/k', 'a/l']
So far I have been able to find the sections, sub-sections and so on by doing this:
import re
sections = re.findall(r' = (.*?) =', mystr)
subsections = re.findall(r' == (.*?) ==', mystr)
...
But I don't know how to proceed from here to get the desired mylist.
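One possible direction, as a minimal sketch rather than a definitive solution: scan the headings in order and keep a stack of the current ancestors, truncating it to the heading's level before appending the new title. The names HEADING and section_paths are made up for illustration. The sketch assumes that titles are separated from their equals signs by single spaces (as in mystr) and that heading levels never skip (no === directly under =).

```python
import re

# A heading is a run of '=' signs, a title, and the same run again,
# delimited by whitespace (or the string boundaries) on both sides.
HEADING = re.compile(r'(?<!\S)(=+) (.+?) \1(?!\S)')

def section_paths(text):
    """Return 'page/section/sub-section' paths in document order."""
    matches = list(HEADING.finditer(text))
    # Everything before the first heading is taken to be the page name.
    title_end = matches[0].start() if matches else len(text)
    stack = [text[:title_end].strip()]  # stack[0] is the page name
    paths = ['/'.join(stack)]
    for m in matches:
        level = len(m.group(1))  # number of '=' signs
        # Drop headings at the same or a deeper level, keeping the ancestors.
        del stack[level:]
        stack.append(m.group(2))
        paths.append('/'.join(stack))
    return paths
```

Calling section_paths(mystr) on the string above should give the same list as the desired mylist. Using one regex with a backreference (\1) instead of a separate findall per level keeps the headings in their original order, which is what the stack relies on.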