2

My python program returns a list containing data of sub-list. Each sub-list contains the unique id of an article and the parent id of that article viz

pages_id_list ={ {22, 4},{45,1},{1,1}, {4,4},{566,45},{7,7},{783,566}, {66,1},{300,8},{8,4},{101,7},{80,22}, {17,17},{911,66} }

In each sub-list, the data is structured this way {*article_id*, *parent_id*} If the article_id and parent_id are the same it obviously mean that article has no parent.

I would like to sort the data using minimal code such that for each article, I can readily access a list of it's children and grandchildren (nested data) if available. For example (using the example data above) I should be able to print at the end of the day:

 1
 -45
 --566
 ---783
 -66
 --911

.... for article id 1

I could only sort out the highest level (Ist and 2nd generation) ids. Having problem getting the 3rd and subsequent generations.

This is the code I used:

highest_level = set()
first_level = set()
sub_level = set()

for i in pages_id_list:
    id,pid = i['id'],i['pid']

    if id == pid:
        #Pages of the highest hierarchy
        highest_level.add(id)

for i in pages_id_list:
    id,pid = i['id'],i['pid']

    if id != pid :
        if pid in highest_level:
            #First child pages
            first_level.add(id)
        else:
            sub_level.add(id)

My code sadly does not work.

Any help/nudge in the right direction will be appreciated. Thanks

David

1
  • 1
    Have you tried using a while loop? Commented Jan 7, 2013 at 2:48

3 Answers 3

5

Maybe something like this:

#! /usr/bin/python3.2

pages_id_list = [ (22, 4),(45,1),(1,1), (4,4),(566,45),(7,7),(783,566), (66,1),(300,8),(8,4),(101,7),(80,22), (17,17),(911,66) ]

class Node:
    def __init__ (self, article):
        self.article = article
        self.children = []
        self.parent = None

    def print (self, level = 0):
        print ('{}{}'.format ('\t' * level, self.article) )
        for child in self.children: child.print (level + 1)

class Tree:
    def __init__ (self): self.nodes = {}

    def push (self, item):
        article, parent = item
        if parent not in self.nodes: self.nodes [parent] = Node (parent)
        if article not in self.nodes: self.nodes [article] = Node (article)
        if parent == article: return
        self.nodes [article].parent = self.nodes [parent]
        self.nodes [parent].children.append (self.nodes [article] )

    @property
    def roots (self): return (x for x in self.nodes.values () if not x.parent)

t = Tree ()
for i in pages_id_list: t.push (i)
for node in t.roots: node.print ()

This creates a tree structure which you can traverse in order to get all subitems. You can access any article via t.nodes [article] and get its children via t.nodes [article].children.

The output of the print method is:

1
    45
        566
            783
    66
        911
4
    22
        80
    8
        300
7
    101
17
Sign up to request clarification or add additional context in comments.

4 Comments

+1 for node/tree objects - IMHO much more understandable. You might even be able to override the __str__ function instead of using node.print()?
Also bear in mind that Python will throw an error if you recurse more than 1000 levels deep. (This part: child.print (level + 1))
If it is to be expected that the tree will have more then 1000 levels, you should derecursify the print function.
1

Here's a simple approach (assuming your page id list elements are not sets, as your code suggests):

from collections import defaultdict

page_ids = [
    (22, 4), (45, 1), (1, 1), (4, 4),
    (566, 45), (7, 7), (783, 566), (66, 1), (300, 8),
    (8, 4), (101, 7), (80, 22), (17, 17), (911, 66)
]

def display(id, nodes, level):
    print('%s%s%s' % ('  ' * level, '\\__', id))
    for child in sorted(nodes.get(id, [])):
        display(child, nodes, level + 1)

if __name__ == '__main__':
    nodes, roots = defaultdict(set), set()

    for article, parent in page_ids:
        if article == parent:
            roots.add(article)
        else:
            nodes[parent].add(article)

    # nodes now looks something like this:
    # {1: [45, 66], 66: [911], 4: [22, 8], 22: [80], 
    #  7: [101], 8: [300], 45: [566], 566: [783]}

    for id in sorted(roots):
        display(id, nodes, 0)

Output would be:

\__1
  \__45
    \__566
      \__783
  \__66
    \__911
\__4
  \__8
    \__300
  \__22
    \__80
\__7
  \__101
\__17

Source: https://gist.github.com/4472070

2 Comments

@Hyperboreus, I explain it in my answer.
Miku this is useful to know though I prefer a solution without need of module. Never had done much with the collections module before now. Thanks
1

I would like to sort the data using minimal code

I read this until now and therefore I will provide another answer. I will not edit my previous answer, because they are really not related. If you want to transfer your list of tuples into a tree structure with minimal code, then this approach is quite minimal, although it can still be minimized further (e.g. using a recursive lambda term instead of the function):

pages_id_list = [ (22, 4),(45,1),(1,1), (4,4),(566,45),(7,7),(783,566), (66,1),(300,8),(8,4),(101,7),(80,22), (17,17),(911,66) ]

def getTree (item, pages): return [ (x, getTree (x, pages) ) if getTree (x, pages) else x for x in (x [0] for x in pages if x [1] == item) ]

tree = getTree (None, [ (x [0], None if x [0] == x [1] else x [1] ) for x in pages_id_list] )

2 Comments

Thanks for this Hyperboreus. This is most useful to me because of the minimal code.
Bear in mind though that the computational cost its horrendous.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.