6

Background:

I was working on a project were I needed to write some rules for text processing. After working on this project for a couple of days and implementing some rules, I realized I needed to determine the order of the rules. No problems, we have topological sorting to help. But then I realized that I can't expect the graph to be always full. So I came up with this idea, that given a single rule with a set of dependencies (or a single dependency) I need to check the dependencies of the dependencies. Sounds familiar? Yes. This subject is very similar to Depth-first-searching of a graph.
I am not a mathematician, nor did I study C.S. Hence, Graph Theory is a new field for me. Nevertheless, I implemented something (see below) which works (inefficiently, I suspect).

The code:

This is my search and yield algorithm. If you run it on the examples below, you will see it visits some nodes more then once. Hence, the speculated inefficiency.
A word about the input. The rules I wrote are basically python classes, which have a class property depends. I was criticized for not using inspect.getmro- But this would complicate thing terribly because the class would need to inherit from each other (See example here)

def _yield_name_dep(rules_deps):
    global recursion_counter
    recursion_counter = recursion_counter +1 
    # yield all rules by their named and dependencies
    for rule, dep in rules_deps.items():
        if not dep:
            yield rule, dep
            continue
        else:
            yield rule, dep
            for ii in dep:
                i = getattr(rules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep(new_dep):
                        yield dep    
                else:
                    yield str(instance), instance.depends

OK, now that you stared in the code, here is some input you can test:

demo_class_content ="""
class A(object):
    depends = ('B')
    def __str__(self):
        return self.__class__.__name__

class B(object):
    depends = ('C','F')
    def __str__(self):
        return self.__class__.__name__

class C(object):
    depends = ('D', 'E')
    def __str__(self):
        return self.__class__.__name__

class D(object):
    depends = None
    def __str__(self):
        return self.__class__.__name__   

class F(object):
    depends = ('E')
    def __str__(self):
        return self.__class__.__name__

class E(object):
    depends = None  
    def __str__(self):
        return self.__class__.__name__
"""       

with open('demo_classes.py', 'w') as clsdemo:
    clsdemo.write(demo_class_content)

import demo_classes as rules

rule_start={'A': ('B')}

def _yield_name_dep(rules_deps):
    # yield all rules by their named and dependencies
    for rule, dep in rules_deps.items():
        if not dep:
            yield rule, dep
            continue
        else:
            yield rule, dep
            for ii in dep:
                i = getattr(rules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep(new_dep):
                        yield dep    
                else:
                    yield str(instance), instance.depends

if __name__ == '__main__':
    # this is yielding nodes visited multiple times, 
    # list(_yield_name_dep(rule_start))
    # hence, my work around was to use set() ...
    rule_dependencies = list(set(_yield_name_dep(rule_start)))
    print rule_dependencies

The questions:

  • I tried classifying my work, and I think what I did is similar to DFS. Can you really classify it like this?
  • How can I improve this function to skip visited nodes, and still use generators ?

update:

Just to save you the trouble running the code, the output of the above function is:

>>> print list(_yield_name_dep(rule_wd))
[('A', 'B'), ('B', ('C', 'F')), ('C', ('D', 'E')), ('D', None), ('E', None), ('F', 'E'), ('E', None)]
>>> print list(set(_yield_name_dep(rule_wd)))
[('B', ('C', 'F')), ('E', None), ('D', None), ('F', 'E'), ('C', ('D', 'E')), ('A', 'B')]

In the mean while I came up with a better solution, the question above still remain. So feel free to criticize my solution:

visited = []
def _yield_name_dep_wvisited(rules_deps, visited):
    # yield all rules by their name and dependencies
    for rule, dep in rules_deps.items():
        if not dep and rule not in visited:
            yield rule, dep
            visited.append(rule)
            continue
        elif rule not in visited:
            yield rule, dep
            visited.append(rule)
            for ii in dep:
                i = getattr(grules, ii)
                instance = i()
                if instance.depends:
                    new_dep={str(instance): instance.depends}
                    for dep in _yield_name_dep_wvisited(new_dep, visited):
                        if dep not in visited:
                            yield dep    
                    
                elif str(instance) not in visited:
                    visited.append(str(instance))
                    yield str(instance), instance.depends

The output of the above is:

>>>list(_yield_name_dep_wvisited(rule_wd, visited))
[('A', 'B'), ('B', ('C', 'F')), ('C', ('D', 'E')), ('D', None), ('E', None), ('F', 'E')]

So as you can see now the node E is visited only once.

6
  • Why not store the class itself in depends rather than the name of a class? Commented Oct 14, 2013 at 10:40
  • @Eric, first because I would not know how to do it. Second, classes here are just for the demo. The code works on some properties defined in configuration file, and I wanted a self-containing example. Can you show what you mean? Commented Oct 14, 2013 at 10:43
  • depends = (E, F), without quotes. Note you'll need to define the things being depended on earlier in the file than the things which depend on them, but that's probably a good idea for readability anyway. Commented Oct 14, 2013 at 10:59
  • I believe this question is more suited to codereview.SE Anyway, I believe you could simplify the code if you always used tuples for depends. I mean, having depends = None, depends = 'A' and depends = ('A', 'B') you have to treat explicitly each of these cases. You could simply use depends = (), depends = ('A',) and depends = ('A', 'B') and have more uniform code. An other thing: instance = i(); if instance.depends: Since depends is a class attribute you don't have to instantiate the class. Simply do if i.depends:. Commented Oct 14, 2013 at 11:34
  • If you always used tuples you could remove completely the if not dep ..., since the for would never be executed without dependencies. Right know your code works only if dependencies have single letter names, in other cases you'd have problems with depends = 'MultiLetterName' vs depends = ('A', 'B') and you'd have to check whether dep is a string. Commented Oct 14, 2013 at 11:37

2 Answers 2

3

Using the feedback from Gareth and other kind users of Stackoverflow, here is what I came up with. It is clearer, and also more general:

def _dfs(start_nodes, rules, visited):
    """
    Depth First Search
    start_nodes - Dictionary of Rule with dependencies (as Tuples):    

        start_nodes = {'A': ('B','C')}

    rules - Dictionary of Rules with dependencies (as Tuples):
    e.g.
    rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 
             'D':(), 'E':(), 'F':()}
    The above rules describe the following DAG:

                    A
                   / \
                  B   C
                 / \ / \
                D   E   F
    usage:
    >>> rules = {'A':('B','C'), 'B':('D','E'), 'C':('E','F'), 
                 'D':(), 'E':(), 'F':()}
    >>> visited = []
    >>> list(_dfs({'A': ('B','C')}, rules, visited))
    [('A', ('B', 'C')), ('B', ('D', 'E')), ('D', ()), ('E', ()), 
    ('C', ('E', 'F')), ('F', ())]
    """

    for rule, dep in start_nodes.items():
        if rule not in visited:
            yield rule, dep
            visited.append(rule)
            for ii in dep:
                new_dep={ ii : rules[ii]}
                for dep in _dfs(new_dep, rules, visited):
                    if dep not in visited:
                        yield dep
Sign up to request clarification or add additional context in comments.

Comments

1

Here is another way to do do a breadth first search without duplicating the visited nodes.

import pylab
import networkx as nx

G = nx.DiGraph()
G.add_nodes_from([x for x in 'ABCDEF'])
G.nodes()

returns ['A', 'C', 'B', 'E', 'D', 'F']

G.add_edge('A','B')
G.add_edge('A','C')
G.add_edge('B','D')
G.add_edge('B','E')
G.add_edge('C','E')
G.add_edge('C','F')

and here is how you can traverse the tree without duplicating nodes.

nx.traversal.dfs_successors(G)

returns {'A': ['C', 'B'], 'B': ['D'], 'C': ['E', 'F']} and you can draw the graph.

nx.draw(G,node_size=1000)

2 Comments

actually, that was my prefered solution. My Colleague was really against adding 'another dependency' to out project. I think it is always better to avoid re-inventing the wheel. Although, when you re-invent the wheel you learn alot.
You can spend your time learning a lot about what others have done or continuing with your research which may help everyone learn something new. There are a lot of algorithms out there. Be careful with your time. I wish you well.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.